Meanwhile they are pushing AI transcription and note-taking solutions hard.
Patients are guilted into allowing doctors to use it. I have gotten pushback when I've asked to have it turned off.
The messaging is that it all stays local. In reality it doesn't; when I last looked it was running on Azure OpenAI in Australia.
I spoke to a practice nurse a few days ago to discuss this.
She said she didn't think patients would care if they knew the data would be shipped off site. She said people's problems are not that confidential and their health data is probably online anyway, so who cares.
It's honestly such a big problem. One of my colleagues uses an AI scribe. I can't rely on any of his chart notes because the AI sometimes hallucinates (I've already informed him). It also tends to write a ridiculous amount of detail that is totally unnecessary, while leaving out important details, so I still need to comb through patient charts myself (med rec, consults, etc). In the end it creates more work for me. And if my colleague ever gets a college complaint, I have no clue how he's gonna navigate any AI-generated errors. I'm all for AI, and it's great for things like copywriting, brainstorming and code generation. But from what I'm seeing, it's creating a lot more headache in the clinical setting.
If you're wondering why this guy doesn't just check the AI scribe notes: well, probably because with the amount of detail it writes, he'd be better off writing a quick SOAP note himself.
My (extensive) experience with LLM code generation is that it has the same issues you describe in your field. Hallucinations, over-engineering, misses important requirements/patterns.
But engineers have these same problems. The key is that the content creator (engineers for codegen, doctors for medicine) is still responsible for the output of the AI, as if they wrote it themselves. If they make a mistake with an AI (eg, include false data - hallucinations), they should be held accountable in the same way they would if they made a mistake without it.
Okay, but since we know how humans actually behave, they will fully trust the nondeterministic machine and give away their thinking. Sadly there is a large swath of humans who will act like this, maybe 20-30%.
Are you willing to put your life in the hands of these people fully using the machines to do everything?
Acting like smart people aren't getting one-shotted by these machines is very dangerous. Even worse is how quickly your skills actually degrade. If I knew my doctor was using anything LLM-related, I would switch doctors.
For all the whinging about bugs and errors around here, the software industry in general (some niche sub-fields excepted) long ago decided 80% is good enough to ship and we will figure the rest out later. This entire site is based on startup culture, which largely prided itself on MVP moonshots.
Plus plenty of places are perfectly fine with tech debt, and the AI fire hose is effectively tech debt on steroids: it creates it at scale, but it can also help in understanding it.
It is its own panacea, in a way.
I think it is gonna be a while before the industry figures out how to handle this better so might as well just ride the wave and not worry too much about it in software.
Still, software is not medicine, even if software is required in basically every industry now. Medicine should be more conservative and wait till things settle down before jumping in.
It feels very much like AI is creating AI lock-in (if not AI _vendor_ lock-in) by generating so much detailed information that it's futile to consume it without AI tools.
I was updating some GitLab pipelines and some simple testing scripts, and it created 3 separate 300+ line README-type metadata files (I think even the QUICKSTART.md was 300 lines).
I spec'd up an implementation of this that uses a hardware button with colors that is in reach of either party. The customer went with a different vendor based on price/"complexity"/training.
Very little protection. The entire medical records of a significant percentage of the NZ population were stolen recently and put up for sale online. Zero consequences for the medical practices that adopted the hacked software.
Many AI companies, including Azure with their OpenAI hosting, are more than willing to sign privacy agreements that allow processing sensitive medical data with their models.
The devil is in the details. For example, OAI does not have regional processing for AU [0], and their ZDR does not cover files [1]. Anthropic's ZDR [2] also does not cover files, so you really need to be careful, as a patient/consumer, to ensure that any health or other sensitive data of yours being processed by SaaS frontier models is not contained in files. That is asking a lot of the medical provider, who would need to know how their systems work; they won't, which is why I will never opt in.
There is no way to upload files as part of the context with Azure deployments; you have to use the OAI API [0], and without an architecture diagram of the solution, I am not going to trust it, given the known native limitations of Azure's OAI implementation.
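To make the file-versus-inline distinction concrete, here's a rough sketch in Python (placeholder model name, hypothetical transcript, not a compliance claim): text passed inline in the request body is a different case from the same text uploaded as a stored File object, which is what the ZDR carve-outs above are about.

    # Minimal sketch, not a compliance guarantee. Model name and transcript are placeholders.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    transcript_text = "De-identified consult transcript goes here..."  # hypothetical

    # Inline text: the sensitive content travels only in the request body.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": "Summarise this consult as a SOAP note."},
            {"role": "user", "content": transcript_text},
        ],
    )
    print(resp.choices[0].message.content)

    # By contrast, client.files.create(file=open("consult.txt", "rb"), purpose="assistants")
    # persists the content as a stored File object on the provider's side, which is
    # the case the zero-data-retention carve-outs do not appear to cover.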
The New Zealand Chief Digital Officer allowed Australian cloud providers to be used as there weren't suitable NZ data centers and this was many years ago.
Health NZ adopted Snowflake. It was about costs/fancy tech. We have always had data centres. Nobody *needs* Snowflake. They could have used Apache Spark.
The union rep gets it - people improvise when you cut their tools and then threaten discipline for improvising.
That memo is how you make staff hide things instead of asking for help.
The scarier part though is that LLM-written clinical notes probably look fine. That's the whole problem. I built a system where one AI was scoring another AI's work, and it kept giving high marks because the output read well. I had to make the scorer blind to the original coaching text before it started catching real issues. Now imagine that "reads well, isn't right" failure mode in clinical documentation.
Nobody's re-reading the phrasing until a patient outcome goes wrong.
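For what it's worth, the fix looked roughly like this; the prompt, rubric and model below are made up for illustration, not my actual system. The point is just that the judge gets only the generated note, not the source text it tends to grade on fluency against.

    # Sketch of a "blind" LLM judge. Everything named here is illustrative.
    from typing import Optional
    from openai import OpenAI

    client = OpenAI()

    RUBRIC = (
        "Score the note 1-5 for internal consistency, unsupported claims, and "
        "missing required sections. Justify each score in one sentence."
    )

    def score_note(generated_note: str, source_text: Optional[str] = None) -> str:
        # Blind pass: leave source_text out so the judge can't be carried along
        # by fluent, plausible-sounding context.
        user_content = generated_note if source_text is None else (
            f"Source:\n{source_text}\n\nNote:\n{generated_note}"
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder
            messages=[
                {"role": "system", "content": RUBRIC},
                {"role": "user", "content": user_content},
            ],
        )
        return resp.choices[0].message.content

    # score_note(note)              -> blind scoring, catches more real issues
    # score_note(note, source_text) -> the naive setup that graded on how well it read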
Physicians need to have it pounded into them that every hallucination is downstream harm. AI has no place in medicine. If they insist on it, then all transcripts must be stored with the raw audio, which should be accessible side by side, with the transcript lines time-coded. It's the only way to actually use these safely while guarding against hallucinations.
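Structurally it doesn't take much; a minimal sketch of what I mean, with field names and layout invented for illustration rather than any existing standard:

    # Each transcript line carries its time code, and the raw audio is kept alongside
    # so a reviewer can jump straight to the moment a suspect sentence was transcribed.
    import json
    from dataclasses import dataclass, asdict

    @dataclass
    class TranscriptLine:
        start_s: float   # offset into the raw audio, in seconds
        end_s: float
        text: str        # the transcribed (possibly hallucinated) sentence

    @dataclass
    class EncounterRecord:
        audio_path: str                # raw audio retained, not discarded after transcription
        lines: list[TranscriptLine]

        def audio_span_for(self, line_index: int) -> tuple[float, float]:
            # The span to replay when verifying a given transcript line.
            line = self.lines[line_index]
            return (line.start_s, line.end_s)

    record = EncounterRecord(
        audio_path="encounters/2024-05-01-consult.wav",  # hypothetical path
        lines=[
            TranscriptLine(0.0, 6.2, "Patient reports chest tightness on exertion."),
            TranscriptLine(6.2, 11.8, "No known drug allergies."),
        ],
    )
    print(record.audio_span_for(1))              # -> (6.2, 11.8)
    print(json.dumps(asdict(record), indent=2))  # stored next to the audio file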
> Physicians need to have it pounded into them that every hallucination is downstream harm.
I think any person using 'AI' knows it makes mistakes. Medical notes often contain errors as it is. A consumer of a medical note has to decide what makes sense and what to ignore, and AI isn't meaningfully changing that. If something matters, it's asked again in follow-up.
Raw audio is a cool idea! I've seen a similar approach in other domains, "keep the source of truth accessible so you can verify the AI output against it".
I wouldn't go as far as "no place in medicine" though. The Heidi scribe tool mentioned in the article is a good example, because in the end it's the doctor who reviews and signs off.
IMO the problem is AI doing the work with no human verification step, but I can 100% agree I don't want to have vibe-doctor for my next surgery/consult :D
I know at least one GP who has stopped using Heidi Health for transcription. He has noticed many errors (as have I, with transcriptions from my own medical professionals), far too many to be comfortable with. Things might improve, but not yet.
This is where I'm at as a GP. Every few months I give Heidi another try, but I haven't noticed any real improvement over the last two years. It spends lots of words on trivial nonsense and misses clinically significant points and sometimes entire issues. It takes far more time to review and fix the notes than it saves in typing. Presumably it will be good enough one day, but it's not there yet.
I have seen the evolution of these tools and I think they are going to push a fundamental change to medical care. Notes have been getting more and more abused, at least in the US. Big health systems want them for a lot of reasons that have nothing to do with helping a practitioner improve the care of their patient. They want to capture every billable moment of that encounter and potentially prep things like labs, appointments, clinical trial screening, pre-auths, etc. Some of this is good for the patient but a lot isn't. Also, the reality is that many practitioners spend as much, or more time, on the note than on the patient. That clearly isn't to their benefit. There is a reason they sit there and type constantly while talking to you and that doesn't stop when you leave the room. The demands on them to document everything so that all the accounting can happen are actually harming healthcare.
I think there is a chance that these systems will lead to a change where the note isn't the fundamental record of the encounter. Instead, different artifacts are created specifically for each entity that needs them: billing gets their view, scheduling gets theirs, and so on. It will, hopefully, give the practitioners a chance to get back to focusing on the patient and not on ensuring their note captured one more billable code. Of course the negative is also likely to happen here too. As practitioners spend less time on the note they will likely not get that back in time with individual patients, but instead in seeing more patients. It will also likely lead to higher bills as the health systems start squeezing more out of every encounter. There is no perfect here when profit is the driving motivator, but with this much change happening I can only hope that it causes the industry as a whole to shake up enough to maybe find a new, better optimum to land in.
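To sketch the per-entity artifact idea (every name here is invented for illustration): one canonical record of the encounter, with narrow derived views, rather than a single note trying to serve billing, scheduling and the clinician at once.

    from dataclasses import dataclass, field

    @dataclass
    class Encounter:
        patient_id: str
        clinical_summary: str                     # what the practitioner actually cares about
        procedures: list = field(default_factory=list)
        follow_up_weeks: int = 0

    def billing_view(e: Encounter) -> dict:
        # Billing sees only what it needs to code the visit, not the narrative.
        return {"patient_id": e.patient_id, "procedures": e.procedures}

    def scheduling_view(e: Encounter) -> dict:
        # Scheduling sees only the follow-up interval.
        return {"patient_id": e.patient_id, "follow_up_weeks": e.follow_up_weeks}

    enc = Encounter("p-123", "Stable angina, continue current therapy.", ["ECG"], 6)
    print(billing_view(enc))      # {'patient_id': 'p-123', 'procedures': ['ECG']}
    print(scheduling_view(enc))   # {'patient_id': 'p-123', 'follow_up_weeks': 6}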
Yeah, no privacy or security there. There are some tools explicitly designed at helping healthcare providers produce better notes faster, and a couple of them are AMAZING. I'm an AI-half-empty guy, I'm keenly aware of its shortcomings and deploy it thoughtfully, and even with my skepticism there are a couple of tools that are just plain great. I think using LLMs to create overviews and summaries is a great use of the tech.
AI doesn't forget, and soon all New Zealanders will have their health histories internalised by AI so it can individually calculate insurance premiums without them knowing why...
This is a ridiculous sentence. Of course inference-only AI forgets. You can literally just program it that way.
In fact, it's human transcribers who choose whether to forget the details of a case or whether to share the details of an especially funny patient with their buddies at the bar.
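To be concrete about "program it that way", a minimal sketch (placeholder model name; whether the provider retains anything server-side is a contract question, per the ZDR links upthread, not something client code controls):

    from openai import OpenAI

    client = OpenAI()

    def summarise(transcript: str) -> str:
        # Each call stands alone: the model sees only what is in `messages` right now.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder
            messages=[{"role": "user", "content": f"Summarise this consult:\n{transcript}"}],
        )
        return resp.choices[0].message.content

    summary = summarise("...consult transcript...")
    # Nothing persists on the application side unless we deliberately write it somewhere;
    # drop the variable and it's gone.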
Heidi is frustratingly consistent at hallucinating stuff. I've seen it in almost all of the dozen or so summaries I've had from medical people recently (surgeon, physio, consultant). A GP I know tried for a month and then was like 'it's not worth the risk exposure to me or my patients'.
It's funny how the assumption is always that LLMs are very useful in an industry other than your own.
That's funny. I would have said the same thing about your field prior to reading your comment.
[0] https://developers.openai.com/api/docs/guides/your-data#whic...
[1] https://developers.openai.com/api/docs/guides/your-data#stor...
[2] https://platform.claude.com/docs/en/build-with-claude/zero-d...
The models are licensed to Microsoft, and you pay them for the inference.
[0] https://github.com/openai/openai-python/issues/2300
https://en.wiktionary.org/wiki/Gell-Mann_Amnesia_effect
We had to correct them at the end of the consultation.
This is just about not using free/public AI tools.