Medical AI gets dangerous when the story becomes too simple.
The simple version says: model sees symptoms, model gives answer, doctor gets replaced, everyone argues on the internet.
The more useful version is quieter.
OpenAI published two health-related updates on June 18. One covered improved health intelligence in ChatGPT through GPT-5.5 Instant and physician-led evaluation. The other described an NEJM AI study where researchers from Boston Children’s Hospital, Harvard, and OpenAI used an OpenAI reasoning model to revisit 376 previously unsolved rare-disease cases.
After expert review, additional testing, and clinical confirmation, physicians established 18 diagnoses. That is the part to pay attention to.
Old cases are not dead cases
Rare disease work has a brutal time problem.
A patient can be sequenced, reviewed, and still remain undiagnosed because the available evidence is incomplete. Then the world changes. New papers appear, gene-disease links improve, variant classifications shift, and a case that was unsolvable five years ago may suddenly contain a path forward.
The hard part is scaling that reanalysis.
No hospital has infinite expert time. No clinician can continuously reread every old case against every new piece of literature. A reasoning model can help by searching the haystack again, connecting clinical details to genomic clues, and proposing evidence-linked leads for experts to test.
That is not autonomous medicine. That is a machine making repeated expert review cheap enough to actually happen.
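To make the reanalysis idea concrete, here is a minimal sketch of the triage step: flag old cases where a gene-disease link was published after the case was last reviewed. This is not the study's pipeline. The `UnsolvedCase` record, the `cases_worth_reopening` function, the `gene_link_year` lookup, and the gene symbols are all hypothetical placeholders, and the real work involves far richer clinical and genomic evidence.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class UnsolvedCase:
    """Hypothetical, de-identified record of a case left without a diagnosis."""
    case_id: str
    candidate_genes: frozenset[str]  # genes with variants nobody could interpret
    last_reviewed_year: int


def cases_worth_reopening(cases, gene_link_year):
    """Return (case_id, genes) pairs where a gene-disease link postdates the last review.

    gene_link_year is an assumed lookup from gene symbol to the year a
    disease link was first published. Genes missing from it are ignored.
    """
    flagged = []
    for case in cases:
        new_genes = sorted(
            gene
            for gene in case.candidate_genes
            if gene_link_year.get(gene, 0) > case.last_reviewed_year
        )
        if new_genes:
            flagged.append((case.case_id, new_genes))
    return flagged


# Made-up data: placeholder gene symbols, not real findings.
cases = [
    UnsolvedCase("case-001", frozenset({"GENE_A", "GENE_B"}), 2019),
    UnsolvedCase("case-002", frozenset({"GENE_C"}), 2023),
]
gene_link_year = {"GENE_A": 2022, "GENE_C": 2021}

print(cases_worth_reopening(cases, gene_link_year))
# [('case-001', ['GENE_A'])]
```

The output is a list of leads for a human to look at, nothing more. A flagged case is a reason to reopen the file, not a diagnosis.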
The workflow matters more than the headline
The number, 18 diagnoses out of 376 cases, or just under 5 percent, is important. But the workflow is more important than the number.
These were de-identified cases that had already been through specialist review. The model surfaced candidate explanations. Humans reviewed the evidence. Additional testing and clinical confirmation still mattered.
That is the sane shape.
Medical AI should not be judged only by whether it can sound confident in a chat window. It should be judged by whether it can improve a real clinical workflow without breaking accountability.
Can it show the evidence? Can it explain why a lead is biologically coherent? Can a specialist reject it quickly? Can the system preserve provenance? Can it make periodic reanalysis cheaper without turning patients into beta testers?
Those are the adult questions.
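One way to picture those questions is as constraints on the record a model-generated lead lives in. The sketch below is an illustration under stated assumptions, not a description of any system in the study: the `Lead` class, its fields, and the `Status` values are hypothetical. The point is that evidence and provenance are mandatory, rejection is a single call, and only a named human reviewer can change a lead's status.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"    # surfaced by the model, not yet reviewed
    REJECTED = "rejected"    # dismissed by a specialist
    CONFIRMED = "confirmed"  # confirmed by a specialist after further testing


@dataclass
class Lead:
    """Hypothetical record for one model-proposed explanation of a case."""
    case_id: str
    hypothesis: str
    evidence: list[str]   # citations or data pointers backing the lead
    model_version: str    # provenance: what produced the lead
    status: Status = Status.PROPOSED
    history: list[str] = field(default_factory=list)

    def __post_init__(self):
        # A lead that cannot show its evidence never enters the queue.
        if not self.evidence:
            raise ValueError("a lead without evidence cannot be reviewed")

    def review(self, reviewer: str, decision: Status, note: str) -> None:
        """Record a human decision; the model has no path to CONFIRMED."""
        if not reviewer:
            raise ValueError("every decision needs an accountable reviewer")
        if self.status is not Status.PROPOSED:
            raise ValueError("lead already reviewed")
        if decision is Status.PROPOSED:
            raise ValueError("decision must be REJECTED or CONFIRMED")
        self.status = decision
        self.history.append(f"{reviewer}: {decision.value}: {note}")


# Placeholder values throughout; nothing here is real clinical data.
lead = Lead("case-001", "placeholder hypothesis", ["placeholder citation"], "model-x")
lead.review("dr_example", Status.REJECTED, "phenotype does not fit")
print(lead.status.value, lead.history)
# rejected ['dr_example: rejected: phenotype does not fit']
```

Nothing about this is sophisticated, which is the point. Accountability is mostly a matter of refusing to store a conclusion without a name and a reason attached.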
Better health answers need better boundaries
OpenAI’s ChatGPT health update sits next to this study for a reason.
People already ask general-purpose models medical questions. Better responses matter. Physician-led evaluation matters. But consumer health advice and clinical diagnosis are different worlds, and blurring them is where trust goes to die.
The rare-disease study is more interesting because it keeps the human system intact.
AI helps search, reason, summarize, and propose. Experts diagnose.
That division is not conservative. It is how the technology earns its way into serious medicine without pretending the hard parts disappeared.
Sources: OpenAI rare-disease study summary, OpenAI health intelligence update