Harvard study finds AI delivered more accurate ER diagnoses than doctors

A Harvard study found AI systems provided more accurate emergency room diagnoses than two human doctors, highlighting AI’s growing role in healthcare.

Shivangi Yadav

May 7, 2026 - 21:10

A newly released study explores how large language models perform across different medical situations, including real emergency room cases, where at least one model appeared to surpass human doctors in diagnostic accuracy.

The findings were published this week in Science and come from a research group led by physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. According to the researchers, multiple experiments were conducted to compare OpenAI’s models with human physicians.

One part of the study examined 76 patients who arrived at the Beth Israel emergency room. Researchers compared diagnoses made by two internal medicine attending physicians with those generated by OpenAI’s o1 and 4o models. The diagnoses were then evaluated by two additional attending physicians who did not know whether each one came from a human or an AI system.

“At each diagnostic touchpoint, o1 either performed nominally better than or on par with the two attending physicians and 4o,” the study noted, adding that the gap was “especially pronounced at the first diagnostic touchpoint (initial ER triage), where there is the least information available about the patient and the most urgency to make the correct decision.”

In an official release from Harvard Medical School, researchers emphasised that no preprocessing was applied to the data, meaning the AI systems received the same electronic medical record information available at the time each diagnosis was made.

Using that data, the o1 model delivered “the exact or very close diagnosis” in 67% of triage cases. In comparison, one physician achieved that level of accuracy 55% of the time, while the second reached 50%.

“We tested the AI model against virtually every benchmark, and it eclipsed both prior models and our physician baselines,” said Arjun Manrai, who leads an AI lab at Harvard Medical School and is among the study’s principal authors.

Despite the results, the researchers stressed that the study does not suggest AI is ready to handle life-or-death decisions in emergency settings independently. Instead, they said the findings highlight an “urgent need for prospective trials to evaluate these technologies in real-world patient care settings.”

They also pointed out that the study focused exclusively on text-based inputs, noting that “existing studies suggest that current foundation models are more limited in reasoning over nontext inputs.”

Adam Rodman, a doctor at Beth Israel and co-author of the study, told The Guardian that there is “no formal framework right now for accountability” for AI-driven diagnoses. He added that patients still expect human guidance for critical medical decisions.

Meanwhile, emergency physician Kristen Panthagani described the findings as “an interesting AI study that has led to some very overhyped headlines.” She noted that the comparison involved internal medicine physicians rather than emergency room specialists.

“If we’re going to compare AI tools to physicians’ clinical ability, we should start by comparing them to physicians who actually practice that speciality,” Panthagani said. “I would not be surprised if an LLM could beat a dermatologist at a neurosurgery board exam, [but] that’s not a particularly helpful thing to know.”

She further explained that in emergency care, identifying life-threatening conditions takes priority over pinpointing a final diagnosis. “As an ER doctor seeing a patient for a first time, my primary goal is not to guess your ultimate diagnosis. My primary goal is to determine if you have a condition that could kill you,” she said.

Shivangi Yadav reports on startups, technology policy, and other significant technology-focused developments in India for TechAmerica.Ai. She previously worked as a research intern at ORF.