A new study led by researchers at Harvard Medical School has found that artificial intelligence models can outperform human doctors in certain emergency room diagnostic tasks. Published in the journal Science, the research showed that OpenAI’s “o1” reasoning model achieved a 67% accuracy rate in identifying exact or near-correct diagnoses during initial emergency room triage, compared to 50–55% accuracy among physicians reviewing the same patient records.
The study focused on 76 patients admitted to the emergency department at Beth Israel Deaconess Medical Center in Boston. Researchers provided both AI systems and doctors with identical electronic medical records containing limited information such as vital signs, demographics, and nurse notes. Independent physicians later reviewed all diagnoses without knowing whether they came from humans or AI. As additional patient information became available, the AI model’s diagnostic accuracy rose to 82%, slightly higher than that of experienced doctors.
The findings suggest that advanced large language models may soon become valuable “second-opinion” tools in healthcare, particularly in high-pressure environments where rapid decision-making is critical. In another part of the study, AI systems scored 89% in creating treatment and management plans for complex medical cases, while doctors using traditional resources such as search engines scored 34%. Researchers believe this demonstrates the growing strength of AI in synthesizing large volumes of medical data and supporting clinical reasoning.
Despite the impressive results, experts cautioned that AI is not ready to replace physicians. The systems were tested only on text-based medical records and could not evaluate visual symptoms, patient distress, or emotional communication. Concerns also remain around hallucinations, bias, accountability, and the risk of doctors relying too heavily on AI-generated answers. Researchers emphasized that the future of healthcare will likely involve a collaborative model in which doctors, patients, and AI systems work together, rather than one in which AI independently replaces human medical judgment.