AI Exhibits Human-Like Cognitive Errors in Medical Reasoning

A recent study reported by PsyPost reveals that advanced artificial intelligence (AI) models used in medical decision-making contexts can replicate the same kinds of cognitive biases that human clinicians display. The research found that large language models tasked with clinical reasoning were susceptible to framing effects, primacy effects, hindsight bias, and other systematic reasoning flaws, in some cases more so than human doctors.

The authors, Jonathan Wang and Donald A. Redelmeier, created 10 clinical vignettes, each in two versions: one phrased neutrally and one designed to trigger a known bias. They then prompted two leading models, GPT-4 and Gemini-1.0-Pro, with “synthetic clinician” personas (500 in total) and analyzed their open-ended responses. For example, when a lung-cancer surgery was framed in terms of survival versus mortality statistics, GPT-4’s recommendation rate shifted dramatically: 75% in the survival frame versus just 12% in the mortality frame, a 63-percentage-point difference, significantly larger than the roughly 34-point gap observed in human studies.
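To make the setup concrete, below is a minimal Python sketch of how such a framing comparison could be tallied. The query_model stub, the persona template, and the keyword scoring are illustrative assumptions rather than the study's actual prompts or scoring procedure; the stub simply simulates replies at the recommendation rates reported for GPT-4 so the script runs end to end.

```python
import random

random.seed(0)

# Stand-in for an LLM call: it simulates replies at the recommendation
# rates reported for GPT-4 (about 75% under the survival frame, 12%
# under the mortality frame) so the script runs end to end. A real
# audit would replace this with an actual API call to the model under test.
def query_model(prompt: str) -> str:
    rate = 0.75 if "survive" in prompt else 0.12
    return "I recommend surgery." if random.random() < rate else "I do not recommend surgery."

SURVIVAL_FRAME = ("Of 100 patients who have this lung-cancer surgery, "
                  "90 survive the perioperative period. Do you recommend the surgery?")
MORTALITY_FRAME = ("Of 100 patients who have this lung-cancer surgery, "
                   "10 die in the perioperative period. Do you recommend the surgery?")

def recommendation_rate(frame: str, n_personas: int = 500) -> float:
    """Fraction of synthetic-clinician responses that recommend the surgery."""
    recommended = 0
    for i in range(n_personas):
        persona = f"You are clinician #{i}, an experienced physician."
        reply = query_model(persona + "\n\n" + frame)
        # Crude keyword scoring; the study analyzed open-ended text in more depth.
        if "recommend" in reply.lower() and "not recommend" not in reply.lower():
            recommended += 1
    return recommended / n_personas

p_survival = recommendation_rate(SURVIVAL_FRAME)
p_mortality = recommendation_rate(MORTALITY_FRAME)
print(f"framing gap: {(p_survival - p_mortality) * 100:.1f} percentage points")
```

Run as is, the simulation reproduces the arithmetic of the headline finding: a gap of roughly 63 percentage points between the two frames.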

One notable exception emerged in the case of “base-rate neglect” (ignoring the overall prevalence of a condition). Here, GPT-4 performed near-perfectly (94% vs. 93% in high- and low-prevalence scenarios) on a task humans often struggle with. That said, the overall message is cautionary: deploying AI in high-stakes medical contexts does not guarantee more rational decisions simply because the model is fast or well trained. The biases may be baked into the training data and the statistical patterns the model has learned. The authors argue that clinicians must remain vigilant and treat AI advice as an input to human judgement, not a substitute for it.

These findings carry strong implications for healthcare-AI deployment, particularly in diagnostics, treatment planning and decision support systems. They suggest that verifying AI models solely on accuracy metrics (e.g., correct diagnosis rate) misses the deeper question of how reasoning unfolds. If models mirror human cognitive flaws, then without oversight they may replicate or even amplify errors at scale. The study also underscores that different models exhibit different bias profiles (e.g., Gemini showed different patterns than GPT-4), which means model-specific auditing is needed.
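One way such model-specific auditing could be operationalized is to compare a model's recommendation rates across the two frames of each vignette and check whether the gap exceeds sampling noise. The sketch below is an illustrative assumption, not the study's protocol; it applies a standard two-proportion z-test to counts consistent with the GPT-4 result quoted above.

```python
from math import sqrt

def framing_gap_z(rec_a: int, n_a: int, rec_b: int, n_b: int):
    """Two-proportion z statistic for the difference in recommendation
    rates between two framings of the same clinical vignette."""
    p_a, p_b = rec_a / n_a, rec_b / n_b
    pooled = (rec_a + rec_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) * 100, (p_a - p_b) / se

# Counts in the spirit of the reported GPT-4 result: roughly 75% of 500
# synthetic clinicians recommended surgery under the survival frame and
# roughly 12% under the mortality frame.
gap, z = framing_gap_z(rec_a=375, n_a=500, rec_b=60, n_b=500)
print(f"gap = {gap:.1f} percentage points, z = {z:.1f}")
# A |z| around 20 is far beyond any conventional significance threshold,
# so a gap of this size cannot plausibly be explained by sampling noise.
```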

About the author

TOOLHUNT

Effortlessly find the right tools for the job.
