AI Exhibits Human-Like Cognitive Errors in Medical Reasoning

A recent study reported by PsyPost reveals that advanced artificial intelligence (AI) models used in medical decision-making contexts can replicate the same kinds of cognitive biases that human clinicians display. The research found that large language models tasked with clinical reasoning were susceptible to framing effects, primacy effects, hindsight bias, and other systematic reasoning flaws, in some cases more so than human doctors.

The authors, Jonathan Wang and Donald A. Redelmeier, created 10 clinical vignettes, each in two versions: one phrased neutrally and one designed to trigger a known bias. They then prompted two leading models, GPT-4 and Gemini-1.0-Pro, with “synthetic clinician” personas (500 in total) and analyzed their open-ended responses. For example, when a lung-cancer surgery was framed in terms of survival versus mortality statistics, GPT-4’s recommendation rate shifted dramatically: 75% in the survival frame versus just 12% in the mortality frame, a 63-percentage-point difference, significantly larger than the roughly 34-point gap observed in human studies.
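To make the setup concrete, below is a minimal Python sketch of how such a framing comparison could be tallied. The query_model stub, the persona template, and the keyword scoring are illustrative assumptions rather than the study's actual prompts or scoring procedure; the stub simply simulates replies at the recommendation rates reported for GPT-4 so the script runs end to end.

```python
import random

random.seed(0)

# Stand-in for an LLM call: it simulates replies at the recommendation
# rates reported for GPT-4 (about 75% under the survival frame, 12%
# under the mortality frame) so the script runs end to end. A real
# audit would replace this with an actual API call to the model under test.
def query_model(prompt: str) -> str:
    rate = 0.75 if "survive" in prompt else 0.12
    return "I recommend surgery." if random.random() < rate else "I do not recommend surgery."

SURVIVAL_FRAME = ("Of 100 patients who have this lung-cancer surgery, "
                  "90 survive the perioperative period. Do you recommend the surgery?")
MORTALITY_FRAME = ("Of 100 patients who have this lung-cancer surgery, "
                   "10 die in the perioperative period. Do you recommend the surgery?")

def recommendation_rate(frame: str, n_personas: int = 500) -> float:
    """Fraction of synthetic-clinician responses that recommend the surgery."""
    recommended = 0
    for i in range(n_personas):
        persona = f"You are clinician #{i}, an experienced physician."
        reply = query_model(persona + "\n\n" + frame)
        # Crude keyword scoring; the study analyzed open-ended text in more depth.
        if "recommend" in reply.lower() and "not recommend" not in reply.lower():
            recommended += 1
    return recommended / n_personas

p_survival = recommendation_rate(SURVIVAL_FRAME)
p_mortality = recommendation_rate(MORTALITY_FRAME)
print(f"framing gap: {(p_survival - p_mortality) * 100:.1f} percentage points")
```

Run as is, the simulation reproduces the arithmetic of the headline finding: a gap of roughly 63 percentage points between the two frames.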

One notable exception emerged in the case of “base-rate neglect” (ignoring the overall prevalence of a condition). Here, GPT-4 performed near-perfectly (94% vs. 93% in high- and low-prevalence scenarios) on a task humans often struggle with. That said, the overall message is cautionary: deploying AI in high-stakes medical contexts does not guarantee more rational decisions simply because the model is fast or well trained. The biases may be baked into the training data and the statistical patterns the model has learned. The authors argue that clinicians must remain vigilant and treat AI advice as an input to human judgement, not a substitute for it.

These findings carry strong implications for healthcare-AI deployment, particularly in diagnostics, treatment planning and decision support systems. They suggest that verifying AI models solely on accuracy metrics (e.g., correct diagnosis rate) misses the deeper question of how reasoning unfolds. If models mirror human cognitive flaws, then without oversight they may replicate or even amplify errors at scale. The study also underscores that different models exhibit different bias profiles (e.g., Gemini showed different patterns than GPT-4), which means model-specific auditing is needed.
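One way such model-specific auditing could be operationalized is to compare a model's recommendation rates across the two frames of each vignette and check whether the gap exceeds sampling noise. The sketch below is an illustrative assumption, not the study's protocol; it applies a standard two-proportion z-test to counts consistent with the GPT-4 result quoted above.

```python
from math import sqrt

def framing_gap_z(rec_a: int, n_a: int, rec_b: int, n_b: int):
    """Two-proportion z statistic for the difference in recommendation
    rates between two framings of the same clinical vignette."""
    p_a, p_b = rec_a / n_a, rec_b / n_b
    pooled = (rec_a + rec_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) * 100, (p_a - p_b) / se

# Counts in the spirit of the reported GPT-4 result: roughly 75% of 500
# synthetic clinicians recommended surgery under the survival frame and
# roughly 12% under the mortality frame.
gap, z = framing_gap_z(rec_a=375, n_a=500, rec_b=60, n_b=500)
print(f"gap = {gap:.1f} percentage points, z = {z:.1f}")
# A |z| around 20 is far beyond any conventional significance threshold,
# so a gap of this size cannot plausibly be explained by sampling noise.
```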

About the author

TOOLHUNT

Effortlessly find the right tools for the job.
