Researchers from Saarland University and the Max Planck Institute for Software Systems (MPI-SWS) report that humans and large language models (LLMs) show remarkably similar patterns of confusion when confronted with tricky or misleading computer code. Using brain-activity measurements (EEG) and eye-tracking data from human developers alongside AI “uncertainty” scores (perplexity), the study found that the code segments that cause spikes in human cognitive load also trigger high uncertainty in AI models.
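Perplexity, the “uncertainty” score mentioned above, has a simple definition: it is the exponential of the average negative log-probability a language model assigns to each token. The article does not include the study’s computation, so the following is a minimal sketch using hypothetical per-token log-probabilities; higher perplexity means the model found the sequence more surprising.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-probability per token.

    Higher values mean the model found the sequence more surprising,
    which the study relates to human cognitive load on the same code.
    """
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Hypothetical log-probabilities a model might assign to each token:
clean_snippet = [-0.2, -0.1, -0.3, -0.2]    # model is confident
tricky_snippet = [-2.5, -3.1, -1.8, -2.9]   # model is uncertain
```

On these toy numbers, `perplexity(tricky_snippet)` is far higher than `perplexity(clean_snippet)`, mirroring the human confusion signal the study reports.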
In the experiment, human developers read both “clean” and deliberately “confusing” code snippets, the latter built around subtle syntactic quirks known as “atoms of confusion,” while LLMs processed the same code. Where humans exhibited brain-wave signals associated with surprise or difficulty (notably a rise in “late frontal positivity”), the AI models showed elevated uncertainty. The alignment was statistically significant, indicating that both human and machine “brains” get tripped up by the same code pitfalls.
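To make “atoms of confusion” concrete: they are tiny, legal code patterns whose behavior is easy to misread. The article does not list the study’s stimuli, so here is one illustrative example of a classic atom, an operator-precedence trap, alongside its disambiguated equivalent:

```python
# An "atom of confusion": an operator-precedence trap. Both functions
# compute the same value, but the first hides the evaluation order.

def confusing(a, b, c):
    # % binds tighter than +, so this is a + (b % c), not (a + b) % c.
    return a + b % c

def clean(a, b, c):
    # Parentheses make the intended grouping explicit.
    return a + (b % c)
```

A reader skimming `confusing` may parse it as `(a + b) % c`; the EEG and perplexity spikes in the study were triggered by exactly this kind of misleading-but-correct construct.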
Leveraging this insight, the researchers then developed a data-driven method to automatically detect potentially confusing or error-prone parts of code. Their algorithm flagged more than 60% of known “confusing patterns” and identified over 150 previously unrecognized code patterns that provoked similar confusion signals in both brains and models. This opens the door to tools that warn developers (or AI assistants) about risky code segments in advance, potentially preventing bugs before they happen.
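The article does not describe the detection algorithm itself, but the underlying idea, using model uncertainty to localize risky code, can be sketched with a simple heuristic: flag tokens whose surprisal (negative log-probability) is an outlier within the snippet. The threshold, token list, and log-probabilities below are all hypothetical, not the paper’s method.

```python
import statistics

def flag_confusing_tokens(tokens, logprobs, z=1.5):
    """Flag tokens whose surprisal (-logprob) exceeds the snippet's
    mean surprisal by more than `z` standard deviations.

    A toy stand-in for uncertainty-based confusion detection; the
    z-score cutoff is an illustrative choice, not the study's.
    """
    surprisal = [-lp for lp in logprobs]
    mu = statistics.mean(surprisal)
    sd = statistics.stdev(surprisal)
    return [tok for tok, s in zip(tokens, surprisal) if s > mu + z * sd]

# Hypothetical example: the "+++" token is where a model balks.
tokens = ["x", "=", "a", "+++", "b"]
logprobs = [-0.3, -0.2, -0.4, -4.0, -0.3]
```

Running `flag_confusing_tokens(tokens, logprobs)` on this example singles out the `+++` token, the kind of localized warning such a tool could surface to a developer before a bug ships.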
This finding is significant for the future of AI-assisted software development. It suggests that while AI assistants may help us code faster, they are subject to the same “cognitive blinders” as humans when it comes to tricky code. Effective human–AI collaboration may therefore require new layers of code review, awareness of “confusion zones,” and smarter tooling, rather than blind trust in AI output.