This study investigates whether large language models (LLMs) such as GPT‑4o, along with retrieval‑enhanced AI systems, can safely and effectively provide personalized lifestyle advice for patients with atrial fibrillation (AF), a common heart rhythm disorder. Because doctors often have limited time during appointments, patients may not receive detailed guidance on lifestyle behaviors that influence outcomes, such as exercise, diet, and other habits. The researchers compared AI‑generated responses with those of experienced electrophysiologists across dimensions including clinical appropriateness, accuracy, empathy, and helpfulness.
The research involved 66 real questions from 16 AF patients, covering areas such as physical activity, nutrition, and general lifestyle. Three LLM configurations were tested: GPT‑4o on its own, a variant grounded in a curated Q&A database, and a variant that retrieves evidence from the medical literature. Each AI response was independently evaluated by five experienced electrophysiologists against a structured set of criteria that included alignment with medical consensus, accuracy, relevance, and empathy.
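To make the three configurations concrete, here is a minimal Python sketch of how such a setup might be wired together. Everything in it (the `llm_complete` placeholder, the toy keyword-overlap retriever, the prompt wording) is a hypothetical illustration under assumed interfaces, not the study's actual implementation.

```python
# Minimal sketch of the three answer-generation setups, assuming a generic
# LLM completion call. All names and prompts here are hypothetical
# stand-ins, not the study's actual pipeline.

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to an LLM API such as GPT-4o."""
    raise NotImplementedError("connect this to your LLM provider")

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Toy keyword-overlap retrieval; a real system would use embeddings."""
    terms = set(query.lower().split())
    return sorted(corpus,
                  key=lambda doc: len(terms & set(doc.lower().split())),
                  reverse=True)[:k]

def answer_plain(question: str) -> str:
    # Variant 1: the base model answers directly.
    return llm_complete(f"Patient question about atrial fibrillation:\n{question}")

def answer_with_qa_db(question: str, qa_pairs: list[str]) -> str:
    # Variant 2: ground the answer in a curated Q&A database.
    context = "\n".join(retrieve(question, qa_pairs))
    return llm_complete(f"Curated Q&A context:\n{context}\n\nQuestion: {question}")

def answer_with_literature(question: str, chunks: list[str]) -> str:
    # Variant 3: ground the answer in retrieved medical literature.
    context = "\n".join(retrieve(question, chunks))
    return llm_complete(f"Evidence excerpts:\n{context}\n\nQuestion: {question}")
```

The design point is that all three variants share one completion call and differ only in what context, if any, is placed in front of the patient's question.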
Results showed that GPT‑4o performed at a level comparable to human clinicians on scientific consensus and had a lower error rate than the other AI variants. GPT‑4o also scored higher on specialized content, empathy, and perceived helpfulness, suggesting that some modern LLMs can generate medically nuanced, patient‑friendly advice. The Q&A‑database and literature‑retrieval variants matched physicians on error rates and general helpfulness, with each showing distinct strengths depending on how it accessed knowledge.
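As a rough illustration of how such multi-rater comparisons can be tallied, the sketch below averages per-criterion scores across reviewers for each model. The scores shown are dummy values on an assumed 1-5 scale; the study's actual rating instrument and statistical analysis are not reproduced here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical aggregation of reviewer ratings into per-model means.
# The tuples below are dummy values for illustration only.
ratings = [
    # (model, criterion, one reviewer's score on an assumed 1-5 scale)
    ("gpt-4o", "consensus", 5), ("gpt-4o", "consensus", 4),
    ("qa-database", "consensus", 4), ("literature", "consensus", 4),
    ("gpt-4o", "empathy", 5), ("qa-database", "empathy", 3),
]

def summarize(rows):
    """Group scores by (model, criterion) and average across reviewers."""
    buckets = defaultdict(list)
    for model, criterion, score in rows:
        buckets[(model, criterion)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}

for (model, criterion), avg in sorted(summarize(ratings).items()):
    print(f"{model:12s} {criterion:10s} mean={avg:.2f}")
```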
The authors concluded that integrating the complementary strengths of different LLM approaches could pave the way for safer, more reliable medical AI systems that assist clinicians and patients alike. However, they emphasized that AI should not replace human caregivers but rather augment clinical communication and health education, especially where time and resources are limited. Responsible deployment would require continued evaluation, transparency, and safeguards tailored to real‑world medical use.