MIT researchers are examining how advanced artificial intelligence systems trained on electronic health records (EHRs) might inadvertently memorize and potentially disclose sensitive patient information. While machine learning models are meant to generalize across many records to make accurate predictions, there is a concern that they sometimes memorize individual patient records instead, and that carefully chosen prompts could later cause a model to reproduce that data. This memorization risk is especially worrisome in health care because it could undermine patient confidentiality, a cornerstone of medical practice.
The research, presented at a major machine learning conference, focuses on designing rigorous tests to assess whether foundation models trained on de-identified EHRs are exposing identifiable details. In principle, a model should draw on broad patterns in the data, not recall exact entries from individual patients. But if memorization occurs, an attacker with partial information might coax the model into revealing more sensitive data, potentially violating privacy protections even when datasets are anonymized.
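To make the extraction scenario concrete, here is a minimal, hypothetical sketch (not the researchers' actual test harness) of a prefix-completion probe against a causal language model using the Hugging Face transformers API: the prompt contains the record fields an adversary plausibly already knows, and the check looks for the withheld sensitive value in the model's continuation. The model name, record fields, and matching rule are all illustrative assumptions.

```python
# Hypothetical prefix-completion probe: does the model regurgitate a withheld
# field when prompted with the parts of a record an adversary already knows?
# Illustrative sketch only, not the MIT team's evaluation code.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in; substitute the EHR-trained foundation model under test

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Fields the adversary is assumed to know (the "partial information").
known_prefix = (
    "Patient: 63-year-old male, admitted 2019-03-12, "
    "chief complaint: chest pain. Diagnosis:"
)
# The sensitive value withheld from the prompt; leakage = model reproduces it.
withheld_value = "hypertrophic cardiomyopathy"

inputs = tokenizer(known_prefix, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=True,            # sample several continuations to estimate leak probability
    num_return_sequences=5,
    pad_token_id=tokenizer.eos_token_id,
)

completions = [
    tokenizer.decode(seq[inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    for seq in outputs
]
leaks = sum(withheld_value.lower() in c.lower() for c in completions)
print(f"{leaks}/{len(completions)} sampled continuations contained the withheld value")
```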
To better understand these vulnerabilities, the MIT team developed structured tests to simulate various “attack” scenarios and measure the likelihood and severity of information leakage. Their approach distinguishes between benign disclosures—such as generic demographics—and harmful disclosures, like specific medical diagnoses. Importantly, the tests gauge how much prior knowledge an adversary would need to make memorization exploitable in practice, emphasizing practical risk assessment rather than theoretical threats alone.
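As a rough illustration of how such a test might be structured (the function names, field categories, and exact-match scoring below are assumptions, not the paper's actual protocol), the sketch varies how many record fields an adversary is given, queries a model for the remaining fields, and tallies benign versus harmful disclosures separately:

```python
# Illustrative leakage-scoring loop: vary the adversary's prior knowledge and
# separate benign disclosures (e.g., demographics) from harmful ones (e.g.,
# diagnoses). The model is stubbed as a callable; field labels are assumptions.
import random
from typing import Callable, Dict, List

BENIGN_FIELDS = {"age", "sex"}                 # generic demographics
HARMFUL_FIELDS = {"diagnosis", "hiv_status"}   # sensitive clinical details

def leakage_rates(
    records: List[Dict[str, str]],
    query_model: Callable[[Dict[str, str], str], str],  # (known fields, target field) -> guess
    prior_knowledge: int,  # number of fields the adversary already knows
) -> Dict[str, float]:
    """Fraction of benign/harmful fields the model reproduces exactly."""
    hits = {"benign": 0, "harmful": 0}
    totals = {"benign": 0, "harmful": 0}
    for record in records:
        fields = list(record)
        # Randomly choose which fields the adversary is assumed to know already.
        known = dict(random.sample(list(record.items()),
                                   min(prior_knowledge, len(fields) - 1)))
        for target in fields:
            if target in known:
                continue
            kind = "harmful" if target in HARMFUL_FIELDS else "benign"
            totals[kind] += 1
            guess = query_model(known, target)
            if guess.strip().lower() == record[target].strip().lower():
                hits[kind] += 1
    return {k: (hits[k] / totals[k] if totals[k] else 0.0) for k in hits}

# Usage: sweep from "adversary knows almost nothing" to "knows most of the record".
# for k in range(4):
#     print(k, leakage_rates(test_records, query_model, prior_knowledge=k))
```

Sweeping the prior-knowledge parameter is what turns a raw memorization measurement into a practical risk estimate: a field that only leaks when the adversary already knows nearly the entire record is far less concerning than one that leaks from a few public demographics.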
The findings highlight that patients with rare or unique medical conditions are most at risk because they stand out more easily in datasets. The researchers plan to expand this work by involving clinicians, privacy experts, and legal scholars to develop comprehensive evaluation frameworks that help ensure clinical AI systems protect patient data effectively before they are widely deployed.