Artificial intelligence models are increasingly used to analyze and generate scientific content, but a troubling pattern has emerged: these models draw on retracted papers when providing answers. Researchers tested several AI tools, including Elicit, Ai2 ScholarQA, Perplexity, and Consensus, using questions based on 21 retracted papers. The tools referenced retracted papers without noting the retractions, raising significant concerns about the reliability and accuracy of AI-generated content.
The study found that Elicit referenced 5 of the retracted papers, Ai2 ScholarQA referenced 17, Perplexity referenced 11, and Consensus referenced 18, all without noting the retractions. This failure to flag retractions is particularly concerning in fields like medicine, where retracted findings can have serious implications for public health. Some companies, like Consensus, are working to address the issue by incorporating retraction data from sources such as Retraction Watch, a database that manually curates and maintains records of retracted papers.
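To illustrate what that kind of check might look like in practice, the sketch below screens a list of cited DOIs against a local export of a retraction database before the citations are surfaced to a user. The file name (`retractions.csv`), the column name (`OriginalPaperDOI`), and the sample DOIs are assumptions for illustration only and do not reflect any particular product's pipeline.

```python
import csv


def load_retracted_dois(path="retractions.csv"):
    """Load retracted-paper DOIs from a local retraction-database export.

    Assumes a CSV with an 'OriginalPaperDOI' column; the actual schema
    of any given retraction dataset may differ.
    """
    retracted = set()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            doi = (row.get("OriginalPaperDOI") or "").strip().lower()
            if doi:
                retracted.add(doi)
    return retracted


def flag_retractions(cited_dois, retracted_dois):
    """Pair each cited DOI with a flag indicating whether it is retracted."""
    return [(doi, doi.strip().lower() in retracted_dois) for doi in cited_dois]


if __name__ == "__main__":
    retracted = load_retracted_dois()
    citations = ["10.1000/example.123", "10.1000/example.456"]  # placeholder DOIs
    for doi, is_retracted in flag_retractions(citations, retracted):
        note = "RETRACTED - flag before citing" if is_retracted else "ok"
        print(f"{doi}: {note}")
```

A filter like this only catches what the local export already lists, which is why keeping such a database current matters as much as the lookup itself.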
However, relying on retraction databases may not be enough. Building and maintaining a comprehensive database of retractions requires significant resources and manual effort, and publishers don't follow a uniform approach to retraction notices, which makes retractions difficult for AI models to detect. Experts therefore advocate making more context available to models when they generate a response, for example by publishing information that already exists, such as peer reviews and critiques.
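One way to sidestep inconsistent publisher notices is to ask a central registry rather than scraping each publisher's page. The sketch below queries the Crossref REST API for records that declare themselves updates to a given DOI and checks whether any update is a retraction. The `updates:` filter and the `update-to` field are assumptions based on Crossref's public documentation and should be verified before being relied on; the DOI shown is a placeholder.

```python
import requests


def retraction_notices(doi):
    """Look up Crossref records that update the given DOI and keep retractions.

    Assumes the 'updates:<doi>' filter and the 'update-to' field described in
    Crossref's public docs; confirm against the current API before relying on this.
    """
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"filter": f"updates:{doi}"},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json().get("message", {}).get("items", [])

    notices = []
    for item in items:
        for update in item.get("update-to", []):
            if update.get("type", "").lower() == "retraction":
                notices.append(item.get("DOI"))
    return notices


if __name__ == "__main__":
    doi = "10.1000/example.123"  # placeholder DOI
    hits = retraction_notices(doi)
    print(f"Retraction notices found for {doi}: {hits or 'none'}")
```

Even with such a lookup, coverage depends on publishers registering their retraction notices in the first place, which is part of why experts also want richer context, such as reviews and critiques, exposed to models directly.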
The use of retracted scientific papers by AI models highlights the need for more robust and reliable approaches to scientific content generation and analysis. By prioritizing accuracy, transparency, and accountability, we can work towards developing more trustworthy AI systems.