OpenAI's latest research paper sheds light on why ChatGPT and other large language models struggle with "hallucinations": confidently providing false information. The issue stems from how these models are evaluated and ranked, not just from how they are designed. Current benchmarks reward models for answering every question, even when they're wrong, and penalize them for expressing uncertainty.
This pushes models to guess rather than say "I don't know," which produces hallucinations. Even with perfect training data, the problem persists because some rate of error is mathematically unavoidable in how language models generate text. OpenAI suggests having models weigh their confidence in an answer before providing it, and scoring them according to that confidence.
For example, a model could be prompted to "Answer only if you're more than 75% confident, since mistakes are penalized 3 points while correct answers receive 1 point." However, this approach would require significantly more computation, making it costly for consumer applications where instant responses are expected.
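To see where the 75% figure comes from, note that under that rubric guessing only pays off when the chance of being right outweighs the 3-point penalty for being wrong. Here is a minimal sketch of the arithmetic; the scoring values come from the example above, while the function names are illustrative and not from OpenAI's paper:

```python
def expected_score(confidence: float, reward: float = 1.0, penalty: float = 3.0) -> float:
    """Expected score for answering: gain `reward` with probability `confidence`,
    lose `penalty` otherwise. Abstaining ("I don't know") scores 0."""
    return confidence * reward - (1.0 - confidence) * penalty

def should_answer(confidence: float) -> bool:
    """Answer only when guessing beats abstaining in expectation."""
    return expected_score(confidence) > 0.0

# Break-even point: c*1 - (1-c)*3 = 0  =>  c = 3/4, i.e. the 75% threshold above.
for c in (0.60, 0.75, 0.90):
    print(f"confidence={c:.2f}  expected score={expected_score(c):+.2f}  answer? {should_answer(c)}")
```

At exactly 75% confidence the expected scores of guessing and abstaining are equal, which is why the prompt asks the model to answer only when it is more than 75% confident.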
If ChatGPT started saying "I don't know" to even 30% of queries, users accustomed to receiving confident answers might quickly abandon it. The business incentives driving consumer AI development are misaligned with reducing hallucinations, which makes the problem persistent.
OpenAI's research highlights the need for new evaluation metrics that reward genuine knowledge and penalize confident ignorance. By adopting such scoring mechanisms, developers could build more trustworthy models that value honest expressions of uncertainty over confident guessing.
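As a rough illustration of what such a metric could look like (a hypothetical scoring scheme sketched here for clarity, not the one proposed in the paper), the idea is to give full credit for correct answers, no penalty for honest abstention, and a steep penalty for confident wrong answers:

```python
from typing import Optional

def score_response(is_correct: Optional[bool],
                   abstain_credit: float = 0.0,
                   wrong_penalty: float = 2.0) -> float:
    """Hypothetical confidence-aware grader (illustrative values).

    is_correct: True for a correct answer, False for a wrong answer,
                None when the model abstains ("I don't know").
    """
    if is_correct is None:
        return abstain_credit                     # abstention is not punished
    return 1.0 if is_correct else -wrong_penalty  # confident errors cost more than silence

# Under a guess-everything benchmark, a wrong guess and an abstention both score 0,
# so guessing is never worse; under this scheme, guessing at low confidence is worse.
print(score_response(True), score_response(False), score_response(None))
```

The design choice is the same one the 75%-confidence prompt encodes: make the expected value of a low-confidence guess lower than the expected value of admitting uncertainty.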