Elon Musk agrees with other AI experts that the supply of real-world data for training AI models has essentially been exhausted. He believes synthetic data is the way forward: data generated by AI models themselves to supplement the limited supply of real-world data.
Musk's comments echo those of Ilya Sutskever, OpenAI's former chief scientist, who discussed the idea of "peak data" at the NeurIPS machine learning conference. The term describes the point at which the available real-world data is no longer enough to train more advanced AI models.
Several tech companies, including Microsoft, Meta, OpenAI, and Anthropic, already use synthetic data to train their AI models. Gartner had predicted that by 2024, 60% of the data used in AI and analytics projects would be synthetically generated.
Synthetic data offers benefits such as lower cost and greater flexibility, but it also poses challenges. Research suggests that relying heavily on synthetic data can lead to "model collapse": models trained on the outputs of other models can inherit and amplify those models' biases and limitations, so their outputs become less creative and more biased over successive generations.