MIT researchers have made a notable advance in teaching AI to communicate sounds the way humans do. Inspired by the mechanics of the human vocal tract, they've created an AI model that can produce and understand vocal imitations of everyday sounds.
Given a wide range of real-world sounds, the model generates a human-like imitation of them, including noises like leaves rustling, a snake's hiss, and an approaching ambulance siren. It can also be run in reverse, guessing the real-world sound from a human vocal imitation.
The researchers achieved this by building a model of the human vocal tract that simulates how vibrations from the voice box are shaped by the throat, tongue, and lips. They then used a cognitively inspired AI algorithm to control this vocal tract model and make it produce imitations.
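The idea of a sound source shaped by the vocal tract is the classic source-filter view of voice production. The article does not describe the researchers' actual implementation, but the concept can be sketched in a few lines: a periodic "voice box" source signal is passed through resonant filters that stand in for the throat and mouth shaping the sound. All function names, frequencies, and bandwidths below are illustrative assumptions, not details from the MIT model.

```python
import numpy as np

SR = 16000  # sample rate in Hz (assumed for this sketch)

def glottal_source(f0, duration, sr=SR):
    """Crude stand-in for voice-box vibration: a sawtooth at pitch f0 (Hz)."""
    n = int(duration * sr)
    phase = np.cumsum(np.full(n, f0 / sr))
    return 2.0 * (phase % 1.0) - 1.0

def formant_filter(signal, freq, bandwidth, sr=SR):
    """Two-pole resonator: models one vocal-tract resonance (a formant)."""
    r = np.exp(-np.pi * bandwidth / sr)          # pole radius from bandwidth
    theta = 2.0 * np.pi * freq / sr              # pole angle from center freq
    a1, a2 = -2.0 * r * np.cos(theta), r * r     # feedback coefficients
    b0 = 1.0 - r                                 # rough gain normalization
    out = np.zeros_like(signal)
    for i in range(len(signal)):
        # out[-1] and out[-2] are still zero when i < 2, so this is safe
        out[i] = b0 * signal[i] - a1 * out[i - 1] - a2 * out[i - 2]
    return out

# An "ah"-like vowel: a 110 Hz source shaped by two formant resonances
# (700 Hz and 1220 Hz are textbook-ish values, chosen for illustration)
src = glottal_source(110, 0.5)
voiced = formant_filter(formant_filter(src, 700, 130), 1220, 70)
voiced /= np.max(np.abs(voiced))  # normalize to [-1, 1]
```

In this framing, "controlling the vocal tract model" amounts to choosing parameters like the source pitch and the resonance frequencies over time; a search or learning algorithm could adjust them until the output resembles a target sound.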
The potential applications of this technology are broad, including more intuitive "imitation-based" interfaces for sound designers, more human-like AI characters in virtual reality, and even tools to help students learn new languages.