The African Next Voices project is working to make AI more accessible in Africa by building a large speech dataset of African languages. Backed by a $2.2 million grant from the Gates Foundation and funding from Meta, the project has collected 9,000 hours of speech across 18 languages from Kenya, Nigeria, and South Africa, making it the largest known dataset of African languages for AI development. The languages include Hausa, Yoruba, isiZulu, and Kikuyu, among others, and the recordings are meant to reflect how people in these communities actually speak.
The dataset is intended to support applications such as captioning local-language media, voice assistants for agriculture and health, call-center support, and education tools. The data will also be archived for cultural preservation, ensuring that African languages are represented in digital spaces. The project is a collaboration with African universities and organizations, along with initiatives such as Masakhane Research Foundation, Lelapa AI, and Mozilla Common Voice. By prioritizing authentic language use and community involvement, African Next Voices aims to create AI models that truly reflect African languages and cultures.
The project addresses the significant underrepresentation of African languages on the internet and in AI tools, most of which are developed and trained in English, Chinese, and European languages. Language matters for AI because it is how people interact, ask for help, and convey meaning. African Next Voices pairs ethically collected, high-quality speech with models so that people can speak naturally and access AI in their native languages. The long-term vision is to give people a choice, so that farmers, teachers, and local businesses can use AI in isiZulu, Hausa, or Kikuyu, not just English or French.
The next steps are to expand to more languages, develop machine translation and grammar checkers, and build smaller, energy-efficient language models. Sustainability is also crucial: students, researchers, and innovators need continued access to computing resources, training materials, and licensing frameworks. By making African languages visible and usable in AI, the project sets new standards for inclusive, responsible AI worldwide, benefiting communities and promoting cultural preservation.