The rapid evolution of artificial intelligence (AI) is reshaping industries across the globe, yet it often overlooks the linguistic diversity that defines many regions. In India, a nation with over 120 recognized languages and countless dialects, the need for AI systems that serve this multilingual population is urgent. Most AI models today cater predominantly to English-speaking users, leaving a significant portion of the Indian population underserved. This is where BharatGen steps in, aiming to build AI that better serves India's diverse linguistic needs.
BharatGen, an initiative funded by India's Department of Science and Technology, is spearheading efforts to create AI models that reflect India's linguistic diversity. Initiated last year at IIT Bombay, BharatGen has already developed foundational models for 14 of the most prevalent Indic languages. Its ambition extends beyond these to all 22 scheduled languages of India, and eventually to more of the country's languages and dialects.
BharatGen's mission is clear: to ensure that the AI revolution is inclusive, catering to the cultural and linguistic nuances of India's vast population. This is an essential step towards digital equity, allowing broader participation in the digital economy and society.
In a significant leap towards achieving its goals, BharatGen has partnered with IBM.
This partnership grew out of an initial meeting at an AI Alliance event at IBM Research India, where early collaborations demonstrated the potential of IBM’s InstructLab tool for fine-tuning models for Indic languages. That early work laid the groundwork for the current, more extensive collaboration.
The collaboration between BharatGen and IBM will initially focus on five key sectors: education, agriculture, banking, healthcare, and citizen services. These are areas where AI can significantly enhance service delivery and user experience. By integrating with IBM's Granite models and using IBM watsonx and Red Hat OpenShift AI, the joint effort aims to develop use-case templates that address the unique challenges and opportunities within these industries.
Furthermore, the partnership is committed to building open-source data and AI pipelines that are robust and reliable for Indic languages, so that the resulting AI systems are grounded in governance frameworks suited to such a diverse and complex environment.
While BharatGen has made significant strides by developing models for 14 languages, the journey is far from over. India’s linguistic landscape includes not only the 22 scheduled languages but also a multitude of dialects spoken by millions. BharatGen's mandate is to develop AI tools that serve the entire nation, thus promoting broader digital participation and equity.
The initiative seeks to create benchmarks specifically tailored for India and its languages, ensuring that the AI models developed meet the nation's varied commercial and cultural needs. Such efforts are crucial for fostering an inclusive digital environment where everyone can participate and benefit.
BharatGen's collaboration with IBM represents a pivotal moment in India's AI journey. By combining local expertise with global technological prowess, this partnership is poised to transform how AI interacts with India's diverse population. The ultimate goal is to build sovereign AI models that reflect the linguistic richness, cultural nuances, and diverse needs of India's people.
As BharatGen continues to expand its reach and capabilities, the impact of this initiative will be felt across India’s socio-economic landscape. The journey towards a linguistically inclusive AI ecosystem is underway, promising to empower millions by bridging the digital divide and ensuring that the benefits of AI are distributed equitably across the nation.