Ganesh Ramakrishnan's BharatGen Vision: Inclusive, diverse, self-reliant

As the principal investigator of BharatGen—a government-backed effort, led by a consortium of top Indian institutions, to develop multilingual and multimodal generative AI models for India—Ramakrishnan champions accessibility and inclusion in AI

Published: 11/06/2025 10:47 AM

Ganesh Ramakrishnan, Principal investigator, BharatGen. Image: Bajirao Pawar for Forbes India

In a world racing to build ever-larger AI models, Ganesh Ramakrishnan is forging a different path—one focussed on inclusion, linguistic diversity and national self-reliance. From the corridors of IIT-Bombay to the core of India’s AI mission, he has driven some of the country’s most ambitious language technology initiatives.

As the principal investigator of BharatGen—a government-backed effort, led by a consortium of top Indian institutions, to develop multilingual and multimodal generative AI models for India—Ramakrishnan is championing accessibility and inclusion in AI. With a deep background in machine learning and natural language processing, he has spent over a decade building data-efficient and compute-efficient technologies that bridge linguistic gaps and democratise AI access. “Our goal is to create foundational AI models that are inclusive, data-efficient, and representative of India’s rich linguistic, cultural and societal diversity,” says Ramakrishnan.

Ramakrishnan and his team are tackling the unique challenges of Indian languages, including data scarcity, multilingual intricacies and computational limitations. “Harnessing the similarities across morphologically rich, code-switched, and highly diverse Indic languages presents significant hurdles in tokenisation and representation. However, through our collaborative consortium efforts, we have made substantial strides in overcoming these obstacles,” he explains.  

At the heart of BharatGen’s mission is the goal of supporting Indian languages with limited digital presence. “One of the key enablers of this initiative is Bharat Data Sagar, a system designed to collect, organise, and curate text, speech and visual data from a diverse range of sources,” says Ramakrishnan. The team uses advanced techniques to make even small datasets useful, ensuring broad linguistic representation.

This work builds on earlier efforts like the Udaan Project, which focussed on machine translation for Indian languages. “It laid the groundwork for data curation pipelines in building OCR (Optical Character Recognition), ASR (Automatic Speech Recognition), and translation tools that now integrate into Bharat Data Sagar,” he notes. Udaan’s insights into semantic equivalence across Indian languages have informed BharatGen’s multilingual model training strategies, while its open-source ethos has inspired BharatGen’s commitment to democratised access.

Ramakrishnan describes BharatGen’s partnership with government bodies like the Department of Science and Technology (DST) and the Ministry of Electronics and Information Technology (MeitY) as “deeply collaborative and forward-looking”. The DST, through its NM-ICPS programme, provided ₹235 crore in seed funding, enabling the development of India’s first foundational bilingual LLMs with 2 to 3 billion parameters.

MeitY, through the IndiaAI Mission, is scaling this effort via AIKosha, India’s official AI repository. AIKosha hosts BharatGen’s newly released speech models across 19 Indian language variations, including speaker-adaptive and high-fidelity text-to-speech systems for languages like Hindi, Tamil, Marathi and Bengali.

A major milestone was the release of BharatGen Param 1 Indic Scale, a 2.9 billion parameter bilingual LLM built from scratch, featuring 25 percent Indic data—far more than the 0.01 percent typically found in global models like Meta’s LLaMA.

“Pre-training is an enormous undertaking and often an insurmountable barrier for many. That’s why we’ve taken on this challenge—to provide a robust foundation that you can easily fine-tune for your specific applications,” BharatGen stated in a LinkedIn post.

This ecosystem of support, spanning funding, compute infrastructure and strategic alignment, has enabled BharatGen to release open-source tools that empower researchers, startups, and developers to build India-specific AI applications.

Looking ahead, BharatGen is poised to become a cornerstone of India’s digital public infrastructure. Over the next five years, the initiative aims to transform sectors like agriculture, education, health care, and law by integrating AI models tailored to India’s unique needs. These models will also power multilingual chatbots and digital assistants, enabling seamless interaction with citizens in their native languages, making government services more accessible and efficient.

“We envision BharatGen as a foundation for open, inclusive AI—supporting public data lakes and data sovereignty through Bharat Data Sagar and AIKosha. By fostering responsible innovation, we aim to empower startups, researchers and public institutions to build solutions that truly reflect India’s diversity,” says Ramakrishnan. By emphasising open-source development and national capacity-building, BharatGen is helping shape a self-reliant and globally competitive AI future for India.

Last Updated: June 11, 2025, 10:54:17 AM IST