Ganesh Ramakrishnan's BharatGen Vision: Inclusive, diverse, self-reliant


Last Updated: Jun 11, 2025, 10:47 IST (3 min read)

Ganesh Ramakrishnan, Principal investigator, BharatGen...

In a world racing to build ever-larger AI models, Ganesh Ramakrishnan is forging a different path, one focussed on inclusion, linguistic diversity and national self-reliance. From the corridors of IIT-Bombay to the core of India’s AI mission, he has driven some of the country’s most ambitious language technology initiatives. As the principal investigator of BharatGen, a government-backed effort led by a consortium of top Indian institutions to develop multilingual and multimodal generative AI models for India, Ramakrishnan is championing accessibility and inclusion in AI. With a deep background in machine learning and natural language processing, he has spent over a decade building data-efficient and compute-efficient technologies that bridge linguistic gaps and democratise AI access. “Our goal is to create foundational AI models that are inclusive, data-efficient, and representative of India’s rich linguistic, cultural and societal diversity,” says Ramakrishnan.


Ramakrishnan and his team are tackling the unique challenges of Indian languages, including data scarcity, multilingual intricacies and computational limitations. “Harnessing the similarities across morphologically rich, code-switched, and highly diverse Indic languages presents significant hurdles in tokenisation and representation. However, through our collaborative consortium efforts, we have made substantial strides in overcoming these obstacles,” he explains.

At the heart of BharatGen’s mission is the goal of supporting Indian languages with limited digital presence. “One of the key enablers of this initiative is Bharat Data Sagar, a system designed to collect, organise, and curate text, speech and visual data from a diverse range of sources,” says Ramakrishnan. The team uses advanced techniques to make even small datasets useful, ensuring broad linguistic representation.

This work builds on earlier efforts like the Udaan Project, which focussed on machine translation for Indian languages. “It laid the groundwork for data curation pipelines in building OCR (Optical Character Recognition), ASR (Automatic Speech Recognition), and translation tools that now integrate into Bharat Data Sagar,” he notes. Udaan’s insights into semantic equivalence across Indian languages have informed BharatGen’s multilingual model training strategies, while its open-source ethos has inspired BharatGen’s commitment to democratised access.

Ramakrishnan describes BharatGen’s partnership with government bodies like the Department of Science and Technology (DST) and the Ministry of Electronics and IT (MeitY) as “deeply collaborative and forward-looking”. DST, through its NM-ICPS programme, provided ₹235 crore in seed funding, enabling the development of India’s first foundational bilingual LLMs with 2 to 3 billion parameters.

MeitY, through the IndiaAI Mission, is scaling this effort via AIKosha, India’s official AI repository. AIKosha hosts BharatGen’s newly released speech models across 19 Indian language variations, including speaker-adaptive and high-fidelity text-to-speech systems for languages like Hindi, Tamil, Marathi and Bengali.

A major milestone was the release of BharatGen Param 1 Indic Scale, a 2.9 billion parameter bilingual LLM built from scratch, featuring 25 percent Indic data—far more than the 0.01 percent typically found in global models like Meta’s LLaMA.
