Microsoft India's Kalika Bali: Building AI to empower India's underserved communities
At Microsoft Research India, Kalika Bali has been building inclusive, multilingual and culturally contextual AI systems that empower the most vulnerable in India
Kalika Bali, Senior principal researcher, Microsoft Research India
Image: Nishant Ratnakar for Forbes India
When a sentence like ‘John and Mary have a keylime pie that they need to divide between themselves and Peter’ is translated into Hindi by any large language model (LLM), the result is: ‘John और Mary के पास एक की लाइम पाई है, जिसे उन्हें अपने और Peter के बीच बाँटना है.’ But how many people in India would know what a ‘keylime pie’ is? The problem here is not just translation; it is also the loss of contextual meaning. The sentence hinges on the fact that a keylime pie is round, an essential detail for solving the math problem.
Recognising this, Kalika Bali, senior principal researcher at Microsoft Research India (MSR), and her team have been working to make AI more culturally adaptive, ensuring language models account for regional nuances across the globe.
“Bias does not translate,” says Bali, who has been working at the intersection of language, technology and empowerment, particularly in the context of India’s rich linguistic diversity. Bali, who advocates for India to develop its own large language models (LLMs), is trying to shape the future of AI to be inclusive, ethical and human—where language is not just data, but also about identity, culture and access.
Bali’s journey into the world of AI began not with code, but with waveforms and resonances. Trained in acoustic phonetics, she initially worked on speech technology, building text-to-speech systems at a time when linguists were essential to language tech development. Her transition into Natural Language Processing (NLP) was organic—driven by the need to process language for speech systems. “I evolved as the technology evolved, since that’s where all the exciting work was happening,” she says.
Her early work at London-based startups focussed on English speech applications. But it was the realisation that such technologies were “nice-to-have” in the West but “need-to-have” in India that brought her back home in 2002. She joined Bengaluru-based PicoPeta Simputers, and later HP Labs in Bengaluru, to build voice interfaces for Indian languages. In August 2006, she joined the newly formed Microsoft Research lab in India.
Shaping AI for Inclusion
Initially, Bali and her team focussed on building NLP standards for language technology, something that had not existed before. “These were taken up by the Bureau of Indian Standards. Some of our work on Hindi transliteration went into the initial versions of Hindi-English translation systems in Bing, among many others,” says Bali.
One of Bali’s most influential contributions has been in the area of code mixing and code switching—the natural way multilingual speakers blend languages in speech and text. At a time when most NLP systems were monolingual, her team at MSR launched Project Melange, which helped establish code-mixed NLP as a legitimate and vital research area.
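To make the idea of code-mixed text concrete, here is a minimal Python sketch that labels each word in a sentence by its writing script and flags sentences that combine Devanagari and Latin words. It is an illustration only and not the method used by Project Melange; the function names `script_of` and `is_code_mixed` are hypothetical. It also assumes that script is a usable proxy for language, which breaks down for Hindi typed in Latin letters, a very common form of code mixing.

```python
import unicodedata


def script_of(token: str) -> str:
    """Return a coarse script label for one whitespace-separated token.

    Assumption: the Unicode character name prefix identifies the script.
    Punctuation and digits are ignored.
    """
    labels = set()
    for ch in token:
        name = unicodedata.name(ch, "")
        if name.startswith("DEVANAGARI"):
            labels.add("devanagari")
        elif name.startswith("LATIN"):
            labels.add("latin")
    if len(labels) == 1:
        return labels.pop()
    return "mixed" if labels else "other"


def is_code_mixed(sentence: str) -> bool:
    """True if the sentence contains both Latin and Devanagari material."""
    labels = {script_of(tok) for tok in sentence.split()}
    return "mixed" in labels or {"latin", "devanagari"} <= labels


if __name__ == "__main__":
    hindi_output = "John और Mary के पास एक की लाइम पाई है"
    for tok in hindi_output.split():
        print(tok, script_of(tok))
    print(is_code_mixed(hindi_output))
```

Run on the translated sentence from the opening example, the sketch marks ‘John’ and ‘Mary’ as Latin and the remaining words as Devanagari. Real code-mixed NLP has to go well beyond this, since speakers often switch languages without switching scripts.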
Bali’s commitment to inclusion is evident in her projects at MSR. Take, for instance, her work with low-resource languages like Gondi. Of Project ELLORA, she says, “We tied up with Pratham Books, and they used a tool that we developed to translate English and Hindi books to Gondi. For the first time there were 200 books in the language.”
“Kalika has played a pivotal role in shaping the AI ecosystem in India, driving a transformative shift toward inclusion and community-centered research. Over the past eight years of working with her, I have been inspired by her rare combination of moral clarity and scientific brilliance. Her leadership not only advances the frontiers of AI but also ensures that its benefits reach the most underserved,” says Manu Chopra, CEO, Karya.
Karya, which aims to enable supplemental income opportunities for people in low-income and marginalised communities by connecting them to AI-enabled digital work, started as a project under MSR, and was later spun off into a standalone organisation.
More recently, her team has been looking at ways to evaluate AI systems. “It’s not easy, because the benchmarks that are currently there are not very effective. Most of the systems have already consumed the benchmark data. How can we evaluate systems by involving speakers or community members to provide data and validate its correctness? Based on this human feedback, we could then train or prompt another LLM to generate similar data, which can be used as a benchmark for evaluation,” she says.
One early version of this approach was Pariksha—a pilot initiative focussed on evaluating Indic LLMs using a smaller subset of Indian languages and limited domains. Building on its learnings, the team is now developing a scaled-up version, called Samiksha, which will extend coverage to all 22 constitutionally recognised official Indian languages across domains.
She also contributed to MSR’s Project VeLLM, an initiative focussed on developing equitable, inclusive and accessible real-world applications of generative AI for culturally and linguistically diverse communities, especially those that are underserved or marginalised. From this, she helped develop Shiksha Copilot, an AI assistant designed to empower teachers by helping them create personalised lesson plans and curate educational resources. Originally part of VeLLM, Shiksha Copilot has since become an independent project, advancing AI-driven learning solutions.