Date: 2026-01-02

India’s next big leap in AI will come not from GPUs or algorithms, but from how it unifies its data.

India’s ambition to build home-grown large language models (LLMs) is entering a decisive phase. With the IndiaAI Mission approved by the Cabinet in 2024, and initiatives such as AI Kosh (IndiaAI Datasets Platform) launched to democratise access to datasets, the country’s potential is vast.

Yet amid the focus on model architectures and GPU clusters, a quieter challenge threatens to slow the revolution — India’s fragmented data ecosystem. If LLMs are the engines of intelligence, data is the fuel. And while India has plenty of it, that fuel is scattered, inconsistent and under-curated.

India’s Data: Abundant but Disconnected

India’s digital public infrastructure—built through platforms like CoWIN, Ayushman Bharat Digital Mission (ABDM), Open Network for Digital Commerce (ONDC) and DigiLocker—is globally recognised. Each has digitised millions of records across sectors.

