Unified data stack: The missing link in India’s AI ambitions
India has data at scale, but it is not integrated. A unified data stack may be the only way for the country to succeed in its quest for AI sovereignty


India’s next big leap in AI will come not from GPUs or algorithms but from how it unifies its data.
India’s ambition to build home-grown large language models (LLMs) is entering a decisive phase. With the IndiaAI Mission approved by the Cabinet in 2024, and initiatives such as AI Kosh (IndiaAI Datasets Platform) launched to democratise access to datasets, the country’s potential is vast.
Yet amid the focus on model architectures and GPU clusters, a quieter challenge threatens to slow the revolution — India’s fragmented data ecosystem. If LLMs are the engines of intelligence, data is the fuel. And while India has plenty of it, that fuel is scattered, inconsistent and under-curated.
India’s digital public infrastructure—built through platforms like CoWIN, Ayushman Bharat Digital Mission (ABDM), Open Network for Digital Commerce (ONDC) and DigiLocker—is globally recognised. Each has digitised millions of records across sectors.
However, many operate as vertical silos, with differing metadata, consent and access frameworks. The National Data Governance Framework Policy (Draft, May 2022), issued by the Ministry of Electronics and Information Technology (MeitY), recognised this explicitly — noting that government data “is currently managed… in differing and inconsistent ways” across agencies.
Consider healthcare: ABDM has digitised over 300 million health records, yet their interoperability across states remains limited. A unified data backbone could help securely share such data—enabling AI for diagnostics, disease surveillance and health copilots suited to Indian realities.
A unified data stack would act as the connective layer between India’s public digital infrastructure and its AI ambitions. Not another portal or warehouse, but a framework that standardises how data is collected, described, governed and shared.
India is taking steps in this direction. The IndiaAI Datasets Platform aims to provide AI-ready datasets to startups and researchers.
Key components include:
Data localisation is only the beginning. True data sovereignty means controlling how data flows, evolves and informs AI systems. A unified stack enables federated learning, allowing institutions to train models collaboratively without moving raw data. It also supports full data lineage—essential for transparency, bias audits and regulatory trust.
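To make the federated learning idea concrete, here is a minimal sketch of federated averaging in plain Python. It is an illustration under stated assumptions, not a description of any IndiaAI system: the function names, the one-variable linear model and the two hypothetical hospitals are all invented for the example. Each institution fits the model on its own records, and only the fitted weights are sent to a coordinator for averaging.

```python
from typing import List, Tuple

# Hypothetical setup: each institution privately holds (x, y) records.
Dataset = List[Tuple[float, float]]
Weights = Tuple[float, float]  # (slope, intercept) of a linear model y = w*x + b


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 20) -> Weights:
    """Run gradient descent on one institution's own data.

    The raw records never leave this function; only weights are returned.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Combine local weights, weighting each by its dataset size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return w, b


def train_federated(institutions: List[Dataset], rounds: int = 50) -> Weights:
    """Coordinator loop: broadcast weights, collect updates, average them."""
    weights: Weights = (0.0, 0.0)
    sizes = [len(data) for data in institutions]
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in institutions]
        weights = federated_average(updates, sizes)
    return weights


if __name__ == "__main__":
    # Two hypothetical hospitals whose records follow y = 2x + 1.
    hospital_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
    hospital_b = [(3.0, 7.0), (4.0, 9.0)]
    print(train_federated([hospital_a, hospital_b]))
```

In this sketch the coordinator only ever sees pairs of model weights, which is the property that lets institutions collaborate without pooling records. A production system would also need secure aggregation, consent checks and lineage logging, none of which is shown here.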
The United States illustrates how openness and shared data access accelerate innovation. Platforms such as Data.gov host hundreds of thousands of datasets with standard metadata, promoting transparency and reuse.
By contrast, China shows the power of coordination and governance in its national data efforts—for example, through the National AI Data Resource Center and municipal data exchanges. India can blend these strengths—US-style openness plus China-style coordination—designing a model that is democratic in spirit, disciplined in execution and uniquely Indian in design.
A unified data stack is not simply a technology upgrade; it is a national capability for knowledge creation, innovation and inclusion. It will power multilingual LLMs and AI copilots in governance and enterprise, and it will unlock startup innovation built on trusted public datasets. A coherent data architecture will embed trust and traceability into India’s AI value chain.
This stack should now stand alongside semiconductor and compute policies as a core pillar of the IndiaAI Mission. The sooner India unifies its data, the faster it will unify its intelligence—moving from being merely data-rich to truly knowledge-sovereign.
Alexy Thomas, Partner, Technology Consulting, EY India
First Published: Jan 02, 2026, 13:08