Unified data stack: The missing link in India’s AI ambitions

India has data at scale, but it is not integrated. A unified data stack may be the only way for the country to succeed in its quest for AI sovereignty.

Last Updated: Jan 02, 2026, 12:58 IST (3 min read)
A unified data stack would act as the connective layer between India’s public digital infrastructure and its AI ambitions.
Photo by: Shutterstock

India’s next big leap in AI will come not from GPUs or algorithms, but from how it unifies its data.

India’s ambition to build home-grown large language models (LLMs) is entering a decisive phase. With the IndiaAI Mission approved by the Cabinet in 2024, and initiatives such as AI Kosh (IndiaAI Datasets Platform) launched to democratise access to datasets, the country’s potential is vast.

Yet amid the focus on model architectures and GPU clusters, a quieter challenge threatens to slow the revolution — India’s fragmented data ecosystem. If LLMs are the engines of intelligence, data is the fuel. And while India has plenty of it, that fuel is scattered, inconsistent and under-curated.

India’s Data: Abundant but Disconnected

India’s digital public infrastructure—built through platforms like CoWIN, Ayushman Bharat Digital Mission (ABDM), Open Network for Digital Commerce (ONDC) and DigiLocker—is globally recognised. Each has digitised millions of records across sectors.

However, many operate as vertical silos, with differing metadata, consent and access frameworks. The National Data Governance Framework Policy (Draft, May 2022), issued by the Ministry of Electronics and Information Technology (MeitY), recognised this explicitly — noting that government data “is currently managed… in differing and inconsistent ways” across agencies.

Consider healthcare: ABDM has digitised over 300 million health records, yet their interoperability across states remains limited. A unified data backbone could help securely share such data—enabling AI for diagnostics, disease surveillance and health copilots suited to Indian realities.

Why India Needs a Unified Data Stack

A unified data stack would act as the connective layer between India’s public digital infrastructure and its AI ambitions. It would not be another portal or warehouse, but a framework that standardises how data is collected, described, governed and shared.

India is taking steps in this direction. The IndiaAI Datasets Platform aims to provide AI-ready datasets to startups and researchers.

Key components of such a unified data stack would include:

  • National Data Catalogue and Ontology: Common metadata standards across government, enterprise and research
  • Federated Data Network: Linking structured and unstructured datasets without centralising them, preserving sovereignty while enabling collaboration
  • Governance Plane: Embedding privacy, consent and localisation controls under the Digital Personal Data Protection Act, 2023
  • Annotation and Curation Pipelines: Leveraging India’s linguistic diversity and digital workforce to improve dataset quality
  • API-based Access Frameworks: Enabling startups, academia and industry to use and contribute shared data responsibly
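To make the components above concrete, here is a minimal Python sketch of how a shared catalogue entry, a governance check and an access call might fit together. It is an illustration only: the class names, field names, purposes and the example.org endpoint are all hypothetical and are not drawn from AI Kosh, the Digital Personal Data Protection Act, 2023, or any existing government API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """One catalogue entry. Every field name here is a hypothetical example."""
    dataset_id: str
    title: str
    sector: str
    languages: tuple
    contains_personal_data: bool
    allowed_purposes: frozenset
    endpoint: str  # the data stays with its owner; the catalogue holds metadata only


class Catalogue:
    """Toy catalogue that describes datasets without centralising them."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        self._records[record.dataset_id] = record

    def search(self, sector=None, language=None):
        """Return records matching the optional sector and language filters."""
        results = []
        for record in self._records.values():
            if sector is not None and record.sector != sector:
                continue
            if language is not None and language not in record.languages:
                continue
            results.append(record)
        return results

    def request_access(self, dataset_id, purpose, has_consent):
        """Governance check: return the endpoint only if the request is allowed."""
        record = self._records[dataset_id]
        if purpose not in record.allowed_purposes:
            return None  # purpose was never approved for this dataset
        if record.contains_personal_data and not has_consent:
            return None  # personal data requires consent in this sketch
        return record.endpoint


catalogue = Catalogue()
catalogue.register(DatasetRecord(
    dataset_id="health-001",
    title="Clinic visit records (hypothetical)",
    sector="health",
    languages=("hi", "ta"),
    contains_personal_data=True,
    allowed_purposes=frozenset({"disease_surveillance"}),
    endpoint="https://data.example.org/health-001",
))

granted = catalogue.request_access("health-001", "disease_surveillance", has_consent=True)
denied = catalogue.request_access("health-001", "advertising", has_consent=True)
```

In this sketch the catalogue never stores the records themselves, which mirrors the federated idea of linking datasets without centralising them. A real governance plane would need far richer consent and audit handling than a single boolean.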

From Data Sovereignty to Model Sovereignty

Data localisation is only the beginning. True data sovereignty means controlling how data flows, evolves and informs AI systems. A unified stack enables federated learning, allowing institutions to train models collaboratively without moving raw data. It also supports full data lineage—essential for transparency, bias audits and regulatory trust.
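The idea of training collaboratively without moving raw data can be sketched with federated averaging, one common federated learning scheme. The toy example below assumes three hypothetical institutions that each hold private points from the same line, y = 2x + 1. The function names, learning rate and round counts are illustrative choices, not a description of any deployed system.

```python
def local_update(weights, data, lr=0.01, epochs=5):
    """Run gradient descent on y = w*x + b using only one institution's data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Combine local weights, weighting each by its institution's sample count."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return (w, b)


def train(silos, rounds=1000):
    """Each round, only model weights leave a silo; raw records never do."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in silos]
        global_weights = federated_average(updates, [len(d) for d in silos])
    return global_weights


# Three hypothetical institutions, each with its own private sample.
silos = [
    [(x, 2 * x + 1) for x in (0.0, 1.0, 2.0)],
    [(x, 2 * x + 1) for x in (3.0, 4.0)],
    [(x, 2 * x + 1) for x in (1.5, 2.5, 3.5)],
]

w, b = train(silos)  # should approach w = 2 and b = 1
```

The coordinator only ever sees the pairs (w, b) returned by each silo. Production systems typically add protections such as secure aggregation or differential privacy, which this sketch omits.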

Global Lessons: US Openness and China’s Coordination

The United States illustrates how openness and shared data access accelerate innovation. Platforms such as Data.gov host hundreds of thousands of datasets with standard metadata, promoting transparency and reuse.

By contrast, China shows the power of coordination and governance in its national data efforts—for example, through the National AI Data Resource Center and municipal data exchanges. India can blend these strengths—US-style openness plus China-style coordination—designing a model that is democratic in spirit, disciplined in execution and uniquely Indian in design.

From Data to Knowledge Infrastructure

A unified data stack is not simply a technology upgrade; it is a national capability for knowledge creation, innovation and inclusion. It will power multilingual LLMs and AI copilots in governance and enterprise, and it will unlock startup innovation built on trusted public datasets. A coherent data architecture will embed trust and traceability into India’s AI value chain.

This stack should now stand alongside semiconductor and compute policies as a core pillar of the IndiaAI Mission. The sooner India unifies its data, the faster it will unify its intelligence—moving from being merely data-rich to truly knowledge-sovereign.

Alexy Thomas, Partner, Technology Consulting, EY India

First Published: Jan 02, 2026, 13:08
