Why current tech stacks will fail DPDP (And what needs to change)
Organizations must decide how they want to handle data going forward—and that decision touches every corner of the business.


For more than a decade, businesses have invested heavily in building systems that capture everything, store everything, and remember everything. Whether you run a bank, a marketplace, or a consumer app, the philosophy has been the same: more data means more insight, personalization, and competitive advantage. Teams were encouraged to collect broadly and store indefinitely because the trade-off seemed obvious: the more you knew about your users, the better you could serve them.
Then the DPDP Act arrives and essentially says: “All that data you’ve been accumulating? You now need to erase it completely, provably, and on demand.” What looked like a strategic asset suddenly becomes a compliance liability. This is a fundamental mismatch between how modern enterprises have been engineered and what modern regulation now requires. The “Right to Erasure” may read like a simple obligation, but operationally it is anything but. And unless business leaders grasp why, they will underestimate the true cost and complexity of implementation.
The new DPDP law is explicit that data fiduciaries must provably delete personal data when the purpose is fulfilled or when a data principal requests erasure. Failure to do so can trigger some of the Act’s highest penalties, including fines up to ₹250 crore for non-compliance with data deletion and retention obligations.
The hard part is that personal data rarely lives in one place. Replication, backups, logs, and third-party integrations leave stray copies everywhere: what practitioners call “Ghost Data.” If this Ghost Data lingers anywhere in the stack, the organization is automatically out of compliance, even if the primary database was scrubbed. And it often hides in places teams rarely think to check: Snowflake or BigQuery partitions, even Slack messages shared between developers. Few organizations can reliably map all of this out, and it’s never for lack of trying.
Think of a user’s personal data as a diary. The way most enterprises are built today, every team and system that touches the user gets its own photocopy. Tokenization flips that entirely. Instead of giving everyone a copy of the diary, you give them a reference tag: a token that behaves like the original entry for their workflows, but doesn’t contain a single word of the actual text, encrypted or otherwise. The real diary stays in one place, a data privacy vault, encrypted and tightly access-controlled. Teams and systems get what they need to operate, but no one outside the vault ever holds the real writing.
In this model, analytics tools, support platforms, AI pipelines, and internal services all operate without holding any real personal data. Tokenization by itself isn’t enough, though; what makes it durable is pairing it with encryption and centralized key control, so that identity resolution happens in only one place: the vault. The enterprise shifts from a copy-based architecture to a reference-based one, reducing exposure and simplifying compliance.
Importantly for AI systems, this protects downstream models as well. Training an LLM or a RAG pipeline on raw PII is a permanent mistake: once a model internalizes someone’s name or identifier, you can’t meaningfully “delete” it. With a reference-based architecture, the model only ever sees tokens. If a user invokes their Right to Erasure, you don’t chase every photocopy; you simply destroy the relevant records, and the keys that protect them, within the vault. Erasure takes effect instantly, and every token still scattered across other systems becomes functionally useless, because it no longer resolves to anything.
True compliance requires architectural change. Organizations must decide how they want to handle data going forward—and that decision touches every corner of the business. The ability to personalize and build AI responsibly depends on controlling data lifecycles. Operational costs rise or fall based on how much data sprawl exists. Vendor ecosystems must be evaluated through a privacy-first lens. And risk posture shifts dramatically when deletion becomes provable rather than “we tried our best.”
And importantly, leaders must recognize that privacy-first architecture isn’t a slowdown—it’s how you future-proof the business. As AI becomes more deeply embedded in every workflow, the companies that can trust the integrity, lineage, and legality of their data will be the only ones able to scale confidently. Clean, well-governed data is great for compliance, but it’s also the raw material every AI-driven enterprise depends on. Those who treat it that way will pull ahead.
That’s where architecture does the heavy lifting. The only way to shrink exposure, simplify governance, and make deletion real rather than aspirational is to redesign how identity travels inside the business. Centralizing sensitive data in a vault, pushing tokens outward, and allowing every downstream system, including AI, to operate on reversible references creates an environment where change is possible without incurring permanent risk. It’s not just about solving for today’s deletion requests; it’s about ensuring tomorrow’s models, workflows, and automations don’t inherit yesterday’s mistakes.
Privacy is finally acting as a forcing function for engineering discipline. The shift isn’t cosmetic. It requires rethinking the foundation so the business can move faster, scale AI safely, and evolve without dragging a decade of accumulated data debt behind it. The systems we built were designed to remember. The systems we need now are the ones that can let go.
Roshmik Saha is Co-founder & CTO of Skyflow.
First Published: Feb 05, 2026, 17:10