Does data anonymization really hide your identity?

Despite firms' attempts to anonymize user data, identities can still be detected

Last Updated: Jul 08, 2022, 11:22 IST | 4 min read

Because the nuances of data privacy are hard for consumers to understand, the onus may truly be on companies to develop better anonymization schemes. Image: Shutterstock

Data is all around us, to the point where it can potentially be unnerving. But Jiaming Xu, assistant professor in decision sciences at Duke University’s Fuqua School of Business, stressed that although many people are concerned about data privacy, they also experience many benefits from the ways their data is used.

Data anonymization and signatures

Even with normal anonymization and sanitization where a user’s identity has been removed and some of their activity redacted, users have unique patterns of behavior and can still be re-identified from these signatures, Xu said.

A contest launched by the entertainment company Netflix offers a famous example of how easily users can be re-identified from information that doesn’t include their names, he said. In 2006, Netflix challenged contestants to create an algorithm that was 10 percent better than its current algorithm in predicting users’ movie ratings.

Here, the data was anonymized, but when researchers compared it with data from the popular movie-rating website IMDb.com, they could identify users based on how they rated the same movies across both websites, Xu said. Netflix later canceled a planned follow-up contest because of these privacy concerns.
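To make the idea of a rating "signature" concrete, here is a minimal Python sketch of this kind of linkage. It is only a toy illustration, not the method the researchers used: the function name `match_by_ratings`, the `min_overlap` threshold, and all of the sample users and movies are hypothetical, and a real attack would have to cope with noisy and approximate matches.

```python
def match_by_ratings(anon_ratings, public_ratings, min_overlap=3):
    """Link each anonymized user to the public profile that agrees on the
    most (movie, rating) pairs, provided at least min_overlap pairs agree."""
    matches = {}
    for anon_id, ratings in anon_ratings.items():
        best_name, best_score = None, 0
        for name, public in public_ratings.items():
            # Count the movies that both profiles rated identically.
            score = sum(1 for movie, r in ratings.items() if public.get(movie) == r)
            if score > best_score:
                best_name, best_score = name, score
        if best_score >= min_overlap:
            matches[anon_id] = best_name
    return matches


# Hypothetical data: the "anonymized" release has IDs, the public site has names.
anon = {
    "user_17": {"Movie A": 5, "Movie B": 2, "Movie C": 4, "Movie D": 1},
    "user_42": {"Movie A": 3, "Movie E": 5},
}
public = {
    "alice": {"Movie A": 5, "Movie B": 2, "Movie C": 4},
    "bob": {"Movie A": 3, "Movie D": 4},
}

print(match_by_ratings(anon, public))  # {'user_17': 'alice'}
```

In this toy data, three identical ratings are enough to tie `user_17` to a named profile, while `user_42` shares too little with anyone to be linked.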

Another example of a signature is the friends or connections a person has on social networks such as Facebook and LinkedIn. In other words, two users who share many common friends across various social media networks may actually be the same person, Xu explained.

This was the basis for a well-known case in which researchers were able to use this signature to correctly identify about 30 percent of people with profiles on Twitter and the social network Flickr based on their direct connections, Xu noted.

However, simply comparing a person’s popularity, or number of connections, may not be the most reliable method, Xu said. In his own research, Xu suggests this technique could go even further in confirming a person’s identity by not only looking at their direct connections, but also examining the popularity of those direct connections. Using this method could re-identify users from anonymized data even if the connections across a person’s social media networks vary by up to 11 percent, Xu’s research found.
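The idea of looking at the popularity of a person's connections can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Xu's published algorithm: the helper names (`build_graph`, `neighbor_degree_signature`, `signature_distance`, `best_match`), the distance measure, and both sample networks are hypothetical. The second network is the first one with names replaced by IDs and one connection missing.

```python
def build_graph(edges):
    """Build an undirected friendship graph as {user: set of connections}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def neighbor_degree_signature(graph, user):
    """Sorted list of how many connections each of the user's connections has."""
    return sorted(len(graph[friend]) for friend in graph[user])


def signature_distance(sig_a, sig_b):
    """Sum of absolute differences after left-padding the shorter list with zeros."""
    n = max(len(sig_a), len(sig_b))
    a = [0] * (n - len(sig_a)) + sig_a
    b = [0] * (n - len(sig_b)) + sig_b
    return sum(abs(x - y) for x, y in zip(a, b))


def best_match(named_graph, user, anon_graph):
    """Return the anonymized ID whose signature is closest to the named user's."""
    target = neighbor_degree_signature(named_graph, user)
    return min(
        anon_graph,
        key=lambda v: signature_distance(target, neighbor_degree_signature(anon_graph, v)),
    )


# Hypothetical public network with real names.
named = build_graph([
    ("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("ann", "eve"),
    ("bob", "cat"), ("bob", "dan"), ("dan", "fay"), ("fay", "gus"),
])
# Hypothetical anonymized network: same people under IDs, one connection missing.
anon = build_graph([
    ("u4", "u1"), ("u4", "u6"), ("u4", "u3"), ("u4", "u7"),
    ("u1", "u6"), ("u1", "u3"), ("u2", "u5"),
])

print(best_match(named, "ann", anon))  # u4
```

Even though the two networks do not agree exactly, the pattern of how popular Ann's connections are remains closer to `u4` than to any other anonymized ID.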

What’s next in data privacy?

There are measures consumers can take to protect their identities, such as being cognizant of the information they post across platforms or using pseudonyms, Xu said. But because the nuances of data privacy are hard for consumers to understand, the onus may truly be on companies to develop better anonymization schemes.

While most of the scenarios Xu discussed concern network privacy specifically, companies such as Apple and Google are trying new methods to protect user data including differential privacy, he said. With this strategy, firms may share information about groups in a dataset rather than releasing information about any individuals.
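For readers curious about what differential privacy looks like in practice, the sketch below shows one standard textbook technique, the Laplace mechanism, applied to a simple count. It is not a description of how Apple or Google implement the idea; the function name `private_count`, the `epsilon` value, and the sample records are assumptions made for illustration.

```python
import random


def private_count(records, predicate, epsilon=0.5, rng=random):
    """Return a noisy count of the records matching predicate.

    Adding or removing one person changes a count by at most 1, so Laplace
    noise with scale 1/epsilon is added. Smaller epsilon means more noise
    and stronger privacy.
    """
    true_count = sum(1 for record in records if predicate(record))
    scale = 1.0 / epsilon
    # The difference of two exponential draws with mean `scale`
    # follows a Laplace distribution centered on zero.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


# Hypothetical records: which users rated a particular movie.
records = [
    {"user": "u1", "rated": True},
    {"user": "u2", "rated": False},
    {"user": "u3", "rated": True},
    {"user": "u4", "rated": True},
]

noisy = private_count(records, lambda r: r["rated"], epsilon=0.5)
print(f"Noisy count: {noisy:.2f} (true count is 3)")
```

The released number is useful in aggregate, since the noise averages out over many queries about large groups, but it reveals little about whether any single person is in the data.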
