
Connecting the dots with big data

Should numbers speak louder than human qualities?

By Anna Rohleder

Published: Jul 30, 2016 06:42:16 AM IST

There’s a scene early on in the short film The Quantified Self (2016) where a husband walks into the bedroom and asks his wife how she slept.

“I don’t know,” she replies, lifting her arm drowsily to show him a blinking device attached to her finger.

He peers at it and then tells her, “You slept well.”

The film, which debuted at the Atlanta Film Festival earlier this year, imagines a dystopian version of the so-called “quantified self” movement, in which people use wearable devices like the Apple Watch or Fitbit to track their physical activity and vital signs. Rather than just using this information to improve their health or fitness, the characters use their bio-data to live life by the numbers. It determines what they eat and when, the work they do and the games they play—even the decision to have another child.

The film asks what it means to listen to data, down to the literal level. Romain Collin, composer of The Quantified Self’s score, says, “I made a soundtrack that is very bare—it’s only two drums carrying the story, like the zeros and ones of the digital world this family is trapped in. The score feels menacing and minimalist, and illustrates the overly simplistic and alienating aspect of trying to code the world and people into data.”

But it’s not just happening in the movie: More and more of our own lives are becoming datafied. Beyond the things that we’re used to counting and measuring, like the inputs and outputs of our economic activity or even contemporary metrics of personal success like numbers of friends, datafication increasingly extends to our thoughts and feelings too. Today, software applications known as sentiment analysis or emotional analytics that mine the data of tweets, likes, and emails are gaining traction for everything from marketing to fraud detection to interaction design. The age of Big Data is upon us. And while the technological effects have already been profound, some see an even bigger historical shift. Before, we turned the world into information to understand it. But once everything becomes information, how do we understand our world—and our own place in it?

Where Big Data came from—and where it’s going
Over the past few decades, three trends have converged to create the Big Data phenomenon. First, more data is being generated and captured than ever before: From business supply chains to individual shopping preferences, from infectious disease outbreaks to the shares racked up by a viral video, if something matters in business or society today, it will be quantified and turned into a data point. Second, the processing power of computers and digital storage capacity have grown exponentially, which allows all that data not only to be kept handy but also to be monetised, as when social media companies sell their users’ personal information to advertisers. And third, an assumption has been made across industries, government and research that big ideas like driverless cars and genetically tailored medicine can—and should—be cracked with the help of big data.

Yet there are also risks. Privacy, both online and off, has already begun to erode under the steady onslaught of tracking devices and surveillance technologies that collect data from every corner of our lives, often without our consent or awareness. Another danger lies in data-based profiling, i.e. making assumptions about people’s behaviour based on inferences from the past, or on others who are statistically similar. But the biggest impact of all, as Viktor Mayer-Schönberger and Kenneth Cukier write in their 2013 bestseller, Big Data: A Revolution That Will Transform How We Live, Work and Think, is “that data-driven decisions are poised to augment or overrule human judgement.” In other words, in listening to data, there is a risk that we not only let it speak, but that we will let it speak for us.

Seeing through the prism of probabilities
Almost anytime a computer seems to “know” what we want, whether we misspell a word but still get the search results we were looking for, or a website’s recommendations supply our next favourite book or movie, it is thanks to Big Data. In essence, Big Data is about applying algorithms to very large datasets to identify patterns and make extrapolations. Given enough information, correlations can be discovered between different factors that lead to new insights or predictions. Machine translation is a good example of this. Google Translate was created by performing statistical analysis on more than a trillion words—a substantial portion of the content in every language on the internet—and finding probabilities that certain words and phrases in one language would correspond to those in another. And it worked. Although the Google Translate service does not “understand” the languages it is translating—rules of grammar are not part of the algorithm—it can identify equivalents between words based on the statistical likelihood of where and how they occur in text and speech.
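To make the idea of “knowledge without grammar” concrete, here is a minimal sketch of matching words purely by how often they appear together in paired sentences. It is not Google Translate’s actual method, which the article does not detail; the four-sentence corpus and the function names `cooccurrence_probs` and `best_match` are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy "parallel corpus": English phrases paired with French ones.
# These pairs are invented for illustration only.
PARALLEL = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
    ("a car", "une voiture"),
]


def cooccurrence_probs(pairs):
    """For each source word, estimate how often each target word appears
    in the same sentence pair, as a share of all its co-occurrences."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        for s in src.split():
            for t in tgt.split():
                counts[s][t] += 1
    probs = {}
    for s, counter in counts.items():
        total = sum(counter.values())
        probs[s] = {t: n / total for t, n in counter.items()}
    return probs


def best_match(probs, word):
    """Return the target word that most often co-occurs with `word`."""
    return max(probs[word], key=probs[word].get)


probs = cooccurrence_probs(PARALLEL)
for word in ("house", "car", "the", "a"):
    print(word, "->", best_match(probs, word))
```

No rule of grammar appears anywhere in the sketch: “house” is paired with “maison” only because the two turn up together more often than any alternative. Real systems work from vastly more text and far more refined statistics, but the principle of counting rather than understanding is the same.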

This ability of Big Data to provide knowledge without understanding, as it were, underlies both its promises and perils. On the positive side, data-based inferences can provide a useful corrective for flawed human thinking, particularly our bias toward seeing the world in terms of cause and effect. “If I had dinner and had an upset stomach, I would think it was what I ate, although statistically speaking, it’s far more likely I’d get bugs from shaking hands with someone,” explains Big Data co-author Mayer-Schönberger. “I see the world through my own eyes, through my experiences, preferences, beliefs, ideologies, and prejudices—all I’m trying to do is verify that my beliefs are correct.” Data does not have anything to prove. It does not come with a story. Correlations based on crunching the data have resulted in improvements for everything from medical diagnoses to equipment maintenance scheduling, saving time, money, and in some cases, also lives.

Yet correlations can also be problematic. Because they do not say that B happened because of A, only that A and B were observed together, correlations can give rise to spurious conclusions. For example, observing that the number of Japanese cars sold in the US corresponds almost exactly to the number of suicides caused by crashing a car does not mean car-based suicides could be prevented if fewer Japanese cars were imported. This points to the need for a person—ideally a knowledgeable one—to interpret the correlations an algorithm may come up with. “To have data tell us something without expertise is a false hope,” says Dr Art Markman, a professor of cognitive psychology at the University of Texas, Austin, who has done extensive research into decision-making. “All else being equal, the statistical models based on data do better than humans in complex situations, and a human using the output of a statistical model and then making a decision based on that does better than either alone.”
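A small numerical sketch shows how easily such a spurious correlation arises. The two series below are invented (they are not the car-sales or suicide figures mentioned above) and are built so that they share nothing except an upward trend over time; the helper `pearson` is written out by hand so the arithmetic is visible.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


years = range(10)

# Invented year-to-year wiggles that are unrelated to each other.
wiggle_a = [1 if t % 2 else -1 for t in years]
wiggle_b = [-1 if t % 3 == 0 else 1 for t in years]

# Each series = its own upward trend + its own wiggle.
series_a = [100 + 5 * t + w for t, w in zip(years, wiggle_a)]
series_b = [20 + 2 * t + w for t, w in zip(years, wiggle_b)]

print("with shared trend:   ", round(pearson(series_a, series_b), 3))
print("trend removed (wiggles):", round(pearson(wiggle_a, wiggle_b), 3))
```

With the trend included, the two series look almost perfectly correlated; once the trend is set aside, the relationship between the wiggles is essentially zero. The algorithm reports the first number faithfully, and it takes a person to ask whether anything other than the passage of time connects the two.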

When it comes to decisions about people rather than just by people, however, a different set of issues arises. In a dataset, there is quite literally no place for the individual. Algorithms assign people to categories and classifications based on factors like where they live and work, the things they buy, read and watch, and the contacts in their social networks. This form of evaluating people based on a correlation with others who are statistically similar can lead to what some data-rights advocates have called “the networked nature of algorithmic discrimination”. In a paper with this title, researchers danah boyd, Karen Levy and Alice Marwick write, “Racism, sexism, and other forms of bigotry and prejudice are still pervasive in contemporary society, but new technologies have a tendency to obscure the ways in which societal biases are baked into algorithmic decision-making.” Even the employees in a given HR department or bank could likely not explain to someone why their resume was not considered for a job or why they were denied a loan or credit card, because those kinds of decisions have been largely automated on the basis of formulas that computer scientists themselves find difficult to understand.

Taking the human element out of the equation
Apart from being unable to account for individual differences, data-based decisions also cannot extrapolate conditions for the way we experience life. “What [statistical] models can predict are outcomes that can be quantified,” says Markman. “For example, you might choose a career based on projections of future demand for that job and salary, but then you might not be happy in that job. By relying entirely on data, you can lead yourself to optimise the wrong thing.” Businesses regularly make this kind of mistake, he notes, when they implement measures aimed at cutting costs or boosting production that depress employee morale, while the long-term negative effects may remain hidden because the company’s performance is only measured in economic terms.

These are issues today. Silencing human factors like empathy, trust or a sense of fairness to defer to what the numbers say is, by itself, nothing new in our companies, institutions or governments. But there is an even more worrisome future scenario particular to Big Data. It goes beyond questions of ordinary ethics, or the individual’s place in society.

Since only the statistical probability of behaviour can be shown by Big Data, not what one person may or may not do in the moment, this gets to the heart of how we understand ourselves as human beings with free will. Data analytics-based law enforcement methods known as predictive policing and crime forecasting are already in use in several states in the US and some cities in the UK. They are advertised as a way to “stop crime before it starts”. What this really means, however, is that someone could be arrested or imprisoned for their propensity to commit a crime rather than the act of actually doing it. “Then the whole idea of guilt is ridiculous,” says Mayer-Schönberger. “You become just a risk-averse society trying to prevent behaviour from happening whether or not someone is responsible. That is a world that is not only full of punishment and stopping people from doing something they haven’t done, but a world where there is no good basis for human relationships, because human relationships depend on indebtedness to each other.”

If a quality like personal responsibility can’t be quantified, how can it be factored into our future deliberations? The use of Big Data raises big questions, and not only about making meaning out of correlations. Perhaps the most vexing conundrum we may be faced with is deciding how to decide, or like the family in The Quantified Self, whether to do so at all. Which of our choices will we leave up to the data?

(This story appears in the July-Aug 2016 issue of ForbesLife India.)