Column: AI bias is not just a data problem

Implementing technical solutions to social problems does not stand us in good stead to understand fairness or mitigate bias

By Vidushi Marda

Published: Aug 10, 2021 02:29:56 PM IST

In February 2020, a Union minister urged the Ministry of Home Affairs against the use of facial recognition technology on innocent people. Responding to this, Home Minister Amit Shah stated, “This is a software. It does not see faith. It does not see clothes. It only sees the face and through the face the person is caught.”

This statement’s connotation of an artificial intelligence (AI) system like facial recognition being a neutral tool runs contrary to much of what we have witnessed in recent years. For starters, it is well-documented that the Delhi Police’s facial recognition tool does not ‘simply see’, but rather struggles to identify individuals, with an accuracy rate of less than 1 percent, as stated by the Centre before the Delhi High Court.

Secondly, the legality and necessity of facial recognition’s use by Delhi Police have come under the scanner. Beyond that, as AI applications crop up across sectors, from education to health care and law enforcement to lending, it is apparent that while states and companies promote AI through promises of modernisation, economic growth and efficiency, these systems can also lead to harmful outcomes that include violation of fundamental rights, disproportionate impacts on vulnerable communities and exacerbated historical discrimination in societies.

One popular explanation for these harmful outcomes is the presence of biases in AI systems. Machine learning, the most popular subset of AI techniques, depends on data to ‘learn’, that is, to uncover patterns, make classifications or predict future outcomes. The old adage of “garbage in, garbage out” is often used to summarise the core issue of AI bias—if data used to ‘teach’ the algorithm is biased, outputs from those algorithmic systems will be biased as well.

For instance, if a hiring algorithm is fed with data of past ‘successful’ candidates composed only of male applicants, the algorithm is likely to discriminate against female applicants, as evidenced by Amazon’s automated hiring tool in 2018. Anticipating such examples, India’s National Strategy for AI recognises that data may have biases, and recommends that actors “identify the in-built biases and assess their impact, and in turn find ways to reduce the bias”.
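
The “garbage in, garbage out” dynamic can be made concrete with a toy sketch in Python. Everything here is an assumption made for illustration: the records are invented, and the `train` and `predict` functions are hypothetical stand-ins for a model, not a description of Amazon’s tool or of any real hiring system.

```python
from collections import defaultdict

# Hypothetical historical hiring records: (gender, was_hired).
# The skew is deliberate: every past 'successful' candidate is male.
history = [
    ("male", True), ("male", True), ("male", False), ("male", True),
    ("female", False), ("female", False), ("female", False),
]

def train(records):
    """Learn the historical hire rate for each group (a stand-in for a model)."""
    hired = defaultdict(int)
    total = defaultdict(int)
    for group, was_hired in records:
        total[group] += 1
        hired[group] += int(was_hired)
    return {group: hired[group] / total[group] for group in total}

def predict(model, group, threshold=0.5):
    """Recommend a candidate only if their group's learned rate clears the threshold."""
    return model.get(group, 0.0) >= threshold

model = train(history)
print(model)
print(predict(model, "male"), predict(model, "female"))
```

The sketch has learned nothing about any individual applicant. It has simply reproduced the pattern in its training records, which is why it recommends the male candidate and rejects the female one.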

But bias in AI is often understood superficially, as merely a ‘data’ problem that can be technically fixed. The reality is far more complex.

Zoom In on the Technical

From a techno-centric view of bias, underrepresenting certain groups in training datasets can lead to discriminatory outcomes, as evidenced in the job application example above and as ground-breaking research on facial recognition from Joy Buolamwini and Timnit Gebru has shown. Training data is used to ‘teach’ machine learning algorithms, and what emerges through the process of training is known as a model.

Choices at the modelling stage are significant points at which bias may be introduced into a system. These include scrutinising the model’s performance and outcomes, processing the model before implementing it into a system’s wider decision-making process, and choosing which features of the data an algorithm will train on (a process called feature engineering).

For instance, in building a machine learning system to assess creditworthiness of individuals, the choice of training data is significant, as is the choice of feature engineering—will the algorithm be trained on features that reveal sensitive data points such as religion or caste, in turn sabotaging the chances of people from these historically disadvantaged communities getting access to credit? Will irrelevant features be factored in? What qualifies as a model functioning well enough to be used for decision making? Who decides the threshold? Availability of data, existence of data pertaining to historically marginalised groups, evaluation of models, all embody bias of various kinds, as researchers Harini Suresh and John Guttag from the Massachusetts Institute of Technology have outlined.
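
The question of who decides the threshold can be illustrated with a small Python sketch. The scores, the group labels `group_x` and `group_y`, and the `approval_rate` helper are all hypothetical, chosen only to show how a single cut-off changes who gets access to credit.

```python
# Hypothetical creditworthiness scores (0 to 1) produced by some model.
scores = {
    "group_x": [0.82, 0.74, 0.66, 0.58],
    "group_y": [0.71, 0.63, 0.55, 0.47],
}

def approval_rate(group_scores, threshold):
    """Fraction of applicants whose score meets or exceeds the threshold."""
    approved = [s for s in group_scores if s >= threshold]
    return len(approved) / len(group_scores)

# The same scores yield different outcomes depending on the cut-off chosen.
for threshold in (0.5, 0.6, 0.7):
    rates = {g: approval_rate(s, threshold) for g, s in scores.items()}
    print(threshold, rates)
```

In this invented data, raising the cut-off from 0.5 to 0.7 halves approvals for one group and cuts them to a third for the other. Nothing in the data or the model says which cut-off is the right one; that is a human decision.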

Zoom Out to the Socio-technical

Even representative datasets that are carefully curated and modelled can result in discrimination and bias. Human actors who use these technologies may be biased at the time of implementing the output suggested by the system.

Similarly, even seemingly accurate or non-controversial technical systems can lead to discriminatory outcomes through the process of deployment. The use of facial recognition in Delhi is a good example. Momentarily pausing on the acute accuracy problem, even if these systems had high and near-perfect accuracy rates, their use would still result in discrimination if used by policing institutions that embed and perpetuate discriminatory approaches to policing.

Additionally, as I show in ‘Emotional Entanglement’, a 2021 Article 19 report written with Shazeda Ahmed, the foundations on which certain technologies like emotion recognition are built are sometimes discriminatory in and of themselves, at which point bias, accuracy and performance in data and models seem almost irrelevant to the larger questions of discrimination at hand.

This approach to understanding bias underscores the importance of situating AI systems in the ground reality and environment within which they are used. The process of bringing AI systems to deployment includes a number of assumptions and decisions that emerge from a profoundly unfair world steeped in discrimination of various kinds.

Interpretations are made by fallible humans who embody their unique set of biases and who cannot exhaustively imagine all the possible instances where bias may occur. By framing the problem of bias as one that can be fixed through technical means alone, we distract from much more complex questions of equality and structural discrimination in societies.

On ‘De-biasing’

Policy proposals and technical approaches sometimes prescribe “de-biasing” as a way of mitigating harms. While theoretically tempting, this is an oversimplification. The choice of standard that is considered “fair” during this process is a subjective and biased one. Further, bias in AI systems is layered, and de-biasing, at best, provides a veneer of fairness to a complex socio-technical system by treating bias as a discrete issue that can be “fixed”, as opposed to one that pervades the stages that precede, underpin and follow deployment of AI systems.

For instance, ‘de-biasing’ a facial recognition algorithm and making it perform more accurately on minority groups could actually spur greater discrimination and harmful outcomes, given examples of real-world uses across jurisdictions. Bias is not black and white, and de-biasing is not simply a switch that ensures fairness.

Importantly, outputs that are checked against one fairness metric or the other cannot guarantee that discriminatory outcomes will not occur, given the chain of interpretation and human judgement involved. Implementing technical solutions to social problems does not stand us in good stead to understand fairness or mitigate bias.
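
A short Python sketch shows how the same outputs can pass one fairness check and fail another. The decisions are invented, and the two measures used, selection rate (the basis of what is commonly called demographic parity) and true positive rate (equal opportunity), are illustrative choices for this sketch rather than metrics the column itself names.

```python
# Hypothetical decisions: (group, actually_qualified, selected_by_system).
decisions = [
    ("A", True, True), ("A", True, True), ("A", False, False), ("A", False, False),
    ("B", True, True), ("B", True, False), ("B", False, True), ("B", False, False),
]

def selection_rate(rows, group):
    """Share of a group that the system selects (compared across groups for demographic parity)."""
    members = [r for r in rows if r[0] == group]
    return sum(r[2] for r in members) / len(members)

def true_positive_rate(rows, group):
    """Share of a group's qualified members that the system selects (equal opportunity)."""
    qualified = [r for r in rows if r[0] == group and r[1]]
    return sum(r[2] for r in qualified) / len(qualified)

for g in ("A", "B"):
    print(g, selection_rate(decisions, g), true_positive_rate(decisions, g))
```

Both groups are selected at the same rate, so the system looks fair by the first measure. Yet every qualified person in group A is selected while only half of the qualified people in group B are, so it fails the second. Which measure counts as “fair” is a judgement the code cannot make, and neither measure says anything about how the outputs are used once deployed.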

AI systems are deeply human throughout their lifecycle. By trying to technically “fix” bias, we fail to engage with far more urgent questions related to AI in societies. We need to think about how AI systems concentrate and shift power, to scrutinise their need in societies before we think about optimising their outputs, and to reflect on the institutional, historical and societal legacies that pervade the deployment and use of AI systems. While data and models are an important consideration in understanding bias, they are only the tip of a complex, socio-technical iceberg.

(The writer is a lawyer and researcher based in Bengaluru. She studies the societal implications of AI)