To train a machine learning model to respond to a pattern like the perceived gender of a face, one approach is to provide labeled training data. The labels represent what machine learning researchers call “ground truth.” Truth as a concept is slippery. For thousands of years humans have debated what is true, whether in courts, philosophers’ chairs, labs, political rallies, public forums, the playground, or when looking into mirrors—“Objects are closer than they appear.”
Scientists have argued for objective truth that is uncovered through experimentation, yet science does not escape human bias and prejudice. Feminist scholars have long pointed out how Western ways of knowing, shaped by patriarchy, attempt to erase the standpoint of the observer, taking a godlike, omniscient, and detached view. However, our standpoint, where we are positioned in society, and our cultural and social experiences shape how we share and interpret our observations. Acknowledging that there is subjectivity to perceived truths brings some humility to our observations and makes room for the notion of partial truths. The elephant can be perceived as many things depending on whether you touch the tail, the leg, or the trunk.
This is not to say all interpretations are valid, particularly when looking at physical phenomena. Regardless of your acceptance of physical laws, gravity and the knowledge engineers have gained about aerodynamics influence how we build airplanes. In the world of machine learning, the arbiters of ground truth—what a model is taught to be the correct classification of a certain type of data—are those who decide which labels to apply and those who are tasked with applying those labels to data. Both groups bring their own standpoint and understanding to the process. Both groups are exercising the power to decide. Decision-making power is ultimately what defines ground truth. Human decisions are subjective.
The classification systems I or other machine learning practitioners select, modify, inherit, or expand to label a dataset are a reflection of subjective goals, observations, and understandings of the world. These systems of labeling circumscribe the world of possibilities and experience for a machine learning model, which is also limited by the data available. For example, if you decide to use binary gender labels—male and female—and use them on a dataset that includes only the faces of middle-aged white actors, the system is precluded from learning about intersex, trans, or nonbinary representations and will be less equipped to handle faces that fall outside its initial binary training set. The classification system erases the existence of those groups not included in it. It can also reify the groups it does include: if the most dominant classification of gender presented is the binary male and female categorization, over time that binary categorization becomes accepted as “truth.” This “truth” ignores rich histories and observations from all over the world regarding gender that acknowledge third-gender individuals or more fluid gender relationships.
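To make the mechanics concrete, here is a minimal sketch, in Python, of how a chosen label schema circumscribes a model’s world before any learning happens. The label set, the dataclass, the file names, and the validation helper are all hypothetical, invented for illustration rather than drawn from any real system.

```python
# Hypothetical sketch: the label schema chosen by a system's makers fixes
# the entire universe of answers the model can ever give.
from dataclasses import dataclass

# The classification system selected by the practitioners (an assumption for
# illustration): a binary schema makes anything outside it unrepresentable.
GENDER_LABELS = ("male", "female")


@dataclass
class LabeledFace:
    image_path: str
    label: str  # must be one of GENDER_LABELS to enter the training set


def validate(example: LabeledFace) -> LabeledFace:
    # Identities outside the schema cannot even be recorded as training data.
    if example.label not in GENDER_LABELS:
        raise ValueError(f"label '{example.label}' is outside the schema {GENDER_LABELS}")
    return example


# A convenience-sampled dataset (here, imagined as middle-aged white actors):
# whatever passes validation becomes the model's entire "ground truth."
training_set = [
    validate(LabeledFace("actor_001.jpg", "male")),
    validate(LabeledFace("actor_002.jpg", "female")),
    # validate(LabeledFace("person_003.jpg", "nonbinary"))  # rejected by the schema
]
```

Nothing in the data collection or training that follows can recover what the schema has already excluded.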
When it comes to gender classification systems, the gender labels being used make an inference about gender identity, how an individual interprets their own gender in the world. A computer vision system cannot observe how someone thinks about their gender, because the system is presented only with image data. It’s also true that how someone identifies with gender can change over time. In computer vision that uses machine learning, what machines are being exposed to is gender presentation, how an individual performs their gender in the way they dress, style their hair, and more. Presented with examples of images that are labeled to show what is perceived as male and as female, systems are exposed to cultural norms of gender presentation that can be reflected in length of hair, clothing, and accessories.
Some systems use geometric-based approaches rather than appearance-based approaches and are programmed around the physical dimensions of a human face. Scientific evidence shows how sex hormones can influence the shape of a face. Testosterone is observed to lead to a broader nose and forehead—but other factors may also lead to a particular nose or forehead shape, so what may be true for faces in a dataset of parliamentarians from Iceland does not necessarily apply to a set of faces from Senegal. Over time, geometric approaches to analyzing faces have also proven less effective than appearance-based models learned from large labeled datasets. Coding all the rules for when a nose-to-eye-to-mouth ratio might be that of someone perceived as a woman or biologically female is a daunting task, so the machine learning approach has taken over. But this reliance on labeled data introduces its own challenges.
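A hedged sketch of the contrast described above: a hand-coded geometric rule with invented thresholds next to an appearance-based classifier trained on labeled images. The ratio, the cutoff value, and the use of scikit-learn’s LogisticRegression are assumptions for illustration, not the rules or models any particular real system uses.

```python
# Illustrative only: the geometric threshold below is invented, and it is
# exactly the kind of brittle, hand-coded rule that fails to travel across
# populations. The appearance-based part assumes scikit-learn is installed.
from sklearn.linear_model import LogisticRegression


def geometric_rule(nose_width: float, face_width: float) -> str:
    """Classify from a single facial ratio (hypothetical heuristic)."""
    ratio = nose_width / face_width
    # A cutoff tuned on one group (say, Icelandic parliamentarians) need not
    # hold for faces from Senegal or anywhere else.
    return "male" if ratio > 0.25 else "female"


def train_appearance_model(X, y) -> LogisticRegression:
    """Learn the mapping from labeled pixel data instead of hand-coding rules.

    X is an array of flattened face images; y holds the human-applied (and
    therefore subjective) binary labels.
    """
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)  # the model inherits whatever the labels and data encode
    return model
```

The learned model sidesteps the rule-writing problem, but it relocates the subjectivity into the labels and the data it is trained on.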
The representation of a concept like gender is constrained by both the classification system that is used and the data that is used to represent different groups within the classification. If we create a dataset to train a system on binary gender classification that includes only the faces of middle-aged white actors, that model is destined to struggle with gendering faces that do not resemble those in the training set. In the world of computer vision, we find that systems trained on adult faces often struggle with the faces of children, which are changing at a rapid pace as they grow and are often absent from face datasets.
The point remains: For machine learning models, data is destiny, because the data provides the model with the representation of the world as curated by the makers of the system. Just as the kinds of labels that are chosen reflect human decisions, the kind of data that is made available is also a reflection of those who have the power to collect it and to decide which data is used to train a system. The data that is most readily available is often used out of convenience. It is convenient for Facebook to use data made available through user uploads. It is convenient for researchers to scrape the internet for data that is publicly posted. Google and Apple rely on the use of their products to amass extremely valuable datasets, such as voice data collected when a user speaks to their phone to do a search. When ground truth is shaped by convenience sampling, grabbing what is most readily available and applying labels in a subjective manner, it represents the standpoint of the makers of the system, not a standalone objective truth.
A major part of my work is to dissect AI systems and show precisely how they can become biased. My early explorations taught me the importance of going beyond technical knowledge, valuing cultural knowledge, and questioning my own assumptions. We cannot assume that just because something is data driven or processed by an algorithm it is immune to bias. Labels and categories we may take for granted need to be interrogated. The more we know about the histories of racial categorization, the more we learn about how a variety of cultures approach gender, the easier it is to see the human touch that shapes AI systems. Instead of erasing our fingerprints from the creation of algorithmic systems, exposing them more clearly gives us a better understanding of what can go wrong, for whom, and why. AI reflects both our aspirations and our limitations. Our human limitations provide ample reasons for humility about the capabilities of the AI systems we create. Algorithmic justice necessitates questioning the arbiters of truth, because those with the power to build AI systems do not have a monopoly on truth.
Excerpted from the book UNMASKING AI: My Mission to Protect What Is Human in a World of Machines by Joy Buolamwini. Copyright © 2023 by Joy Buolamwini. Published by Random House, an imprint and division of Penguin Random House LLC. All rights reserved.
Dr. Joy Buolamwini is the founder of the Algorithmic Justice League, a groundbreaking researcher, and a renowned speaker. Her writing has been featured in publications such as Time, the New York Times, Harvard Business Review, and The Atlantic. As the Poet of Code, she creates art to illuminate the impact of artificial intelligence on society and advises world leaders on preventing AI harms. She is the recipient of numerous awards, and her MIT research on facial recognition technologies is featured in the Emmy-nominated documentary Coded Bias.
K. Whiteford is an emerging AI artist located in Maryland. She seamlessly fuses absurd surrealism with whimsical beauty in her AI art creations. Through her Instagram platform, she documents her captivating journey in the realm of artificial intelligence artistry.