All people are created equal, but in the eyes of the algorithm, not all faces are just yet.
A new study from MIT and Microsoft researchers (pdf) provides more evidence of exactly how bad facial-recognition software is at accurately identifying darker faces, especially when those faces belong to women. In a test to identify the sex of people from their faces, software was able to do so with more than 99% accuracy for light-skinned men. For darker-skinned women, the software was wrong as often as one-third of the time.
The results shed more light on a known problem—how limited data sets can impact the effectiveness of artificial intelligence, which might in turn heighten bias against individuals as AI becomes more widespread.
In the paper, Joy Buolamwini of the MIT Media Lab and Timnit Gebru of Microsoft Research discussed the results of a software evaluation carried out in April and May of last year. For it, they gathered a database of 1,270 faces, drawn from images of lawmakers in countries with high percentages of women in power. Three of the countries were in Africa, while three were Nordic countries.
The researchers then ran the images against facial recognition software from three providers—IBM, Microsoft, and China’s Megvii (identified in the paper by the name of its software, Face++)—to assess how accurately each recognized the gender of the person pictured. (The researchers said they worked with binary gender classifications because of limitations with the data they were working with.)
The team discovered that all three companies were more likely to correctly identify a subject as male or female if the subject had pale skin.
Facial recognition companies train their algorithms by exposing them to image databases full of faces. If the images in these databases are overwhelmingly white, the algorithms will likely identify the characteristics of a white face with more accuracy than those of a dark-skinned face. In the paper, the researchers name two data sets—IJB-A and Adience—as examples of commonly used image databases that contain a large majority of light-skinned faces. Darker-skinned women made up only 4.4% of IJB-A images and 7.4% of subjects in Adience, according to the paper.
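The kind of disparity the researchers measured can be surfaced with a simple per-subgroup tally. The sketch below uses made-up numbers, not the study's actual data, to show how an overall accuracy figure can hide a large error gap between groups:

```python
# Sketch (hypothetical data): tally a classifier's error rate per
# demographic subgroup, rather than one aggregate accuracy number.
from collections import defaultdict

def error_rates_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, pred in records:
        totals[group] += 1
        if truth != pred:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

# Illustrative numbers only -- chosen to mirror the ~1% vs ~33% gap
# described in the article, not drawn from the paper itself.
sample = (
    [("lighter_male", "M", "M")] * 99 + [("lighter_male", "M", "F")] * 1 +
    [("darker_female", "F", "F")] * 66 + [("darker_female", "F", "M")] * 34
)
rates = error_rates_by_group(sample)
print(rates)  # {'lighter_male': 0.01, 'darker_female': 0.34}
```

Breaking accuracy out by subgroup like this is exactly why the study's headline finding was visible at all: averaged over the whole dataset, the same classifier would look far more accurate than it is for darker-skinned women.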
IBM responded to the study with a lengthy statement (.pdf) that the researchers shared. “For the past nine months, IBM has been working towards substantially increasing the accuracy of its new Watson Visual recognition for facial analysis, which now uses different training data and different recognition capabilities than the service evaluated in this study,” it said, adding, “To deal with possible sources of bias, we have several ongoing projects to address dataset bias in facial analysis – including not only gender and skin type, but also bias related to age groups, different ethnicities, and factors such as pose, illumination, resolution, expression, and decoration.”
The company said it tested the new software with a similarly mixed image set based on the one used by the researchers, and while the error rate for darker-skinned women was still the largest, it had dropped to about 3.5%. “This reflects a nearly tenfold increase in accuracy.”
Microsoft also provided a statement (.pdf) to the researchers. “We believe the fairness of AI technologies is a critical issue for the industry and one that Microsoft takes very seriously. We’ve already taken steps to improve the accuracy of our facial recognition technology and we’re continuing to invest in research to recognize, understand and remove bias,” it reads. Megvii did not reply when the team shared its findings, and also didn’t immediately respond to a request for comment from Quartz.
The researchers argue that these discrepancies could accelerate “algorithmic bias,” in which artificial intelligence services treat individuals differently based on factors such as skin color or gender. If police use facial recognition software to help them catch criminals, for example, imperfect recognition of certain groups of people could lead to more wrongful arrests among those groups.
“Automated systems are not inherently neutral. They reflect the priorities, preferences, and prejudices—the coded gaze—of those who have the power to mold artificial intelligence,” they write.
Of course, improved accuracy for facial recognition across all races could still perpetuate—or even heighten—algorithmic bias, especially in commercial or law enforcement applications. The more competently an AI can identify an individual, the better positioned it is to make a decision about what to do next, say, which ads to show or which services to withhold. Those decisions could themselves be grounded in bias.