Facial recognition AI can’t identify trans and non-binary people

A staff member has his face scanned at an elevator during a demonstration to the media at Alibaba’s FlyZoo hotel in China.
Image: Reuters/Xihao Jiang

Facial-recognition software from major tech companies is apparently ill-equipped to work on transgender and non-binary people, according to new research. A recent study by computer-science researchers at the University of Colorado Boulder found that major AI-based facial analysis tools—including Amazon’s Rekognition, IBM’s Watson, Microsoft’s Azure, and Clarifai—habitually misidentified non-cisgender people.

The researchers gathered 2,450 images of faces from Instagram, searching under the hashtags #woman, #man, #transwoman, #transman, #agender, #genderqueer, and #nonbinary. They eliminated photos that showed multiple individuals or in which less than 75% of the person’s face was visible. The images were then divided by hashtag, amounting to 350 images in each group. The researchers then tested each group against the facial analysis tools of the four companies.

The systems were most accurate with cisgender men and women, who on average were accurately classified 98% of the time. Researchers found that trans men were wrongly categorized roughly 30% of the time. The tools fared far worse with non-binary or genderqueer people, inaccurately classifying them in all instances. 
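
For readers who want to see how per-group figures like these are tallied, here is a minimal Python sketch. It is not the researchers' code: the function name, the record layout, and the toy records are all assumptions made up for illustration, and the numbers it produces are not the study's results. It also shows why a tool that can only answer "male" or "female" scores zero on any group whose expected label is neither.

```python
from collections import defaultdict


def per_group_accuracy(records):
    """Return {group: share of records whose predicted label matches the expected one}.

    Each record is a (group, expected_label, predicted_label) tuple.
    This layout is a hypothetical choice for the sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, expected, predicted in records:
        total[group] += 1
        if predicted == expected:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy records invented for illustration only (not data from the study).
toy_records = [
    ("#man", "male", "male"),
    ("#man", "male", "male"),
    ("#transman", "male", "male"),
    ("#transman", "male", "female"),
    # A binary classifier has no "nonbinary" output, so these can never match.
    ("#nonbinary", "nonbinary", "female"),
    ("#nonbinary", "nonbinary", "male"),
]

rates = per_group_accuracy(toy_records)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} classified correctly")
```

Grouping by hashtag before computing the rate is what exposes the gap: a single overall accuracy figure would blend the groups together and hide how unevenly the errors fall.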

The rising use of facial recognition by law enforcement, immigration services, banks, and other institutions has provoked fears that such tools will be used to cause harm. There’s a growing body of evidence that the nascent technology struggles with both racial and gender bias. A January study from the MIT Media Lab found that Amazon’s Rekognition tool misidentified darker-skinned women as men one-third of the time. The software even mislabeled white women as men more often than it misclassified white men. While IBM’s and Microsoft’s programs were found to be more accurate than Amazon’s, researchers observed an overall trend of male subjects being labeled correctly more often than female subjects, and of darker skin drawing higher error rates than lighter skin.

At present, there’s very little research on how facial analysis tools work with gender non-conforming individuals. “We knew that people of minoritized gender identities—so people who are trans, people who are non-binary—were very concerned about this technology, but we didn’t actually have any empirical evidence about the misclassification rates for that group of people,” Morgan Klaus Scheuerman, a doctoral student in the information-science department of the University of Colorado Boulder, said in a video about the study.

The researchers believe that the algorithms rely on outdated stereotypes about gender, which further increases their error rates. Half of the systems misclassified Scheuerman, who is male and has long hair, as a woman. Such inconsistencies were observed across the board. For example, IBM’s Watson classified a photo of a man dressed in drag as female, while Microsoft’s Azure classified him as male.

The four companies whose products were tested have yet to comment on the study’s findings. Quartz reached out to them for comment and will update as necessary.

In an update made to its website in September, Amazon noted that Rekognition isn’t designed to “categorize a person’s gender identity,” and that the tool shouldn’t be used to make such a determination. The company states that its tool is best suited for cases that don’t involve specific users. “For example, the percentage of female users compared to male users on a social media platform,” Amazon writes.

Scheuerman said that while he believed Amazon’s guidelines were well-intended, the study points out what’s wrong with the service. There’s no guarantee that Rekognition’s clients are using the facial analysis tool as Amazon intended. And even if such data is viewed in aggregate, it would very likely still be incorrect. “It is not possible to ensure how gender classification is used by third-parties,” Scheuerman said. “While they recommend Rekognition only be used for aggregate gender distribution statistics, the results of such an analysis would never count trans people accurately, nor would they include other non-binary genders.”