Algorithms are making the same mistakes assessing credit scores that humans did a century ago

Money2020, the largest finance tradeshow in the world, takes place each year in the Venetian Hotel in Las Vegas. At a recent gathering, above the din of slot machines on the casino floor downstairs, cryptocurrency startups pitched their latest coin offerings, while on the main stage, PayPal President and CEO Dan Schulman made an impassioned speech to thousands about the globe’s working poor and their need for access to banking and credit. The future, according to PayPal and many other companies, is algorithmic credit scoring, where payments and social media data coupled to machine learning will make lending decisions that another enthusiast argues are “better at picking people than people could ever be.”

By Rachel O'Dwyer | 8 min read | Updated July 20, 2022

Credit in China is now in the hands of a company called Alipay, which uses thousands of consumer data points (including what people purchase, what type of phone they use, what augmented reality games they play, and who their friends are on social media) to determine a credit score. In a culture where the elderly use their phones to pay for groceries and even the homeless use them to accept donations, there’s plenty of data to draw on. And while the credit score can dictate the terms of a loan, it also acts as a proxy for general good character. In China, having a high credit rank can help your chances of accessing employment, for example, or of getting a visa to travel within Europe, and even of finding a partner via online dating. One Chinese dating site, Baihe.com, offers greater visibility to users with high credit scores.

And all of it is dictated by the algorithm.

The decisions made by algorithmic credit scoring applications are not only said to be more accurate in predicting risk than traditional scoring methods; their champions argue they are also fairer because the algorithm is unswayed by the racial, gender, and socioeconomic biases that have skewed access to credit in the past. It might not be clear why playing video games, owning an Android phone, and having 400 Facebook friends can help to determine whether or not a loan application is successful, but a decade after the financial crisis, the logic goes, we need to trust that the numbers don’t lie.

Alipay isn’t alone. Aside from Chinese competitors like WeChat Pay, other companies are using machine learning to make lending decisions in Sub-Saharan Africa. One such company, called Branch, is capitalizing on mobile phone adoption in Kenya, drawing down data gleaned from the hugely popular mobile payments platform M-Pesa to devise credit scores. And of course, algorithmic credit scoring isn’t confined to emerging credit markets. In Germany, Kreditech, a lending service determined to build the “Amazon for consumer finance,” is moving away from traditional metrics such as repayment histories, to mine the personality clues hidden in the Facebook data its customers surrender. Meanwhile, a U.S. company called ZestFinance uses big data to target customers whose ratings arguably never recovered from the subprime mortgage crisis.

As Schulman’s Money2020 speech suggests, algorithmic credit scoring is fueled by a desire to capitalize on the world’s ‘unbanked,’ drawing in billions of customers who, for lack of a traditional financial history, have thus far been excluded. But the rise of algorithmic credit also responds to anxieties in developed economies, particularly in the aftermath of the financial crisis. A decade post-crash, there’s a whiff of hope that big data might finally shore up the risky business of consumer credit everywhere. Whether we ought to have faith in that promise remains an open question, and one that is hard to answer given the impenetrability of machine learning.

In 2002, J.P. Martin, an executive at Canadian Tire, began to analyze transactional data from the previous year. The company sold sports and recreation equipment, homewares, and automotive supplies, and issued a credit card that was widely accepted. By examining transactional histories, Martin traced correlations between the purchases that customers made and the likelihood they would default on their repayments. Responsible and socially-orientated purchases such as birdseed or tools to remove snow from roofs correlated with future creditworthiness, while cheap brands of motor oil indicated a higher likelihood of default.

Shortly afterwards, some credit card companies began using these and other discoveries to scrutinize their customers. In the US, every transaction processed by Visa or MasterCard is coded by a “merchant category”: 5122 for drugs, for example; 7277 for debt, marriage, or personal counseling; 7995 for betting and wagers; or 7273 for dating and escort services. Some companies curtailed their customers’ credit if charges appeared for counseling, because depression and marital strife were taken as signs of potential job loss or expensive litigation.
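The mechanics of this kind of screening are simple enough to sketch in a few lines of Python. The four category codes below are the ones quoted above; everything else is an assumption made for illustration, including the `WATCHED_CODES` rule, the `review_account` function, and the sample statement. It is not a description of any card issuer's actual system.

```python
# Merchant category codes quoted in the article.
MERCHANT_CATEGORIES = {
    "5122": "drugs",
    "7277": "debt, marriage, or personal counseling",
    "7995": "betting and wagers",
    "7273": "dating and escort services",
}

# Hypothetical rule: the categories an issuer chooses to treat as a warning sign.
WATCHED_CODES = {"7277"}


def review_account(transactions, watched=WATCHED_CODES):
    """Return the watched category labels that appear in a statement.

    `transactions` is a list of (category_code, amount) pairs.
    Each label is reported once, in the order it first appears.
    """
    hits = []
    for code, _amount in transactions:
        if code in watched:
            label = MERCHANT_CATEGORIES.get(code, "unknown category")
            if label not in hits:
                hits.append(label)
    return hits


# Invented statement: one pharmacy charge and two counseling sessions.
statement = [("5122", 18.40), ("7277", 120.00), ("7277", 120.00)]
flags = review_account(statement)
print(flags)
```

Nothing in the rule looks at whether the customer has ever missed a payment; the category of the charge alone is enough to raise the flag.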

While these calculations were based on transactional histories, today’s credit-scoring algorithms respond to datasets with thousands of variables aggregated from payment histories, social media, demographic, and even GPS data. ZestFinance’s patent describes the use of payments data, social behavior, browsing behaviors, and details of users’ social networks as well as “any social graph informational for any or all members of the borrower’s network.” Similarly, Branch’s privacy policy mentions such factors as personal data, text message logs, social media data, financial data, and handset details including make, model, and browser type. These applications don’t just draw on this aggregated data to make a decision; they create systems that recursively analyze and refine their results against a desired output, enabling the algorithm to “learn” by making its own connections. As the CEO of ZestFinance recently argued, “all data is credit data,” and the machinations of the algorithm are no longer so straightforward as snow rake = good; marriage counseling = bad.

While companies are generally up-front about what data is input to refine and upgrade the decision-making processes, the black box of the algorithm dictates that no one person really knows what data—or what combinations of data—will prove significant. With a little trial and error, for example, Joe Deville, a researcher at Lancaster University in the U.K., discovered that simply changing the screen resolution on his phone seemed to result in a different score for some algorithmic lenders, while others have suggested that actions as mysterious as charging your phone more often may produce a more favorable result. Meanwhile, the chief executive of Branch speaks whimsically of their machine-learning algorithm as a “robot in the sky” — a kind of AI fairy that makes lending decisions based on whether its users are naughty or nice. If you’re unhappy with the number that emerges from the black box, there’s little you can do to change or dispute it.

Algorithmic credit scores might seem futuristic, but these practices do have roots in credit scoring practices of yore. Early credit agencies, for example, hired human reporters to dig into their customers’ credit histories. The reports were largely compiled from local gossip and colored by the speculations of the predominantly white, male, middle-class reporters. Remarks about race and class, asides about housekeeping, and speculations about sexual orientation all abounded. One credit reporter from Buffalo, New York, noted that “prudence in large transactions with all Jews should be used,” while a reporter in Georgia described a liquor store he was profiling as “a low Negro shop.” Similarly, the Retail Credit Company (now Equifax), founded in 1899, made use of information gathered by Welcome Wagon representatives to collate files on millions of Americans for the next 60 years.

By 1935, whole neighborhoods in the US were classified according to their credit characteristics. A map from that year of Greater Atlanta comes color-coded in shades of blue (desirable), yellow (definitely declining) and red (hazardous). The legend recalls a time when an individual’s chances of receiving a mortgage were shaped by their geographic status. The neighborhoods that received a hazardous rating were frequently poor or dominated by racial and ethnic minorities. The scoring practice, known today as redlining, acted as a device to reduce mobility and to keep African American families from moving into neighborhoods dominated by whites.

The Fair Credit Reporting Act of 1970 and the Equal Credit Opportunity Act of 1974 were attempts to rectify these discriminatory practices. Today, or so the fintech narrative goes, we have detailed and unbiased scoring algorithms that are blind to gender, class, and ethnicity in their search for a creditworthy individual. And yet a growing body of research suggests that the ways algorithms classify and make decisions mirror these historic geographies of exclusion, leading academics such as Cathy O’Neil and Frank Pasquale, who study the social, economic, and political effects of algorithmic decision making, to point to emergent practices of “weblining,” where algorithmic scores reproduce the same old credit castes and inequalities. Because these systems learn from existing data sets, it often follows that existing bias shapes what the machine decides is good, bad, normal or creditworthy.
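A minimal Python sketch shows how this can happen even when a protected attribute is never given to the system. Everything in it is an assumption made for illustration: the eight records are invented, the neighborhoods and groups are placeholders, and the "model" is nothing more than a lookup of past approval rates per neighborhood. Real scoring systems are far more complex, but the mechanism may be similar in spirit: a feature that correlates with group membership can stand in for it.

```python
from collections import defaultdict

# Invented history: (neighborhood, group, approved_in_the_past).
# Past lenders approved "north" applicants and mostly refused "south" ones,
# and group B happens to live mostly in "south".
HISTORY = [
    ("north", "A", 1), ("north", "A", 1), ("north", "A", 1), ("north", "B", 1),
    ("south", "B", 0), ("south", "B", 0), ("south", "B", 1), ("south", "A", 0),
]


def fit_approval_rates(records):
    """Learn the past approval rate per neighborhood. Group is never read."""
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for neighborhood, _group, approved in records:
        totals[neighborhood] += 1
        approvals[neighborhood] += approved
    return {n: approvals[n] / totals[n] for n in totals}


def predict(rates, neighborhood, threshold=0.5):
    """Approve when the neighborhood's past approval rate meets the threshold."""
    return rates[neighborhood] >= threshold


def approval_rate_by_group(records, rates):
    """Share of each group the learned rule would approve."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for neighborhood, group, _past in records:
        totals[group] += 1
        approved[group] += predict(rates, neighborhood)
    return {g: approved[g] / totals[g] for g in totals}


rates = fit_approval_rates(HISTORY)
by_group = approval_rate_by_group(HISTORY, rates)
print(rates)
print(by_group)
```

In this toy data the learned rule would approve three quarters of group A and one quarter of group B, although group membership was never an input. The neighborhood carried the old pattern forward.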

These systems are fast becoming the norm. The Chinese government is now close to launching its own algorithmic “Social Credit System” for its 1.4 billion citizens, a metric that uses online data to rate trustworthiness. As these systems become pervasive, and scores come to stand for individual worth, determining access to finance, services, and basic freedoms, the stakes of one bad decision are that much higher. This is to say nothing of the legitimacy of using such algorithmic proxies in the first place.

While it might seem obvious to call for greater transparency in these systems, with machine learning and massive datasets it’s extremely difficult to locate bias. Even if we could peer inside the black box, we probably wouldn’t find a clause in the code instructing the system to discriminate against the poor, or people of color, or even people who play too many video games. More important than understanding how these scores get calculated is giving users meaningful opportunities to dispute and contest adverse decisions that are made about them by the algorithm.

Maybe then we can really see if these systems are giving credit where credit is due.

This article was originally published on Undark. Read the original article.