US regulators are investigating whether Apple’s credit card, launched in August, is biased against women. Software engineer David Heinemeier Hansson reported on social media that Apple had offered him a spending limit 20 times higher than the one it offered his wife, Jamie Heinemeier Hansson. When Jamie spoke to customer service at Goldman Sachs, the bank behind the Apple Card, she was told her credit limit was determined by an algorithm, and bank reps couldn’t explain why it came to the conclusion it did.
A spokesman for Goldman told Bloomberg, “Our credit decisions are based on a customer’s creditworthiness and not on factors like gender, race, age, sexual orientation or any other basis prohibited by law.” Apple and Goldman claim to use applicants’ credit score, information in their credit report, and income to establish credit limits.
There is no evidence yet that the algorithm is sexist, beyond these anecdotes. But a lack of transparency has been a recurring theme. Goldman didn’t respond to questions from Quartz about the exact mechanisms it used to determine Jamie Heinemeier Hansson’s credit limit. Further information about which quantitative measures it used in this process—high-powered machine learning? Eighth-grade algebra?—could offer clues about what, if anything, went wrong here.
Goldman has favored machine learning before. In 2018, when the bank wanted to show off its quantitative prowess by forecasting the winner of the soccer World Cup, its researchers turned to machine learning. They could have used basic statistics, but that would not have been as precise. Goldman’s quants said a prediction approach built on machine learning methods (such as random forest, Bayesian ridge regression, and a gradient boosted machine) was five times more accurate than a simpler statistical regression.
The problem with using a machine learning method is that it makes it hard to explain how a prediction works. Machine learning tools are, for the most part, black boxes: In exchange for the accuracy they promise, data scientists using them lose the ability to understand how much each factor matters to the ultimate outcome of a prediction (in statistics, working out those relationships is called “inference”).
For the World Cup, Goldman’s researchers knew that the variables of team strength, individual player strength, and recent performance were important predictors, but quantifying precisely how much each matters to the outcome of a match was impossible. While a regression-based model would have been a blunter tool, it would have allowed the researchers to clearly state how much of an effect each variable had on their prediction. Basically, it would have been better on transparency, but worse on forecasting.
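To make that tradeoff concrete, here is a minimal sketch of why a regression is easier to explain. It is not Goldman’s model or data: the three variable names, the “true” effect sizes, and the simulated matches are all invented for illustration. The point is only that a fitted regression hands back one number per variable, and each number can be read directly as that variable’s effect on the prediction.

```python
import random


def fit_ols(rows, targets):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of 1s for the intercept
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


random.seed(0)

# Hypothetical predictors and made-up effect sizes (assumptions, not real data).
NAMES = ["team_strength", "player_strength", "recent_form"]
TRUE_EFFECTS = [0.8, 0.5, 0.3]

# Simulate 500 matches: goals = 1.0 + weighted sum of predictors + small noise.
rows, goals = [], []
for _ in range(500):
    x = [random.gauss(0, 1) for _ in NAMES]
    noise = random.gauss(0, 0.1)
    rows.append(x)
    goals.append(1.0 + sum(w * v for w, v in zip(TRUE_EFFECTS, x)) + noise)

coefs = fit_ols(rows, goals)
intercept, effects = coefs[0], coefs[1:]

# Each coefficient is a plain-language statement about one variable.
for name, effect in zip(NAMES, effects):
    print(f"{name}: {effect:+.2f} goals per one-unit increase")
```

A random forest or gradient boosted machine trained on the same data would produce predictions but no comparable per-variable coefficients, which is the transparency the researchers gave up.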
And in the end, Goldman’s fancy algorithm did a pretty poor job of predicting the World Cup anyway. A model that was at least easier to explain might have been more useful.
In the case of the Apple Card, we don’t know for sure whether Goldman used machine learning to inform its system for calculating credit limits, but it seems likely it did, and by doing so it may have prioritized precision above all else. As mathematician Cathy O’Neil recently told Slate, when companies choose to use algorithms, “[t]hey look at the upside—which is faster, scalable, quick decision-making—and they ignore the downside, which is that they’re taking on a lot of risk.”
Data science, as a field, tends to focus on making predictions. This narrow goal may lead companies further away from thinking about bias or how well they can explain decision-making methodologies to regulators and the public at large. It can also lead to less consideration of the shortcomings of data fed into algorithmic models—some research suggests credit scoring is discriminatory, and any model incorporating that data will reflect that bias. But in many cases in modern data science, if the model makes a forecast “better” in statistical terms, its other effects may be overlooked.