Since the release of Moneyball, the Michael Lewis book (and the subsequent film) exploring the successful data-driven approach of the Oakland A’s baseball club, the use of statistical analysis in professional sports seems to have exploded. But in the world’s most popular sport, soccer, useful statistical analysis remains sorely lacking.
That’s the view of the high-profile economist and New York University professor Daniel Altman, who also runs a sports analysis firm, North Yard Analytics. We recently caught up with Altman to speak about the flawed use of data in the “beautiful game.” He’s not the first person to question the use of stats in soccer: a bestselling book, Soccernomics, was based around the concept (though Altman has been highly critical of it).
Altman questions the use of well-known metrics such as the “total shot ratio,” and even the number of goals scored by a player in a season, to measure performance. Instead, he offers up as a more useful predictor the Shapley value, an economic concept used in game theory to determine each team member’s contribution to success—which could capture, for example, the value that a trash-talking player brings to the field, as well as the value of a high-scorer.
This interview, conducted over phone and email, was condensed and edited for clarity.
Quartz: How widespread is the use of data in modern professional soccer? How advanced is the use of statistics in soccer, relative to other sports like baseball?
Daniel Altman: Most of the top teams use statistics in some way, whether it’s for recruiting players, maximizing their fitness, or choosing tactics. But there are no sophisticated and widely accepted metrics for gauging the ability of players, as there are in baseball. Part of the reason is that creating these metrics is much more difficult. In baseball, you have hundreds of repeated identical situations: pitches to a batter. Every batter faces a similar set of pitches, and every pitcher faces a similar set of batters. Moreover, no one else on the field has a significant effect on the outcomes of those situations. In soccer, by contrast, every player on the field affects every other player’s actions at almost every moment of a match.
Soccer lends itself to more advanced methods for assessing players and teams, sometimes following in the footsteps of hockey and basketball. I’d say the vast majority of these metrics are still in the experimental stages, and there are people from many different disciplines working on them. The problem is that few teams have staff in place who can judge whether a new metric is truly rigorous and consistent or just some snake oil that happens to fit the data for one or two seasons. I think teams should always ask about the theory behind a model and whether the model needs to be re-calibrated every season. Black boxes don’t tend to work for very long.
Quartz: Are there metrics that are being used widely in soccer that are inappropriate or even irrelevant?
Altman: The metrics seen by the public—on television broadcasts and fantasy websites, for example—are often poor measures of players’ ability. For example, the number of goals a player scores in a season says far less than the number of non-penalty goals scored or assisted per minute played. Playing time varies, penalties are somewhat random, and attacking players work together in such a way that making an assist is often a better choice than shooting at the goal.
In the English Premier League, the top players in total offensive output per minute of open play are not necessarily the same as the top scorers. Luis Suárez has been the toast of Liverpool for his goal haul this year, but Manchester City’s Sergio Agüero is slightly more likely to score or create a goal at any given time. And though Agüero’s teammate Yaya Touré is third in the league in goals with 18, six of those were penalties; he ranks 27th in non-penalty goals and assists per 90 minutes.
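To make the rate Altman describes concrete, here is a minimal Python sketch of non-penalty goals and assists per 90 minutes. The function names are illustrative. The 18 goals and six penalties come from the interview; the assist and minute figures are hypothetical placeholders, not real player data.

```python
def non_penalty_goals(goals: int, penalties_scored: int) -> int:
    """Goals from open play and non-penalty set pieces."""
    return goals - penalties_scored


def output_per_90(np_goals: int, assists: int, minutes: float) -> float:
    """Non-penalty goals plus assists, scaled to a 90-minute match."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return (np_goals + assists) * 90.0 / minutes


# Figures quoted in the interview: 18 goals, six of them penalties.
np_goals = non_penalty_goals(18, 6)

# HYPOTHETICAL placeholders: the interview gives no assist or minute totals.
hypothetical_assists = 5
hypothetical_minutes = 2700.0

rate = output_per_90(np_goals, hypothetical_assists, hypothetical_minutes)
print(f"{np_goals} non-penalty goals, {rate:.2f} goal contributions per 90")
```

Dividing by minutes rather than counting season totals is what lets a player with limited playing time be compared with an ever-present starter.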
There are deeper issues in the soccer analytics community, too. Many metrics at the team level can’t be broken down into the contributions of individual players; “Total Shot Ratio” or TSR (a team’s shots divided by the sum of its shots and its opponents’ shots) is one example. Conversely, some metrics for players aren’t predictive of match results when they’re summed up at the team level.
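The TSR definition given above can be written as a short Python function. This is only a sketch of the stated formula; the function name and the shot counts in the example are made up for illustration.

```python
def total_shot_ratio(shots_for: int, shots_against: int) -> float:
    """A team's shots divided by the sum of its shots and its opponents' shots."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("TSR is undefined when neither side has taken a shot")
    return shots_for / total


# HYPOTHETICAL shot counts, purely for illustration.
tsr = total_shot_ratio(shots_for=15, shots_against=10)
print(f"TSR = {tsr:.2f}")
```

The inputs are team totals, so nothing in the calculation says which player was responsible for creating or preventing a shot. That is the limitation Altman points to.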
Other metrics give players bad incentives if the players are aware of them. For instance, a metric that valued players more highly when they pushed the ball up the field could cause them to run forward and attempt long passes all the time, even if the team suffered. A related problem among a few of the most advanced metrics is that they favor players on winning teams even if the players have mediocre ability, so there’s less incentive to keep achieving at a high level.
Quartz: Is there a metric teams should be using but aren’t?
Altman: I’m a fan of Shapley values. There’s a misconception that these are equivalent to plus-minus ratings, like the ones in hockey and basketball. They have some similarities, but Shapley values are not the same. Lloyd Shapley, the Nobel-winning mathematician and economist, created the formula as a way of dividing up wages among members of a production team—say, in a factory—that would be accepted as fair by all of them. The formula uses a series of hypotheticals to determine how pivotal each member is to the team’s overall output. It’s a natural for sports, though deciding what goes into the formula is not entirely straightforward.
The great thing about Shapley values is that they’re completely agnostic about how a player contributes to the team’s results. Marco Materazzi won the 2006 World Cup for Italy by insulting Zinedine Zidane’s sister, which resulted in the French star headbutting the Italian defender and getting expelled from the final. If Materazzi always helped his teams by getting under their opponents’ skins, what other metric would pick this up?
Shapley values are also a useful research tool. If you calculate them for your team and a player you thought was average turns out to be really important, then you have to ask why—in other words, what are the standard metrics missing? Shapley values make you examine your team more closely. That’s what a lot of sports analysis is about: using data to learn things about the game that aren’t obvious from watching, and may even be counterintuitive.
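The "series of hypotheticals" behind Shapley values can be illustrated with a toy calculation. The Python sketch below uses the standard textbook definition: average each member's marginal contribution over every order in which the team could be assembled. It is not Altman's own model. The three roles and the output figures assigned to each combination of players are invented for illustration, and, as he notes, deciding what goes into those figures is the hard part.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Iterable


def shapley_values(
    players: Iterable[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every ordering of the players.

    Feasible only for small groups: n players means n! orderings.
    """
    players = list(players)
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining this coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# HYPOTHETICAL team output for each combination of three invented roles.
OUTPUT = {
    frozenset(): 0.0,
    frozenset({"scorer"}): 4.0,
    frozenset({"playmaker"}): 2.0,
    frozenset({"agitator"}): 0.0,
    frozenset({"scorer", "playmaker"}): 9.0,
    frozenset({"scorer", "agitator"}): 6.0,
    frozenset({"playmaker", "agitator"}): 3.0,
    frozenset({"scorer", "playmaker", "agitator"}): 12.0,
}

values = shapley_values(["scorer", "playmaker", "agitator"], OUTPUT.__getitem__)
for name, share in values.items():
    print(f"{name}: {share:.2f}")
```

In this made-up example the "agitator" produces nothing alone, yet still receives a positive share because every combination does better with him in it. The shares always add up to the full team's output.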