The baseball season never ends for data nerds. Nearly a month after the Houston Astros won the 2017 World Series, Bill James, the godfather of modern baseball statistics, dropped a bomb that is rocking the world of baseball analysts and data-heads.
James first made his name in the late 1970s, when his data-driven approach to player and team evaluation proved much of baseball’s received wisdom false. In doing so, he helped launch sabermetrics, a movement that is by now so ubiquitous that nearly every pro team has a squad of analysts crunching numbers in its front office. But now, James thinks his descendants have strayed from the path. The last straw for James was the recent debate over who deserved the 2017 American League most valuable player award.
The award, determined by a vote among members of the Baseball Writers’ Association of America, was given to the Houston Astros’ second baseman Jose Altuve. But many of James’s younger peers argued that the New York Yankees’ right fielder Aaron Judge deserved the accolade just as much, if not more.
“It’s nonsense,” James wrote on his website on Nov. 17. “Aaron Judge was nowhere near as valuable as Jose Altuve…. It is NOT close. The belief that it is close is fueled by bad statistical analysis.”
Most of the baseball writers and analysts arguing that Judge was as valuable as Altuve, or nearly so, were making their cases on the basis of “wins above replacement,” an increasingly commonplace statistic usually abbreviated as WAR. Unlike most baseball statistics, such as batting average or earned run average, which measure the outcomes of specific skills, WAR attempts to take all of a player’s contributions and sum them up into one number that represents overall value to the team. In many ways, WAR comes out of the school of baseball thinking James helped advance.
A number of websites calculate slightly different versions of WAR with minor methodological differences, but they usually arrive at similar numbers. According to the popular baseball website FanGraphs, Aaron Judge was worth 8.2 wins in 2017, the most of any player, and Altuve was worth 7.5, the second most. Another popular site, Baseball-Reference, had Altuve (8.3 wins) slightly higher than Judge (8.1).
James’s problem with the statistic isn’t that there’s no agreed-upon arbiter of the number. It’s that WAR is not connected to the number of games a player’s team actually won.
In real life, the Yankees won 91 games in 2017. But Judge’s WAR is calculated as if he had contributed to 102 wins. Why? Because the Yankees’ overall statistical performance suggests that if the team hadn’t suffered some bad luck, it would have won that many games. A common example of such bad luck is having an unusual number of batted balls go straight to the other team’s fielders.
James thinks this is crazy. “[WAR] is dead wrong because the creators of that statistic have severed the connection between performance statistics and wins, thus undermining their analysis,” he writes. He goes on to point out that Judge performed worse than Altuve in critical situations, such as the late innings of close games, and that WAR does not properly take this into account.
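The gap James objects to, 91 actual wins against 102 credited wins, can be made concrete with a small sketch. The sites that publish WAR use their own methods to estimate how many games a team "should" have won, and nothing here reproduces them. As a stand-in, the sketch uses the Pythagorean expectation, a well-known estimator that James himself devised, which predicts winning percentage from runs scored and runs allowed. The run totals in the example are hypothetical placeholders, not the Yankees’ actual 2017 figures, and the exponent of 2 is the classic textbook choice rather than anything the WAR sites are confirmed to use.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Estimate winning percentage from run totals (Pythagorean expectation)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored: float, runs_allowed: float,
                  games: int = 162, exponent: float = 2.0) -> float:
    """Scale the estimated winning percentage to a full season of games."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# Figures quoted in the article for the 2017 Yankees.
ACTUAL_WINS = 91
CREDITED_WINS = 102

# Wins that WAR-style accounting credits but that never showed up in the standings.
luck_gap = CREDITED_WINS - ACTUAL_WINS

# Hypothetical run totals, for illustration only (NOT real 2017 data).
HYPOTHETICAL_RUNS_SCORED = 800
HYPOTHETICAL_RUNS_ALLOWED = 700

if __name__ == "__main__":
    est = expected_wins(HYPOTHETICAL_RUNS_SCORED, HYPOTHETICAL_RUNS_ALLOWED)
    print(f"Credited minus actual wins: {luck_gap}")
    print(f"Expected wins for the hypothetical team: {est:.1f}")
```

A team whose estimate runs well above its real win total is usually described as unlucky. James’s complaint is that player value should be tied to the real total, not the estimate.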
This feels weird to write, because I revere Bill James, but I think he’s the one who’s dead wrong.
WAR is not a perfect metric for measuring player performance, but it’s a strong one. Maybe the strongest we have. As FanGraphs managing editor and writer Dave Cameron points out, WAR does a good job of isolating the aspects of a player’s game outcomes that are under his control from those that are at the mercy of luck. For example, if one player hits 10 home runs with no one on base, and a second player hits 10 home runs with a runner on base every time, the former shouldn’t be penalized for hitting home runs in a less contextually “valuable” situation. It’s not under his control if the player before him gets on base.
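The home run example can be sketched in a few lines. The code contrasts two ways of crediting the same 10 home runs: counting the runs that actually scored, which depends on who was on base, and assigning every home run the same average value regardless of context. The function names and the average run value of 1.4 are assumptions made for illustration, not FanGraphs’ actual weights or method.

```python
# Assumed average run value of a home run; an illustrative placeholder only.
ASSUMED_AVG_HR_RUN_VALUE = 1.4


def context_dependent_runs(home_runs: int, runners_on_each_time: int) -> int:
    """Runs that actually score: the batter plus whoever was on base."""
    return home_runs * (1 + runners_on_each_time)


def context_neutral_runs(home_runs: int,
                         avg_value: float = ASSUMED_AVG_HR_RUN_VALUE) -> float:
    """Credit every home run with the same average value, ignoring base state."""
    return home_runs * avg_value


# Player A: 10 solo home runs. Player B: 10 home runs, one runner on each time.
player_a_actual = context_dependent_runs(10, runners_on_each_time=0)
player_b_actual = context_dependent_runs(10, runners_on_each_time=1)

# Under context-neutral accounting, both players receive identical credit.
player_a_neutral = context_neutral_runs(10)
player_b_neutral = context_neutral_runs(10)

if __name__ == "__main__":
    print("Runs actually scored:", player_a_actual, "vs", player_b_actual)
    print("Context-neutral credit is equal:", player_a_neutral == player_b_neutral)
```

Counting actual runs makes Player B look twice as productive, even though both hitters did exactly the same thing at the plate. Context-neutral credit removes that difference, which is the property Cameron praises and the one James argues goes too far.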
Ultimately, the problem with James’s argument is that he is attacking a statistic, and statistics, if calculated correctly, are not right or wrong. Rather, they are either useful or not useful for understanding a situation. GDP growth and the unemployment rate are not perfect metrics, but if their strengths and limitations are understood, they are certainly useful for understanding whether an economy is doing well. For example, the unemployment rate is an excellent statistic for understanding how easily people seeking jobs find them, but it says nothing about the people who have completely given up on looking.
The same is true of WAR and baseball-player analysis. It’s a great tool for measuring the aspects of a player’s performance that are independent of his teammates’ performances, but James is right in at least one regard: it doesn’t say everything there is to say about whether a baseball player is good or great. Where he’s wrong is in claiming that using WAR is inherently bad statistics.
Additional reporting by Elijah Wolfson (@elijahwolfson).