In football, soccer and basketball, player quality has one obvious and inescapable metric: scoring touchdowns, goals, points. In baseball, it’s more complicated, because it isn’t easy to directly associate a single player’s play with team wins. This is why “wins above replacement player” (WARP) has become so popular: it’s a statistic that aggregates a player’s total contribution to their team (offense, defense and primary position) into one easily understood figure.
The key to WARP’s popularity is that it distills all these factors into a universal estimate of value, expressed as the number of wins a player is worth to their team compared with the freely available minor league player who would replace them if they were injured (in fact, a composite baseline minor leaguer). Created by the sabermetric baseball community, WARP plays a decisive role in voting on annual awards such as the MVP, Cy Young or Silver Slugger.
But does it actually measure—or even roughly estimate—player value?
Initially, Austin Brown—a Ph.D. student at the University of Northern Colorado in the department of applied statistics and research methods—was interested in figuring out the relationship between injuries, teams catching the injury bug throughout the course of the regular season and team performance. But as he worked on the topic with his adviser—Jay Schaffer, professor of applied statistics and research methods—they began to look at the relationship between individual players and team performance. “Our thinking was that a player of lesser quality is unlikely to impact overall team performance in the same way as a player of higher playing quality,” he says. “And that led us to WARP.”
“But as we looked at the 2016 MLB season, we found that the WARP statistic wasn’t measuring what we thought it ought to be measuring. We kind of took a step back and thought, ‘Okay, this is kind of weird; this is definitely not what we suspected.’”
They suspected the 2016 season might be an aberration. “There were a couple of really good teams that happened to have a lot of injuries,” says Brown. “So, we thought that might have been throwing it off.” But as they extended their analysis to cover the 2014–2017 seasons, WARP wilted. “No matter how we looked at it, even when we accounted for differing independent variable combinations, it’s just not related to wins,” says Brown.
“When MLB players got hurt and the real replacement players were required to come in and play for some number of games, team performance across the board was not really impacted,” says Brown. “If a player was on a poor team, the team tended to still play poorly. If a player was on a great team, the team still tended to play great.”
When Los Angeles Dodgers all-star pitcher Clayton Kershaw missed a sizable part of the 2016 season between July and September due to injury, the Dodgers actually played slightly better (in terms of winning percentage) than before he was hurt. That didn’t mean he was a worse player than the player replacing him. It meant the overall quality of the team was more indicative of team performance than the presence or absence of any one player.
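The comparison behind that observation is simple arithmetic: compute the team’s winning percentage over the games the player was available and over the games he missed, then look at the difference. The sketch below is only an illustration of that calculation, not the researchers’ method. The function names are hypothetical, and the win-loss records are made-up placeholders, not the Dodgers’ actual 2016 figures.

```python
def win_pct(wins: int, losses: int) -> float:
    """Winning percentage: wins divided by total games played."""
    games = wins + losses
    if games == 0:
        raise ValueError("no games played")
    return wins / games


def absence_effect(with_player: tuple, without_player: tuple) -> tuple:
    """Compare winning percentage with and without a player.

    Each argument is a (wins, losses) tuple. Returns the percentage with
    the player, the percentage without him, and the change between them.
    """
    pct_with = win_pct(*with_player)
    pct_without = win_pct(*without_player)
    return pct_with, pct_without, pct_without - pct_with


# Hypothetical records for illustration only (NOT real 2016 data).
with_star = (40, 40)     # team record while the player was healthy
without_star = (33, 27)  # team record while the player was injured

pct_with, pct_without, diff = absence_effect(with_star, without_star)
print(f"with: {pct_with:.3f}  without: {pct_without:.3f}  change: {diff:+.3f}")
```

A positive change means the team won at a higher rate while the player was out. On its own, a single before-and-after comparison like this says little, which is why the researchers looked across several seasons and many injured players.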
“I think the problem with these overall measures of player quality in baseball is that baseball is not like soccer, it’s not like hockey, it’s not like basketball,” Brown explains. “Look at the NBA finals last season. LeBron James pretty much got his team to the NBA finals by himself because the rest of the players are nowhere near his level of quality. But in baseball, it’s just not the same. There are great players, of course, but it’s really hard for them to carry a team just on their own because of the way the game is played.”
Collecting and organizing the injury data was not as easy as one might think, given the passion for baseball statistics and the impact they have had on the game. Brown and Schaffer had to draw from MLB, Baseball Prospectus, Baseball Reference, and Sports Illustrated, which meant a lot of time spent cleaning and merging data. And perhaps that’s partly why WARP took off: getting the right data to check it was a huge challenge.
The next step, says Brown, is to drill further down into the data and figure out a statistic that is more indicative of player quality—one that derives from player-specific factors. “This will probably not be able to scale in the way that WARP does,” says Brown, “which is what makes WARP so popular. But it will be more useful. Unlike WARP, it will be actually measuring something at the player level more directly related to team performance.”
JSM Talk: ww2.amstat.org/meetings/jsm/20 … fm?abstractid=328389