The study uses a large database of pitches in all baseball games from 2004-2008 to compare umpire judgment as a function of the “race” (white, black, Hispanic, Asian) of the umpire and the pitcher. The study finds a small but real bias against pitchers when the umpire and pitcher are of different races. Interestingly, the study also finds that in such circumstances pitchers adopt strategies that allow umpires less discretion, such as avoiding pitches that “paint the edges” of home plate. The authors argue that such strategies have economic consequences and also make bias harder to detect.
That the detected bias has to do with race is interesting, but it is not the most significant conclusion of the study. I would guess that if the authors had looked for other sources of bias (such as whether tall umpires favor tall pitchers, or mustachioed umps favor pitchers with facial hair) they would have found them. Bias is a part of human judgment, and if anything, the very small degree of racial bias is itself noteworthy, even if troubling by its nature.
The most important finding of the study is that when umpire performance is evaluated against an external standard -- in this case the now-defunct QuesTec umpire evaluation system used by MLB through 2008 -- the bias goes away. The bias also goes away when the game is played before a large crowd or when the pitch matters in the sense that it could be the last of an at-bat. Older, more senior umpires also exhibit little bias, which the authors suggest may reflect a winnowing process in the umpiring ranks. Evaluation and accountability are thus shown to be key factors in eliminating bias in subjective judgment.
Here is a key excerpt from the paper:
Our first observation is that pitchers who match the race/ethnicity of the homeplate umpire appear to receive slightly favorable treatment, as indicated by a higher probability that a pitch is called a strike, compared to players who do not match. Although this confers an advantage to some players at the expense of others, the effect we document here is small, on average affecting less than a pitch per game. Much more interesting are situations when and where the effects are strongest. Roughly one-third of the ballparks we study contained a system of computerized cameras (QuesTec) used to evaluate the umpires, comparing their ball/strike calls to a less subjective standard. Umpires have strong incentives to suppress any bias in such situations, as the QuesTec evaluations are important for their own career outcomes. With such explicit monitoring, evidence of any race or ethnicity preference vanishes entirely.
A study such as this is only possible because of the magnitude of quality data that is available to the researchers:
We find similar effects with implicit monitoring; when a game is well attended (and presumably more closely scrutinized), or when the pitch is pivotal for an at-bat, race/ethnicity matching again plays no role in the umpire’s evaluation. In situations where the umpire is neither explicitly nor implicitly monitored, the effect of the bias is considerable. As an example, a Hispanic pitcher facing a Hispanic umpire in a low-scrutiny setting (e.g., no cameras, poorly attended) receives strikes on 32.5 percent of called pitches, which drops to 30.0 percent if a black umpire is behind the plate.
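To put the two figures in that last sentence in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch using the two percentages quoted above; the function and variable names are my own, and nothing here reproduces the authors' actual analysis.

```python
# Strike rates quoted in the excerpt (percent of called pitches).
matched_rate = 32.5    # Hispanic pitcher, Hispanic umpire, low scrutiny
unmatched_rate = 30.0  # Hispanic pitcher, black umpire, low scrutiny


def strike_rate_gap(matched, unmatched):
    """Return (gap in percentage points, relative drop in percent)."""
    gap = matched - unmatched
    relative_drop = 100.0 * gap / matched
    return gap, relative_drop


gap, relative_drop = strike_rate_gap(matched_rate, unmatched_rate)
print(f"gap: {gap:.1f} percentage points")     # gap: 2.5 percentage points
print(f"relative drop: {relative_drop:.1f}%")  # relative drop: 7.7%
```

In other words, in the low-scrutiny setting the pitcher loses about 2.5 called strikes per 100 called pitches, a relative drop of roughly 8 percent.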
The authors suggest that any problem of bias that they document in the recent historical data may now be solved:
[T]hese findings imply that the particular impacts of racial/ethnic match preferences in baseball may now have been vitiated, since beginning in 2009 all ballparks are equipped with QuesTec or similar technologies.
QuesTec is no longer used. Instead MLB relies on Zone Evaluation, a technology which was developed by Major League Baseball Advanced Media and Sportvision.
The significance of this paper, however, goes well beyond balls and strikes. The paper suggests that if you want to address systematic bias in subjective decision making, you should pay close attention to decisions, evaluate the decision maker's performance against an external metric tied to performance objectives, and ensure that decision makers are aware of the evaluation criteria.