Thursday, May 9, 2013

Revisiting Umpire Discrimination: New Paper at JSE

Two colleagues (Scott Tainsky and Jason Winfree) and I have a new paper just posted online at the Journal of Sports Economics.  We revisit the findings of Parsons et al. from 2011 (though the working version of their paper drew press attention much earlier).  That paper was rather controversial and claimed that umpires have an important, race-based influence on game outcomes.

Our paper uses a different data set and looks to replicate the findings from the original American Economic Review (AER) paper.  We were able to replicate the original findings from their provided data and code, but found that odd uses of fixed effects are at the root of some of the findings.  Most of the paper examines the robustness of the results and uses Pitch F/X data to empirically derive the edge of the strike zone.  At best, our analysis and re-analysis suggest that the results initially presented in AER are mixed.

One thing to note is that the main interest of the Parsons et al. paper was not baseball.  The point was that the detection of discrimination can be affected by third parties who influence the performance of those of a given race (i.e., umpires in this context).  This point is still well taken, and it is the paper's most important contribution.  In fact, this is why the paper was published in a journal as prestigious as the American Economic Review.

The direct link to the paper and the abstract are below.  Unfortunately, the paper is gated.  However, I am going to double-check my rights for including a link on my personal page (usually OK, but journals can sometimes be a pain on this issue).  If you have access, feel free to send questions or comments to my email address or leave them in the comments.  Please keep any comments and criticism constructive.


  2. "Indeed, from 2004-2008, there were only around
    2,550 pitches thrown by Black pitchers requiring the judgment of Black umpires."

    So whatever difference you find in this tiny sample must be due to racial bias, right? The entire premise of this type of study is flawed. The data is simply not there to draw generalizations about the behavior of groups of humans when "groups" contain 7-8 individuals. I.e., even if you had found a difference, there would have been no valid conclusion to draw other than the fact that those 7-8 people seemed to have some kind of bias. Sorry to vent, but it really frustrates me that people (whether the orig. authors or your group) fail to answer the basic question, "Do I have sufficient data to perform this study?" before performing it.

  3. Anon,

    This is precisely the point we are making in the paper: previous work made these inferences, and we argue that for such strong accusations, one needs a wealth of data and further inspection. We speak to the robustness of both our own results and those of Parsons et al.

    Please do read the entire paper and accompanying appendices (are the appendices available to you? If not, let me know, as these are an important portion of the paper).