Friday, April 8, 2011

Umpire Bias Favoring Catchers At Bat?

Recently, I've been working a lot with umpire data. A lot of this has to do with the nice big sample sizes that it provides for most umpires, which just makes it easier to infer interesting things from the data. Today I thought I would check out something I came across when looking for B-Pro topics over the past couple weeks.

It is relatively well-known that umpires and catchers do their best to keep a good rapport with one another behind the plate. Catchers need to be diplomatic when asking about a call on a given pitch, as umpires may not take well to being called out for an incorrect call. These interactions may well matter for the catcher’s battery mate standing 60 feet 6 inches away, and the hope is that, if the catcher has any effect on borderline calls, it will be a positive one for his pitcher. This is a difficult thing to measure, so I’ll have to leave it for someone else who has access to players and can survey them.

No, today I’ll be asking a related question, though from a different angle: Do umpires give catchers the benefit of the doubt when they’re at the plate? If catchers are being cordial with the umpire behind the plate, then this could be a result of both team-level and individual-level incentives. If the catcher knows that being nice will improve his experience at bat, then he may have a strong incentive to be nice.

But there’s a cognitive aspect to this from the umpire perspective as well. If an umpire screws up a call for a catcher, then he has to face the guy in a couple of outs right there behind the plate. The rest of the team heads off at least 60 feet away, where the umpire won’t be close enough to hold their hand. Think about how you might talk to someone you can’t stand on the internet. Then think about whether you would say those things to their face. Are the two interactions different? Could this cognitive aspect come into play when making ball-strike calls when catchers are up to bat?

To answer this question, I’ll use Pitch F/X data from 2008 through 2010. I include a dummy variable for whether or not the player is a regular catcher, as well as variables giving us information about the distance of the pitch from the center of the strike zone, whether or not the umpire made the ‘correct call’ (based on a strike zone covering the width of the plate and running from 1.75 ft. to 3.45 ft. in height), and a few other controls. Using a few different types of analysis, I’ll do my best to tackle the data.
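As a rough sketch of the fixed zone and the ‘correct call’ flag described above, here is a small Python example. The names `px` (horizontal location in feet from the center of the plate) and `pz` (height in feet) are assumptions for illustration, and the sketch judges the center of the ball only, ignoring the ball's radius, which may not match the exact rule used in the analysis.

```python
import math

# Fixed zone from the post: the width of the plate, 1.75 ft to 3.45 ft high.
PLATE_HALF_WIDTH_FT = (17.0 / 12.0) / 2.0  # home plate is 17 inches wide
ZONE_BOTTOM_FT = 1.75
ZONE_TOP_FT = 3.45
ZONE_CENTER_HEIGHT_FT = (ZONE_BOTTOM_FT + ZONE_TOP_FT) / 2.0


def in_fixed_zone(px, pz):
    """True if the center of the ball crosses inside the fixed zone.

    px, pz are assumed field names: horizontal offset and height, in feet.
    """
    return abs(px) <= PLATE_HALF_WIDTH_FT and ZONE_BOTTOM_FT <= pz <= ZONE_TOP_FT


def is_correct_call(px, pz, called_strike):
    """True if the umpire's call agrees with the fixed zone."""
    return in_fixed_zone(px, pz) == called_strike


def distance_from_center(px, pz):
    """Straight-line distance (feet) from the center of the fixed zone."""
    return math.hypot(px, pz - ZONE_CENTER_HEIGHT_FT)
```

A pitch at `px=1.0, pz=2.5` is a foot off the center of the plate, so it falls outside this zone, and a strike call on it would be flagged as incorrect.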

My first step was to run a simple logistic regression on the calls made by the umpire against all batters during this time period. In other words, I’ll be predicting the effect of my variables on the probability that a certain pitch is called a strike, holding constant the location, count, batter/pitcher handedness, inning, and so on. The dependent variable here is whether or not the pitch was called a strike (1=called strike, 0=not called strike), and the data include only calls made by the umpire.

One thing to consider, however, is that catchers could be coming in at a bit of a shorter height than the rest of the batters in the sample. Because I used a fixed top and bottom of the strike zone, those pitches at the edges of the top and bottom could be biasing the catcher effect. The average height on an MLB roster is about 6’ 1” or 6’ 2”, while the average catcher height came in at just under 6’ 1” last season. If we assume the difference is about 1 inch between catchers and the rest of the league, we should probably account for this. Within the data set, for a given pitch, the difference in the strike zone of catchers vs. non-catchers is less than roughly half a vertical inch. For the purposes of this preliminary look, I proxy batter height using the listed top and bottom of the zone within the Pitch F/X data. While the provided numbers are extremely noisy and problematic for choosing whether or not a pitch was “within the zone”, they should work well as a rough proxy for the height of the batter on average. This likely won't control enough for height, but running the regression with and without the top and bottom of the zone variables does not really change anything with the catcher variable at all. This could mean one of two things: 1) Height is not an issue or 2) The sz_top and sz_bot variables aren't just noisy, but completely worthless (a very real possibility).

One other thing to check is whether or not catchers just see more pitches within the strike zone defined earlier in this article. It turns out that there is not a statistically significant difference in the number of pitches seen within the fixed zone for catchers compared with other players. This also gives us some slight evidence that the height of catchers isn’t too much of a problem in the model. If catchers were significantly shorter than the rest of the population, we would expect pitchers to adjust and throw pitches within this smaller zone. However, the spray of pitch locations is pretty much the same for catchers and non-catchers. For brevity, I do not include the results of this regression (though they are available upon request).
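For readers who want a quick version of this kind of check, one simple stand-in is a two-proportion z-test on the share of pitches inside the fixed zone for each group. This is not the regression used for the post, the function name is made up for illustration, and it treats every pitch as independent, which is a simplification.

```python
import math


def two_proportion_z(in_zone_a, total_a, in_zone_b, total_b):
    """Two-sided z-test for a difference in in-zone rates between two groups.

    Returns (z, p_value). Assumes independent pitches and large samples.
    """
    p_a = in_zone_a / total_a
    p_b = in_zone_b / total_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (in_zone_a + in_zone_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b))
    z = (p_a - p_b) / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

Calling it with the in-zone and total pitch counts for catchers and for everyone else gives a z statistic and p-value. A large p-value is consistent with the two groups seeing in-zone pitches at about the same rate.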

Below is the output from the logistic regression on the probability of a strike call:



[Regression table not recoverable. Surviving fragments: the column header “Std. Error” and the row label “pitcher_throws=R & batter_stand=R”.]





The effect in the regression for the ‘catcher’ dummy variable is statistically significant and larger than I would have expected (some of this could be coming from differences in height, despite my attempts at controlling for this variable). On average, a pitch that is at the edge of the zone (normally a 50-50 chance of being called a strike) is about 12.5% less likely to be called a strike if the batter is a catcher. For those unfamiliar with logistic regression, I won’t go into explaining how this changes as the probability of a strike call otherwise increases or decreases. The estimated effects of coefficients in logistic regression can't just be read off the regression table when pitches are closer to a 1 or 0 probability. So with catchers, it's likely that a pitch down the middle is still a strike at very near the same rate, while a pitch 10 feet high is still a ball at very near the same rate as for other batters.
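To make the point about reading logistic coefficients a little more concrete, here is a minimal Python sketch. The coefficient of -0.5 is purely hypothetical and is not the estimate from the regression above. It only illustrates that the same shift in log-odds moves the strike probability a lot near 50% and very little near 0 or 1.

```python
import math


def logistic(x):
    """Inverse logit: map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Map a probability (strictly between 0 and 1) to log-odds."""
    return math.log(p / (1.0 - p))


def shifted_probability(baseline_p, coef):
    """Strike probability after adding `coef` to the baseline log-odds."""
    return logistic(logit(baseline_p) + coef)


# Hypothetical value for illustration only; NOT the post's estimated coefficient.
HYPOTHETICAL_CATCHER_COEF = -0.5

for p in (0.99, 0.90, 0.50, 0.10, 0.01):
    new_p = shifted_probability(p, HYPOTHETICAL_CATCHER_COEF)
    print(f"baseline {p:.2f} -> with catcher dummy {new_p:.3f} (change {new_p - p:+.3f})")
```

The printed changes are largest for the 0.50 baseline and shrink toward zero at either extreme, which is why a single "percent less likely" number only describes pitches near the edge of the zone.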

This is a pretty interesting contrast, but there could be one other thing confounding the result: I have not controlled for the talent of the batter. We know that catchers are generally not as adept at hitting the ball as those at other positions, if for no other reason than the top hitting catchers are often moved to another position early on. If umpires are ‘compassionate’ toward players who just aren’t very good hitters, then we could be picking up this effect here. I don’t currently have an answer to this issue, as I do not have individual player performance in my Pitch F/X data at this point. If it is the case that this is an effect of the umpire interacting with the batter’s skill, then it is also an interesting issue to be looked into later on that likely needs to be controlled for in the data. I am in the process of greatly improving the information in my Pitch F/X database, so hopefully I can take a look at this stuff as well.

I took the first model a bit further and followed a technique that J-Doug has used in his fantastic ‘Compassionate Umpire’ articles. For this second model, I used an indicator variable of whether the call was in the batter’s favor (ball within the zone, called a ball), the pitcher’s favor (ball outside the zone, called a strike), or neutral (“correct call”). This is the dependent variable in an ordered logistic model. As the indicator increases (from -1 to 0 to 1), the calls are coming more “in favor” of the batter. This sheds some further light on any increase in probability that the call will be in the batter’s favor, given that he is a catcher. I again don’t present the full regression output, as it simply confirms the earlier finding; this model also indicated a significant increase in favorable calls for catchers.
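The three-level indicator can be sketched in a few lines of Python. The function and argument names here are hypothetical; the coding (-1 pitcher's favor, 0 correct call, 1 batter's favor) follows the description above.

```python
def call_favor(in_zone, called_strike):
    """Ordered outcome for a called pitch.

     1: batter's favor  (pitch inside the zone, called a ball)
     0: neutral         (call matches the zone)
    -1: pitcher's favor (pitch outside the zone, called a strike)
    """
    if in_zone and not called_strike:
        return 1
    if not in_zone and called_strike:
        return -1
    return 0
```

Because the three values have a natural order, this variable suits an ordered logistic model rather than an ordinary binary logit.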

So, where is this difference in strike-calling coming from? Well, we can look at the contours for the 50% call rate for left-handed and right-handed batters below. In the panel on the left, I plotted the 50% contours for RHB that are catchers and non-catchers, while on the right panel, we have left-handed batters (plots are from the umpire’s view). In both panels, you can see that umpires are a bit more lenient with inside pitches for both right and left-handed batting catchers. Right-handed catchers seem to get calls in their favor up in the zone, but this very well could be a result of these catchers being a little shorter than their non-catching counterparts. You can see that for left-handed catchers, the zone is shifted upward a bit. So the ‘height’ factor seems to be relatively ambiguous compared to the inside corner difference, especially considering that the lower limit of the zone for RHB is almost identical for both groups of batters. A word of caution: comparing differences this small on plots like this is not a replacement for more rigorous analysis, but they are interesting to look at once we understand some of what is going on in the data.

This is of course not certain evidence that there is something going on with the catchers at bat, but it seems to point to something interesting. I’d like to look into this phenomenon (if that is what it is) and be a bit more confident about the height of the batters and the possibility of umpires being ‘compassionate’ toward less skilled hitters, rather than catchers themselves. If anyone has batter height available by player_id (the ones included in the Pitch F/X database format from Mike Fast's tutorial), I'd love to be able to include this in my data. That way, I could try and provide a bit more accurate umpire performance estimations as well.

In the end, it very well could be that the closeness of the catcher and the umpire has an effect on the umpire taking the bat out of a catcher’s hands. But replication and improvement are always key in this sort of analysis, and I think they are needed here. I’d love to hear some reactions to the analysis, and am always willing to hear shortcomings of the approach here. I have a hard time coming out and proclaiming a definite bias without better controlling for the height of the batters, but the effect seems to be large enough that at least some of it is real.


  1. Great stuff Millsy. Any chance you can post your R code for the contour plot? Did you do a loess on the data and then fit a contour at the 50th percentile probability?

  2. The contour uses a generalized additive model with cross-validated span/bandwidth/knots (whichever way you want to define the smoothing parameter) for the smoothing splines. The only variables included are the locational variables for the plots, while the two groups of players are shown separately (of course).

    For this, I used the 'mgcv' package, and the command 'gam' from that package (the 'gam' package doesn't work right with the new R). For the contour, I simply use the same code as in my posts that use 'filled.contour' but with the non-filled 'contour' function; however, when you want to plot a single contour, you indicate this using "levels=0.5" (for the 50% contour). If you want to plot .25, .5, .75 and .9, you'd use "levels=c(0.25, 0.5, 0.75, 0.90)". And you can label these accordingly (the help file is pretty straightforward for this function).

    Hopefully I'll have a sab-R-metrics on these functions in the future.

  3. Thanks. And cool article at B-Pro. Will you be writing there more?

  4. I had fun doing the thing for B-Pro. If they asked for further work, I'd love to do it.

    At this point it's a matter of having time to do something on a deadline. What I like about my blog is I can write something up when I have the time. No problem if I don't stick up a post.

    Doing it for someone else often results in a deadline, and it's tough to commit to something like that while in school. Some guys are able to do it and put out quality work, and I have a lot of respect for the fact that they can do that.

  5. Great stuff, Brian. Would love to see more on this. Some quick thoughts:

    I'd be curious to see what the results are for other positions. It might be mistaken to attribute the effect to catchers. It could be related to batter ability as you mention, or positions on the left side of the defensive spectrum get the benefit of the doubt. Another possibility is there's probably more offensive variability at catcher than any other position (I'm guessing on this), which might make the cumulative effect look larger for catchers.

    It's probably a really small sample size, but it might be interesting to look at whether the result holds when you look at catchers playing another position that day (1B or DH most likely).

  6. Thanks for the comments, Dan.

    I'm also curious about the positional issue, and if it's that or batter ability. I'm still skeptical of the effect shown in the regression in this post, but there at the very least seems to be something different about catchers. It could be a few things, including height and batter ability. I'll be updating my Pitch F/X data all summer to include a lot of new things to get at these sorts of issues.

    Interesting thoughts about the variability in catchers. I'll have to think of a way to tackle the issue and how/if it could affect the estimation.

    As for your last comment, I had thought of that as another control, but since I identified catchers manually in my data, it just wasn't an option. I'm hoping to get the primary position AND position played for each game for each player added to my data as well. I think it would be a lot more interesting this way, too.

    Certainly something I'm going to try and dig a little deeper into. Thanks again for stopping by.

  7. I can help you with your batter height issue. I do not have player height by Fast's IDs, but I have tables containing the MLBAM IDs, Lahman IDs and Retrosheet IDs. From there you can get player heights from the BDB master table. I can include that too if you don't already have that in your database. I can email them to you in Excel tables if you want.

  8. Hey Michael,

    I actually got them from MGL, but I'd love to cross check them with yours. Two versions of data are always useful. Thanks!