Friday, April 1, 2011

Umpire Call Database

Okay, so after some talks with Mike Fast about the strike zone and some suggestions/requests for certain types of data from MGL, I've got an updated file with a complete description of umpire calls. Keep in mind the following:

1. At the suggestion of Mike, I used 1.75 feet as the bottom of the zone and 3.45 feet for the top of the zone. This is likely less accurate than basing the zone on the height of each batter. However, I don't currently have batter height integrated into my Pitch F/X data. So if umpires truly base their zone on the height of the batter, they are probably slightly more accurate than my numbers suggest. Also note that I use the rulebook width of the zone (the 17-inch width of the plate) rather than the 2-foot-wide zone.

2. All tabulations include only pitches that have Pitch F/X location data. A few pitches here and there didn't register with the system, and those are not included.

3. In the data file, the last 12 columns of each worksheet should be of the most interest. Some notes on these:

a. The Green and Red indicate "Correct" and "Incorrect" calls, respectively based on the rulebook zone.

b. The Sensitivity is the umpire's percentage of Within Zone pitches that are correctly called Strikes. The Specificity is the umpire's percentage of Outside Zone pitches that are correctly called Balls.

c. There is a variable key included within the file. Read it and understand it before snooping too much into the data.

4. I make no guarantees as to the accuracy of the calculations. In general, these are rough estimates of umpire performance on Ball and Strike calls. Because of the fixed top and bottom of the zone, they should be taken with some caution. I did my best to ensure that everything is correct. There are other ways to do this, including using the 2-foot wide zone and varying the top and bottom of the zone.
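To make the Correct/Incorrect, Sensitivity, and Specificity definitions above concrete, here is a minimal sketch of how such tallies could be computed. It is not the code used to build the data file. It assumes each called pitch has a horizontal location `px` (feet from the center of the plate) and a height `pz` (feet above the ground), loosely modeled on Pitch F/X fields, and uses the fixed 1.75 to 3.45 foot zone with the 17-inch plate width. The `Pitch` class, `in_zone`, and `umpire_rates` are hypothetical names, and the ball is treated as a point rather than accounting for its radius.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Fixed de facto zone height from the post (feet above the ground).
ZONE_BOTTOM_FT = 1.75
ZONE_TOP_FT = 3.45
# Home plate is 17 inches wide, so the rulebook half-width is 17/24 ft.
RULEBOOK_HALF_WIDTH_FT = 17.0 / 24.0
# Half-width of the alternative 2-foot-wide zone.
WIDE_HALF_WIDTH_FT = 1.0


@dataclass
class Pitch:
    # Hypothetical record layout, loosely modeled on Pitch F/X fields.
    px: Optional[float]  # horizontal distance from the center of the plate (ft)
    pz: Optional[float]  # height above the ground at the plate (ft)
    call: str            # umpire's call: "strike" or "ball"


def in_zone(p: Pitch, half_width: float = RULEBOOK_HALF_WIDTH_FT) -> bool:
    """True if the pitch location falls inside the fixed zone.

    Treats the ball as a point; adding the ball's radius would enlarge the zone.
    """
    return abs(p.px) <= half_width and ZONE_BOTTOM_FT <= p.pz <= ZONE_TOP_FT


def umpire_rates(pitches: Iterable[Pitch],
                 half_width: float = RULEBOOK_HALF_WIDTH_FT) -> dict:
    """Tally correct/incorrect calls plus sensitivity and specificity."""
    in_total = in_strikes = out_total = out_balls = 0
    for p in pitches:
        if p.px is None or p.pz is None:
            continue  # no location data: excluded, as described in note 2
        if in_zone(p, half_width):
            in_total += 1
            if p.call == "strike":  # correct call inside the zone
                in_strikes += 1
        else:
            out_total += 1
            if p.call == "ball":    # correct call outside the zone
                out_balls += 1
    correct = in_strikes + out_balls
    return {
        "in_zone": in_total,
        "out_zone": out_total,
        "correct": correct,
        "incorrect": in_total + out_total - correct,
        # Share of Within Zone pitches called Strikes.
        "sensitivity": in_strikes / in_total if in_total else float("nan"),
        # Share of Outside Zone pitches called Balls.
        "specificity": out_balls / out_total if out_total else float("nan"),
    }
```

Passing `half_width=WIDE_HALF_WIDTH_FT` gives the 2-foot-wide variant mentioned in note 4. Varying the top and bottom of the zone per batter would need batter height, which, as noted above, isn't in the data yet.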


Without further ado, here is the data.

With that said, enjoy! If you use this data anywhere, I always appreciate a cite or a link back here. If you are using the data for your own personal use, I'd love to hear what you will be using it for. If you have any questions, leave them in the comments or feel free to shoot me an email. But be sure to read the variable key and everything first.

As always, definitely let me know if something in the data looks funky. I tried to make the column names as logical as possible, but I'm sure others will disagree. The key has explicit descriptions of everything.

3 comments:

  1. Brian,

    "1. At the suggestion of Mike, I used 1.75 feet as the bottom of the zone and 3.45 feet for the top of the zone."

    Is that the "rule book zone" or the "de facto" zone (slightly below the knees to slightly above the waist)?

    "a. The Green and Red indicate "Correct" and "Incorrect" calls, respectively based on the rulebook zone."

    So when you say that the green and red are based on the "rule book zone" you mean the "1.75 and 3.45" zone that you mention above?

    Thanks!

    MGL

  2. Yeah, sorry for the imprecision.

    I use the Rulebook Width of the zone.

    The height (top and bottom) is a fixed de facto zone.

  3. Thanks for making this available. Great work.
