Thursday, March 24, 2011

Umpire Strike Zones

Recently, I've been working on a new post for FBJ. Hopefully, that will be ready to go tomorrow, but with the publication of Jeff Zimmerman's umpire projections today, I thought I'd post some stuff here. Jeff makes some cool plots and has some cross-tabulations of umpire strike call percentage for a few years. However, it seems like something went wrong. If you're curious, go over there and also check out the comments.

Below is a quick table of umpires that were behind the plate for at least 5,000 plate appearances from 2007 through 2010 (for which Pitch F/X data is available). From the looks of things, the umpire can have just over a two-run effect on the outcome of the game due to his strike zone (ADDENDUM: MGL correctly points out in the comments that my language is imprecise, and the assumption that the noise is evened out is too strong. I agree he is correct. I should have said that the difference in the data is a bit over 2 runs, NOT that the EFFECT was a little over 2 runs. His suggestion is that the effect is about 0.6 runs. I'll see what other info I can get out of the data.). Of course, we're assuming that umpires are randomly assigned and that the quality of the pitching and hitting evens out over the 5,000 plate appearances, which is a pretty strong assumption. But even if the range of the effect was only a single run, I think this would be pretty significant. The data below is for 2007 through 2010.

Umpire First Name Umpire Last Name Games PA Strikeout % OBP SLG AVG Runs Per Game
Jerry Crawford 87 6834 16.56% 0.3459 0.4268 0.2639 10.17
Angel Campos 84 6466 18.33% 0.3361 0.4191 0.2658 9.92
Gerry Davis 140 10822 16.68% 0.3354 0.4250 0.2635 9.89
Tim Welke 127 9755 18.60% 0.3324 0.4215 0.2636 9.83
Chad Fairchild 131 10262 18.01% 0.3343 0.4181 0.2617 9.82
Jim Reynolds 130 10079 18.25% 0.3372 0.4234 0.2690 9.74
Tim McClelland 144 11090 16.47% 0.3418 0.4168 0.2660 9.72
Tim Tschida 135 10528 17.31% 0.3413 0.4167 0.2678 9.69
Larry Vanover 133 10202 17.70% 0.3312 0.4153 0.2617 9.68
Sam Holbrook 139 10618 17.48% 0.3345 0.4280 0.2628 9.68
Bill Welke 132 10248 17.94% 0.3357 0.4153 0.2690 9.64
Mike Reilly 138 10705 17.91% 0.3410 0.4241 0.2666 9.62
Randy Marsh 93 7050 15.26% 0.3435 0.4173 0.2671 9.52
Alfonso Marquez 103 8103 16.46% 0.3380 0.4093 0.2609 9.50
Scott Barry 110 8366 16.91% 0.3365 0.4206 0.2608 9.48
Tim Timmons 134 10349 17.61% 0.3314 0.4173 0.2650 9.48
Paul Schrieber 110 8678 16.73% 0.3450 0.4100 0.2610 9.46
Brian Knight 128 9760 17.01% 0.3368 0.4200 0.2646 9.46
Jerry Meals 138 10596 17.57% 0.3322 0.4190 0.2617 9.44
Adrian Johnson 120 9338 17.56% 0.3376 0.4147 0.2601 9.39
Dana DeMuth 141 10871 17.66% 0.3330 0.4060 0.2599 9.38
Brian Gorman 139 10599 17.81% 0.3312 0.4233 0.2657 9.37
CB Bucknor 138 10771 17.44% 0.3361 0.4121 0.2669 9.34
Chuck Meriwether 105 8079 17.45% 0.3296 0.4058 0.2608 9.31
Ed Hickox 105 7955 17.88% 0.3243 0.3943 0.2513 9.31
Eric Cooper 133 10174 17.75% 0.3293 0.4119 0.2643 9.31
Tony Randazzo 102 7881 17.64% 0.3283 0.4246 0.2646 9.29
Marvin Hudson 136 10702 17.81% 0.3336 0.4028 0.2592 9.29
Charlie Reliford 75 5699 17.70% 0.3226 0.3980 0.2558 9.24
Wally Bell 142 10937 18.20% 0.3274 0.4198 0.2593 9.24
Lance Barksdale 139 10545 17.52% 0.3323 0.4062 0.2552 9.24
Greg Gibson 135 10583 17.00% 0.3311 0.4046 0.2568 9.23
John Hirschbeck 81 6167 17.97% 0.3256 0.4106 0.2585 9.21
Dan Iassogna 138 10521 18.40% 0.3345 0.4112 0.2609 9.20
Todd Tichenor 85 6480 17.02% 0.3375 0.4040 0.2628 9.19
Derryl Cousins 139 10809 17.73% 0.3262 0.3952 0.2496 9.18
James Hoye 147 11464 17.81% 0.3295 0.4014 0.2572 9.15
Joe West 142 11016 17.27% 0.3281 0.4067 0.2538 9.14
Jim Joyce 131 10070 16.74% 0.3341 0.4036 0.2599 9.14
Dale Scott 142 10816 18.14% 0.3325 0.4143 0.2623 9.13
Marty Foster 121 9343 18.41% 0.3285 0.4101 0.2584 9.12
Ted Barrett 141 10802 17.79% 0.3263 0.4078 0.2568 9.11
Mike Everitt 143 11021 18.10% 0.3279 0.4114 0.2569 9.09
Kerwin Danley 109 8248 17.34% 0.3359 0.4069 0.2633 9.08
Fieldin Culbreth 142 10848 17.16% 0.3311 0.4175 0.2603 9.01
Tom Hallion 138 10428 18.37% 0.3251 0.4121 0.2561 9.01
Brian Runge 120 9048 18.39% 0.3238 0.4149 0.2590 8.99
Laz Diaz 139 10683 18.41% 0.3234 0.4069 0.2560 8.99
Bruce Dreckman 123 9573 17.05% 0.3290 0.4013 0.2579 8.98
Paul Nauert 137 10471 17.85% 0.3262 0.4146 0.2602 8.98
Gary Darling 131 9874 18.14% 0.3289 0.4100 0.2621 8.96
Mike DiMuro 109 8386 18.28% 0.3219 0.3997 0.2515 8.95
Mark Wegner 133 10173 18.34% 0.3279 0.3991 0.2518 8.94
Phil Cuzzi 138 10492 18.76% 0.3252 0.4067 0.2582 8.93
Angel Hernandez 141 10650 17.29% 0.3279 0.3962 0.2557 8.90
Ed Rapuano 140 10689 17.55% 0.3293 0.4072 0.2579 8.89
Bob Davidson 140 10803 17.40% 0.3307 0.3924 0.2576 8.86
Mike Winters 133 9904 18.35% 0.3302 0.4070 0.2620 8.86
Rob Drake 146 11091 18.86% 0.3231 0.4019 0.2515 8.85
Jim Wolf 133 10133 18.01% 0.3313 0.4078 0.2604 8.83
Hunter Wendelstedt 140 10625 17.37% 0.3258 0.4021 0.2558 8.81
Bill Miller 142 10852 18.69% 0.3186 0.4026 0.2534 8.77
Brian O'Nora 124 9305 17.69% 0.3221 0.4100 0.2571 8.77
Ron Kulpa 130 10016 18.24% 0.3286 0.4033 0.2578 8.76
Jerry Layne 118 9071 17.43% 0.3313 0.4008 0.2525 8.71
Mark Carlson 107 7971 18.15% 0.3266 0.4053 0.2565 8.67
Jeff Kellogg 143 10784 17.23% 0.3291 0.4101 0.2563 8.66
Paul Emmel 134 10107 18.77% 0.3195 0.3924 0.2537 8.65
Chris Guccione 148 11205 17.72% 0.3303 0.3999 0.2578 8.64
Jeff Nelson 123 9399 18.24% 0.3248 0.3997 0.2523 8.63
Gary Cederstrom 138 10387 18.07% 0.3292 0.4031 0.2583 8.62
Doug Eddings 140 10530 18.64% 0.3237 0.4112 0.2596 8.56
Andy Fletcher 117 8930 18.91% 0.3221 0.3852 0.2491 8.20
Mike Estabrook 83 6265 18.13% 0.3200 0.3848 0.2559 7.95
Bill Hohn 91 6618 16.88% 0.3234 0.3965 0.2505 7.91
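
To make the addendum's distinction concrete, here is a rough sketch of how the raw "bit over 2 runs" figure falls out of the table, alongside a back-of-the-envelope look at how much of a spread pure noise could produce. The four rows are copied from the top and bottom of the table above. The per-game standard deviation is a hypothetical placeholder, not something estimated from my database, and the independence of games is an assumption too.

```python
import math

# (umpire, games, runs per game) copied from the first and last rows of the table above.
rows = [
    ("Jerry Crawford", 87, 10.17),
    ("Angel Campos", 84, 9.92),
    ("Mike Estabrook", 83, 7.95),
    ("Bill Hohn", 91, 7.91),
]

# HYPOTHETICAL placeholder for the game-to-game standard deviation of total runs.
# It is NOT taken from the data in this post; substitute a measured value.
ASSUMED_SD_PER_GAME = 4.5


def rpg_spread(rows):
    """Raw max-minus-min difference in runs per game across umpires."""
    values = [rpg for _, _, rpg in rows]
    return round(max(values) - min(values), 2)


def standard_error(sd_per_game, games):
    """Standard error of a mean over `games` games, assuming independent games."""
    return sd_per_game / math.sqrt(games)


if __name__ == "__main__":
    print("Raw spread in the data:", rpg_spread(rows))
    for name, games, _ in rows:
        se = standard_error(ASSUMED_SD_PER_GAME, games)
        print(f"{name}: standard error of roughly {se:.2f} runs per game")
```

The raw spread is only the difference in the data. If the standard error for an umpire with fewer than 100 games is a sizable fraction of a run, the extremes of the table are exactly where noise would be expected to pile up.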


Anyway, Jeff's post was more about strike calling percentage than anything else. His tables seem strange, and if they're telling me what I think they're telling me, then I don't think they're correct. For example, of all pitches called strikes by the umpire in 2010, I have about 65% of those falling within the RULEBOOK strike zone (that means the edges of the plate, NOT the 2-foot wide zone commonly used for the zone).

PRELIMINARY DATA HAS BEEN REMOVED BECAUSE I'VE SEEN IT ABUSED IN CERTAIN PLACES. PLEASE SEE LATEST VERSION OF DATABASE!

Below, I show a table of a number of things. The first 3 columns show the percentage of pitches within the rulebook strike zone CORRECTLY called a strike. Similarly, the next 3 columns show the percentage that each umpire CORRECTLY calls a ball when it is truly outside the strike zone. I do this for all batters, RHB, and then LHB.


Next, I also tally up the INCORRECT ball and strike calls. So these are the percentages that each umpire calls a Strike on a pitch that is actually OUTSIDE the rulebook zone OR calls a Ball on a pitch that is truly WITHIN the rulebook zone. Again, keep in mind I use the rulebook zone, rather than the standard 2-foot wide zone:

PRELIMINARY DATA HAS BEEN REMOVED BECAUSE I'VE SEEN IT ABUSED IN CERTAIN PLACES. PLEASE SEE LATEST VERSION OF DATABASE!
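
For anyone who wants to reproduce the four percentages described above, here is a minimal sketch of the tally. It is not the code behind my database. The names px, pz, sz_bot and sz_top follow the usual Pitch F/X fields (in feet) and are assumed here, the zone test ignores the radius of the ball, and the sample pitches are made up purely for illustration.

```python
from collections import Counter

# Home plate is 17 inches wide; half of that, in feet, is the horizontal limit.
HALF_PLATE_FT = 17 / 12 / 2


def in_rulebook_zone(px, pz, sz_bot, sz_top):
    """True if the pitch crosses within the plate edges and the batter's vertical zone.

    Assumption: the ball is treated as a point (its radius is ignored).
    """
    return abs(px) <= HALF_PLATE_FT and sz_bot <= pz <= sz_top


def call_rates(pitches):
    """Tally called pitches into the four conditional percentages.

    `pitches` is an iterable of (px, pz, sz_bot, sz_top, call) tuples, where
    call is "strike" or "ball" (called pitches only, no swings).
    """
    counts = Counter()
    for px, pz, sz_bot, sz_top, call in pitches:
        zone = "in" if in_rulebook_zone(px, pz, sz_bot, sz_top) else "out"
        counts[(zone, call)] += 1

    in_total = counts[("in", "strike")] + counts[("in", "ball")]
    out_total = counts[("out", "strike")] + counts[("out", "ball")]

    def pct(n, d):
        return 100.0 * n / d if d else float("nan")

    return {
        "correct_strike_pct": pct(counts[("in", "strike")], in_total),
        "incorrect_ball_pct": pct(counts[("in", "ball")], in_total),
        "correct_ball_pct": pct(counts[("out", "ball")], out_total),
        "incorrect_strike_pct": pct(counts[("out", "strike")], out_total),
    }


# Made-up pitches: four inside the zone (3 strikes, 1 ball), four outside (1 strike, 3 balls).
sample = [
    (0.0, 2.5, 1.5, 3.5, "strike"),
    (0.3, 2.0, 1.5, 3.5, "strike"),
    (-0.5, 3.0, 1.5, 3.5, "strike"),
    (0.6, 1.6, 1.5, 3.5, "ball"),
    (1.0, 2.5, 1.5, 3.5, "strike"),
    (1.5, 2.5, 1.5, 3.5, "ball"),
    (0.0, 4.0, 1.5, 3.5, "ball"),
    (0.0, 1.0, 1.5, 3.5, "ball"),
]
rates = call_rates(sample)
print(rates)
```

Each percentage is conditional on where the pitch was located, so in this sketch the two "in zone" rates sum to 100% and so do the two "out of zone" rates. Any pitches dropped before the tally would break that identity.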

I was in the process of also recording the total number of pitches called by each umpire to put it in perspective, but did not have time before posting this. I'll add that stuff later on. I think it's pretty obvious that Barrett doesn't have a perfect call percentage with LHB up to bat.

Anyway, I'll have more on this later. For now, look at the zones below from 2010 for all of the umpires in video format (yeah, yeah, I re-posted it but it sure makes sense to have it in this post as well).

NOTE: I fixed the videos. I was made aware that no one could see them because of Facebook privacy settings. Please let me know if there is still a problem. DUH!

Another Update: I added pitch counts for 2010 to the data tables above so as to keep from drawing big conclusions from small sample sizes. When comparing RHB to LHB, remember that it's pretty common to have the LHB zone shifted outside. Because I have used the BOOK ZONE to gauge 'correctness' of the call, these will be skewed a bit. Also, I am working on getting the tables a bit more manageable for Blogger, which continues to disappoint me with its formatting capabilities.





19 comments:

  1. It looks like your Incorrect Strike % is wrong; it's the correct strike %. Other than that, this is very useful.

    I particularly like the incorrect strike and incorrect ball table.

    Going back to Jeff Zimmerman's post, from the graphs it looks like the high inside pitch is almost never called as a strike. I think they should try to train for that tendency.

  2. Thanks, Kazinski. Looks like I merged the wrong column in my table. I'll fix it up.

  3. Is all the data in all the charts from 07-10?

    MGL

  4. "From the looks of things, the umpire can have just over a two-run effect on the outcome of the game due to his strike zone."

    That is not even close to being true. Most of what you are seeing in the runs per game column is noise.

    The actual difference in rpg between the most hitter-friendly and most pitcher-friendly umpires is around .6. I would guess that 1 SD of umpire "rpg" due to their strike zone is .25.

    MGL

  5. MGL,

    The first table is 2007 through 2010 for those games which have Pitch F/X data (I'm using my database to calculate these).

    However, for the Pitch-Level data, it is only 2010. Also keep in mind they aren't projections, they're just cross-tabs for the umpires.

    And sorry for the shitty formatting. For a closer inspection, you might want to copy and paste the tables into Excel.

    Like I said, the "correct" and "incorrect" designation should be taken with a grain of salt, as it's simply the book zone, which we know isn't called for the most part by the umpires, and it extends a bit off the plate on both sides.

  6. Fair enough, and I agree there is plenty of noise. It's a strong statement, and I was mainly just referring to the data as given. I'll take a look at that stuff, too.

  7. In my defense, I state that the random 'evening out' assumption is extremely strong.

  8. Any chance you can post the pitch-level data for 07 and 09 as well? Thanks.

    MGL

  9. I'll try and get the 2007 to 2009 pitch data handy today or tomorrow. And for anyone reading this, if you see anything fishy with the data let me know so I can fix any mistakes with the calculations.

  10. Why don't the percentages add up for correct/incorrect strike percent? For example, Bob Davidson had a correct strike percent of 89.92%, but an incorrect strike percent of 17.40%. These tables are both data from 2010, correct?

  11. They're conditional on the location of the pitch (inside or outside the zone). Sorry, it's a little confusing. So here's what they're showing:

    Correct Strike %: Percentage of balls within the zone called a strike.

    Incorrect Strike %: Percentage of balls outside the zone called a strike.

    Correct Ball %: Percentage of balls outside the zone called a ball.

    Incorrect Ball %: Percentage of balls within the zone called a ball.

    We should expect that Incorrect Ball % + Correct Strike % = 100%.

    Bob Davidson's Correct Strike % is 89.92% while his Incorrect Ball % is 10.08%. So this accounts for 100% of the called pitches within the strike zone.

    Let me try and think of a more logical way to title the columns and/or add in crosstabs of correct strike calls vs. incorrect strike calls.

  12. Alright, thanks. I was just adding the wrong numbers together. That makes more sense.

  13. I'm double checking the calculations on the opposite of what I explain above though: the pitches outside the zone.

    For some reason, some of the umpires aren't adding up to exactly 100% (i.e. Incorrect Strike % + Correct Ball %). Almost all the umps seem to be about 2% short of 100% when I add these together. I'll try to figure this issue out and fix it. They should be very close though.

  14. Figured out the problem with the latter categories.

    When adding Incorrect Strike % and Correct Ball % keep in mind that this does not account for Intentional Balls or Pitch Outs.

    I think leaving things as is makes more sense anyway. Gauging umpire zones and performance on these sorts of pitches would seem to be misleading, as I assume they get them all right.

  15. The logical conclusion to your data is to determine some sort of sensitivity and specificity for each umpire, e.g.
    Sen = CS/(CS+IB): % of the time a pitch is correctly called when it is in the zone.
    Spec = CB/(CB+IS): % of the time a pitch is correctly called when it is out of the zone.
    An ump with a low sensitivity would call a lot of strikes balls, i.e. be batter friendly.
    An ump with a low specificity would call a lot of balls strikes, i.e. be pitcher friendly.
    An ump with high numbers would not favor either the pitcher or the batter.

  16. EP, that is what the data is saying (though, the columns are not particularly well-labeled, which I'm working on fixing up).

    The data is already in the Sensitivity/Specificity form. In the tables, Correct Strike % is what you describe as Sensitivity, while Correct Ball % is the Specificity.

  17. "I'll try and get the 2007 to 2009 pitch data handy today or tomorrow. And for anyone reading this, if you see anything fishy with the data let me know so I can fix any mistakes with the calculations."

    Great, looking forward to it!

    MGL

  18. Any progress on updating the data, Millsy?

    MGL

  19. Already posted on Saturday, in a following blog post, as an Excel file (too much to stick up here).
