I've been working on a new R program that grabs batter-pitcher-umpire level data and creates heat maps for given parameters. My ultimate goal is to build my own function and tool so I can grab any heat map I'm interested in with a single line of code (after sourcing the script, of course). This can be done pretty easily, and below I've presented my first attempt at using the function, in movie format.
For the heat maps presented here, I used the 'mgcv' package in R to fit a binomial GAM, with the smoothing parameter chosen by cross-validation. This matters when writing a program to automate the creation of heat maps, because the variability, range of values, and sample size of pitches differ depending on the player or umpire being modeled. With cross-validation, the smoothing parameter is tuned to the data at hand for each individual umpire instead of being fixed in advance. This version of the GAM smooths with penalized regression splines rather than a loess function. The end result is pretty much the same, though.
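To make the idea concrete, here is a minimal sketch of that kind of fit. It is not my actual script: the data are simulated, the column names (`px`, `pz`, `strike`) and the made-up "true" zone are assumptions for illustration, and smoothing-parameter selection is simply left to the default in `gam()`.

```r
library(mgcv)

set.seed(1)
n <- 2000

# Hypothetical called pitches: px/pz are horizontal/vertical location in feet
pitches <- data.frame(
  px = runif(n, -2, 2),
  pz = runif(n, 0.5, 4.5)
)

# Made-up "true" zone: strike probability drops off away from (0, 2.5)
p_true <- plogis(4 - 3 * (pitches$px / 0.9)^2 - 3 * (pitches$pz - 2.5)^2)
pitches$strike <- rbinom(n, 1, p_true)

# Binomial GAM with one two-dimensional smooth over pitch location;
# gam() picks the amount of smoothing from the data
fit <- gam(strike ~ s(px, pz), family = binomial, data = pitches)

# Predict the called-strike probability on a regular grid
gx <- seq(-2, 2, length.out = 50)
gz <- seq(0.5, 4.5, length.out = 50)
grid <- expand.grid(px = gx, pz = gz)
grid$p_strike <- as.numeric(predict(fit, newdata = grid, type = "response"))

# expand.grid varies px fastest, so rows of z index px and columns index pz
z <- matrix(grid$p_strike, nrow = length(gx))
image(gx, gz, z, xlab = "Horizontal location (ft)", ylab = "Height (ft)")
contour(gx, gz, z, levels = 0.5, add = TRUE)  # 50% called-strike contour
```

Wrapping the fit-and-plot part in a function that takes an umpire's subset of the data would be one way to get to the single-line call I'm after.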
Anyway, check out the videos below. My next step is to work with swing rates, run values, swinging strike rates, home run rates, ball-in-play rates, etc. for players. These are a little trickier given the smaller sample sizes for players, so I will likely need to use a standard Gaussian loess function even for binomial data, as a GAM has some serious problems with small samples. I've done this already by umpire, by count, and I'm not happy with the loess result for binomial strike zone calls: the smoothing stretches way too far, and the sample sizes are very small even for this method. The plots give the general idea of the relative strike zone changes by count (as J-Doug has been writing about at Beyond the Boxscore), but the visual is misleading with respect to the actual strike zone size.
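For comparison, a rough sketch of the loess fallback might look like the following. Again, this is simulated data with hypothetical column names (`px`, `pz`, `swing`), and the `span` value is just a starting guess, not something I've tuned. Because the fit is Gaussian, nothing keeps the raw predictions between 0 and 1, so the sketch clamps them before they would be plotted as rates.

```r
set.seed(2)
n <- 300  # deliberately small, like a single player's sample

# Hypothetical pitches seen by one batter, with a 0/1 swing indicator
swings <- data.frame(
  px = runif(n, -2, 2),
  pz = runif(n, 0.5, 4.5)
)
p_true <- plogis(2 - 2 * swings$px^2 - 2 * ((swings$pz - 2.5) / 1.2)^2)
swings$swing <- rbinom(n, 1, p_true)

# Standard (Gaussian) loess on the 0/1 outcome; a larger span smooths more
fit_lo <- loess(swing ~ px + pz, data = swings, span = 0.75, degree = 2)

# Predict on a grid kept inside the range of the observed locations
grid_lo <- expand.grid(
  px = seq(-1.5, 1.5, length.out = 40),
  pz = seq(1, 4, length.out = 40)
)
raw <- as.numeric(predict(fit_lo, newdata = grid_lo))

# Raw predictions are not constrained to [0, 1]; clamp them for display
grid_lo$rate <- pmin(pmax(raw, 0), 1)
range(raw, na.rm = TRUE)
```

Comparing `range(raw)` against the clamped values is a quick way to see how far the Gaussian fit wanders outside the valid range, and rerunning with a smaller `span` shows how much the surface depends on that choice when the sample is this small.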
I've got a few ideas for this stuff, which I may advertise a bit later because I'll need some help to implement any of them. For now, enjoy the little slide shows below. Sorry I didn't provide each PNG file for your own inspection, but there are 78 umpires included in the data set (I removed some with extremely small sample sizes from 2010). Of course, I'm always happy to contribute some visuals to your website if you are interested in these.
In the videos below, the order of the umpires should be the same. Therefore, if you quickly click each one right after the other, they should start at about the same time and you can view RHB and LHB zones for the same umpire at the same time as it scrolls through.
I apologize for the crappy resolution in the videos. Apparently the conversion really messed with the quality of the images.
ANOTHER UPDATE: Thanks to the ability to embed a video from Facebook, I was able to improve on the resolution. Hooray for Facebook!