Thursday, September 23, 2010

IIATMS Guest Contribution

After my recent posts fiddling around with heat maps for pitch location, Jason at It's About the Money, Stupid contacted me to ask if I would contribute some location maps for Yankee pitchers. Obviously, I couldn't pass up the chance to contribute to an ESPN-affiliated blog.

Despite being an Orioles fan, I actually do link to IIATMS on my sidebar. Why? Because the writers there like to talk about issues in baseball that aren't always 100% Yankees. For example, there has been an ongoing discussion regarding the BatGlove and using maple bats in MLB. There are always lots of posts, and I'd recommend checking them out if you haven't heard of them yet (though, if you're reading MY blog, then you have likely dug pretty far into the depths of baseball blogs and come across them before).

Anyway, the heat maps look at A.J. Burnett in certain counts. His fastball speed is down this year, and there are a few other changes. For good measure--and with inspiration from Albert Lyu at Fangraphs--I compare his 2-strike counts to those of Mariano Rivera. Quite a contrast.

Check it out!

Monday, September 20, 2010

Some Links for Now

Been quite busy of late now that I'm teaching and working on so many different projects (too many). I might try to get some more posts up this fall, hopefully using the Pitch F/X plots that I've been working on (previewed in my last post). I've gotten some solid feedback on those, but would like to be able to generalize the smoothScatter function to use a loess fit, rather than JUST density (though I can think of ways to restructure the data to simply use the kernel function instead). Anyway, check these out:

1. Jason at IIATMS has a good post about using maple bats in MLB. If you hadn't heard, Tyler Colvin is in the hospital for a while, as doctors are using preventative measures to keep his lung from collapsing after a broken bat punctured his chest. From what I know, these Bat Gloves don't reduce the performance of the bats, so they really should be used. Of course, this is a freak accident, but if there is no trade-off in performance, then I don't see the problem.

2. The Freakonomics blog has a great post about innovation. It uses the context of designing offensive football plays. But the final paragraph is my favorite, making sure everyone understands that patent and copyright law should differ depending on the industry. Often, innovations are all treated the same, which isn't necessarily best. It raises a great debate in Economics...one that interests me particularly outside of sport.
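Back to the loess idea from the top of this post: one possible reading of it is to smooth some pitch attribute over location with loess() and draw the fitted surface with image(), instead of coloring by density. The following is only a rough sketch under that assumption. The data are simulated, and the names pitches, speed, gx, gz and surface are made up for illustration.

```r
# Simulated pitches: location plus a made-up speed that rises with height
set.seed(42)
n <- 300
pitches <- data.frame(px = rnorm(n, 0, 0.8), pz = rnorm(n, 2.5, 0.8))
pitches$speed <- 90 + 1.5 * pitches$pz + rnorm(n, 0, 1)

# Local regression surface of speed over pitch location
fit <- loess(speed ~ px + pz, data = pitches, span = 0.75)

# Predict on a regular grid that stays inside the observed locations
gx <- seq(min(pitches$px), max(pitches$px), length.out = 50)
gz <- seq(min(pitches$pz), max(pitches$pz), length.out = 50)
grid <- expand.grid(px = gx, pz = gz)
surface <- matrix(predict(fit, newdata = grid), nrow = length(gx))

# Draw the fitted surface; rows of 'surface' follow gx, columns follow gz
image(gx, gz, surface,
      col = colorRampPalette(c("blue", "yellow", "red"))(64),
      xlab = "Horizontal Location", ylab = "Vertical Location",
      main = "Loess-smoothed speed (simulated)")
```

The span argument plays roughly the role that the kernel bandwidth plays in a density plot: larger values give a smoother surface.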

Thursday, September 2, 2010

Update

I tried using my own little palette with more traditional looking heatmap colors (red and pink are the densest, blue and green are less so, yellow in the middle, etc.). I also included the actual points, but would recommend it for more than a single game or two worth of pitches. I provide the code to create this same plot (just remember that it depends on how your data is structured in terms of what the points are actually plotting). One more thing to keep in mind: if you have more points, each single pitch will have less of an impact on the color in the surrounding area. For example, you won't have those big white blotches I have below for single pitches if you're plotting an entire season...so if you don't like those, then no worries.

Just curious which one others think looks better. I really like the colors on the one on the right, but the one on the left is also interesting. The right plot seems to have some blurriness to it that goes along with the color palette, while the other one is less so. My opinion is that the colors don't blend as well in the plot on the left. Part of the fun with this is finding the right balance of colors.





library(graphics)
par(mfrow=c(1,2))
smoothScatter(data$pz~data$px, nbin=1000, colramp = colorRampPalette(c("darkblue", "blue", "lightblue", "green", "yellow", "orange", "darkorange", "red", "pink")), nrpoints=Inf, pch=19, cex=.7, transformation = function(x) x^.75, col="black", main="Johnny Cueto Location (May 11, 2010)", xlab="Horizontal Location", ylab="Vertical Location")
lines(c(0.708335, 0.708335), c(mean(data$sz_bot), mean(data$sz_top)), col="white", lty="dashed", lwd=2)
lines(c(-0.708335, -0.708335), c(mean(data$sz_bot), mean(data$sz_top)), col="white", lty="dashed", lwd=2)
lines(c(-0.708335, 0.708335), c(mean(data$sz_bot), mean(data$sz_bot)), col="white", lty="dashed", lwd=2)
lines(c(-0.708335, 0.708335), c(mean(data$sz_top), mean(data$sz_top)), col="white", lty="dashed", lwd=2)

library(RColorBrewer)
buylrd <- c("#313695", "#4575B4", "#74ADD1", "#ABD9E9", "#E0F3F8", "#FFFFBF", "#FEE090", "#FDAE61", "#F46D43", "#D73027", "#A50026")
smoothScatter(data$pz~data$px, nbin=1000, colramp = colorRampPalette(c(buylrd)), nrpoints=Inf, pch="", cex=.7, transformation = function(x) x^.6, col="black", main="Johnny Cueto Location (May 11, 2010)", xlab="Horizontal Location", ylab="Vertical Location")
lines(c(0.708335, 0.708335), c(mean(data$sz_bot), mean(data$sz_top)), col="white", lty="dashed", lwd=2)
lines(c(-0.708335, -0.708335), c(mean(data$sz_bot), mean(data$sz_top)), col="white", lty="dashed", lwd=2)
lines(c(-0.708335, 0.708335), c(mean(data$sz_bot), mean(data$sz_bot)), col="white", lty="dashed", lwd=2)
lines(c(-0.708335, 0.708335), c(mean(data$sz_top), mean(data$sz_top)), col="white", lty="dashed", lwd=2)

And, for good measure, the same plots using a single color shaded differently based on density (the default for the smoothScatter function). Just for fun. I don't like these as much, and they look more like blurry scatter plots than anything else.

Wednesday, September 1, 2010

Heat Map and Pitch F/X

Nothing breakthrough here, not really any analysis, just some R fun. I wanted to give a heads up about a really interesting function I found in R for anyone who likes Pitch F/X data tools. While Dave Allen is the king of heat maps for Pitch F/X in my opinion, I haven't seen him utilize the smoothScatter function with the RColorBrewer package in this way (or perhaps he does but with different colors or smoothing--maybe someone else does, though). I know that the 'contour' function and some other types always give me fits when I try to use them, depending on what type of data I have. smoothScatter is easy to use: it works just like making a regular scatter plot, except that the color representation lets you turn it into a 'heat map'.

The function does a kernel density estimate and then automatically blends the colors on the 2-dimensional scatterplot. If you want, you can also include the actual points along with other options. It seems to work pretty well (though I had to create my own palette in RColorBrewer that was a reversal of the default...I imagine there's a more elegant way to do it than I did, like reverse scoring the density estimation output).
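For what it's worth, one tidier way to get the reversed palette is probably base R's rev() applied to the brewer.pal() output. A minimal sketch, assuming RColorBrewer is installed (heat_ramp is just a name made up for illustration):

```r
library(RColorBrewer)

# The default RdYlBu palette runs red -> blue; rev() flips it to blue -> red
buylrd <- rev(brewer.pal(11, "RdYlBu"))

# colorRampPalette() returns a function that interpolates between those colors,
# which is the form the 'colramp' argument of smoothScatter expects
heat_ramp <- colorRampPalette(buylrd)

# Five interpolated colors, from the blue end to the red end
heat_ramp(5)
```

This should give the same eleven hex codes as typing them in by hand, with blue for sparse areas and red for dense ones.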

Anyway, here's the heat map of the same data I had in a previous post for Johnny Cueto. It's just pitch location, and the strike zone is not normalized (it's simply an average height of batters for all pitches for that game). Red is where lots of pitches were, while blue indicates areas where pitches were not located. I'm still working on the key for that, but I think it's pretty straightforward (also, it's the catcher's view). Finally, I provided the R code below in case you want to implement it with your data (I apologize for the terrible code formatting, but Blogger really doesn't seem to have many options). The code is very simple. If you want a finer or coarser plot, increase or decrease the 'nbin' option (the number of grid cells used for the density estimate), and play with the function in the 'transformation' option; the amount of smoothing itself is set by the 'bandwidth' option. Just a note, the 'col="black"' portion is used only if you decide to include the points on your plot. I found both ways to be kind of neat depending on what you want to know.
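To see what the smoothing knobs do, here is a small side-by-side sketch on simulated data. As far as I can tell from the smoothScatter help page, 'bandwidth' sets the kernel width and 'nbin' sets the grid resolution, so only 'bandwidth' is varied here. The px and pz vectors are random stand-ins for real pitch locations, and heat is a made-up palette name.

```r
# Simulated pitch locations (hypothetical stand-ins for data$px and data$pz)
set.seed(1)
px <- rnorm(100, mean = 0, sd = 0.8)
pz <- rnorm(100, mean = 2.5, sd = 0.8)

heat <- colorRampPalette(c("darkblue", "lightblue", "yellow", "orange", "red"))

par(mfrow = c(1, 2))

# Narrow kernel: tight blobs around individual pitches
smoothScatter(px, pz, nbin = 200, bandwidth = c(0.1, 0.1), colramp = heat,
              nrpoints = 0, transformation = function(x) x^0.6,
              main = "bandwidth = 0.1",
              xlab = "Horizontal Location", ylab = "Vertical Location")

# Wider kernel: pitches blend into a smoother surface
smoothScatter(px, pz, nbin = 200, bandwidth = c(0.4, 0.4), colramp = heat,
              nrpoints = 0, transformation = function(x) x^0.6,
              main = "bandwidth = 0.4",
              xlab = "Horizontal Location", ylab = "Vertical Location")
```

The exponent in 'transformation' changes how density maps to color: values closer to 0 pull sparse areas up toward the warm end of the palette, while values closer to 1 reserve the warm colors for the densest spots.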

Finally, from what I've read just now, smoothScatter is a part of the base R graphics package. However, if you have an older version of R, I'm not sure which package it is part of. It used to be part of 'geneplotter', but that does not seem to be available on the Install Packages menu anymore. I have the new version at my office, but not here at home...otherwise, I would have changed the color of the strike zone outline to be more visible. But that's an easy fix if you're doing it at home. I'd recommend playing with different color palettes as well and updating your R version to the newest one (64-bit if you're not running a crappy 32-bit Vista OS on your home machine like me...).


R CODE:

library(graphics)
library(RColorBrewer)

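# Print the default 11-color RdYlBu palette (it runs red to blue)
brewer.pal(11, "RdYlBu")

# The same colors typed in reverse order (blue to red), so dense areas plot red
buylrd <- c("#313695", "#4575B4", "#74ADD1", "#ABD9E9", "#E0F3F8", "#FFFFBF", "#FEE090", "#FDAE61", "#F46D43", "#D73027", "#A50026")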

smoothScatter(data$pz~data$px, nbin=1000, colramp = colorRampPalette(c(buylrd)), nrpoints=Inf, pch="", cex=.7, transformation = function(x) x^.6, col="black", main="Johnny Cueto Location (May 11, 2010)", xlab="Horizontal Location", ylab="Vertical Location")

lines(c(0.708335, 0.708335), c(mean(data$sz_bot), mean(data$sz_top)), col="white", lty="dashed", lwd=2)

lines(c(-0.708335, -0.708335), c(mean(data$sz_bot), mean(data$sz_top)), col="white", lty="dashed", lwd=2)

lines(c(-0.708335, 0.708335), c(mean(data$sz_bot), mean(data$sz_bot)), col="white", lty="dashed", lwd=2)

lines(c(-0.708335, 0.708335), c(mean(data$sz_top), mean(data$sz_top)), col="white", lty="dashed", lwd=2)
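
The four lines() calls above could probably be collapsed into a single rect() call. This is only a sketch on a simulated data frame: the column names match the ones used above, but the values are made up, and half_plate, zone_bot and zone_top are names introduced just for this example.

```r
# Hypothetical stand-in for the Pitch F/X data frame used above
set.seed(7)
data <- data.frame(px = rnorm(100, 0, 0.8),
                   pz = rnorm(100, 2.5, 0.8),
                   sz_bot = rnorm(100, 1.6, 0.05),
                   sz_top = rnorm(100, 3.4, 0.05))

half_plate <- 0.708335          # half of the 17-inch plate width, in feet
zone_bot <- mean(data$sz_bot)   # average bottom of the zone for the game
zone_top <- mean(data$sz_top)   # average top of the zone for the game

smoothScatter(data$px, data$pz, nrpoints = 0,
              colramp = colorRampPalette(c("darkblue", "yellow", "red")),
              xlab = "Horizontal Location", ylab = "Vertical Location")

# One rectangle: left, bottom, right, top
rect(-half_plate, zone_bot, half_plate, zone_top,
     border = "white", lty = "dashed", lwd = 2)
```

Changing the outline color is then a one-argument edit to 'border'.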