Tuesday, April 26, 2011

A Must Visit for Baseball Fans in Napa

Back in the first week of March, my fiancee and I headed off to Napa Valley for a week of wine drinking and nice weather. I'm more of a beer guy myself, but find it interesting to taste new things. Give me something different and I'll be happy. In the end, most wines taste pretty much the same to an unsophisticated palate like mine. But I can sure tell the difference in an IPA.

Anyway, we took a really fun "Wine Tour" through Beau Wine Tours. I would highly recommend this, and at only $100 a person for a full day of boozing, it's definitely worth the money (note: there are wine tasting fees at some places...they are reasonable). If you are in the area and choose to do this, ask for Damon. He was great. If you don't have a big group, that's okay, as they'll just pair you with other random people in the limo. This usually makes for good fun, and we ended up riding around with other people from Michigan!

The wine tour takes you to four different wineries chosen by your host, the limo driver. You get a very scrumptious picnic lunch from a local sandwich shop and they'll even pick you up and drop you off at your hotel. It was really the highlight of heading off to Napa, but going around on your own is fun, too. So why am I talking about wine tasting on a sports blog?

Well, the reason I say ask for Damon at the Wine Tours is this: he is friends with the family that owns Hill Family Estates. It is a small place in Yountville and, if you're the non-wine-snob member of the party like me, you'll love it. BUT, if you're a wine snob, you'll still really enjoy it. It's got a bit of everything, and they cater to both crowds very well. The owner is a younger guy (well, the family owns it, but he seems to run a lot of the marketing) and is into sports, music, surfing, and the like. But here's the kicker: they sit you down for a free tasting (if you come with Damon), with fresh cheese and Italian meats, in a room covered from wall to wall with sports memorabilia.


Every year, Ryan and the Estate do a 'wine staining project'. Here, they take all sorts of memorabilia like surf boards, guitars, baseball bats, etc. and pair it with a wine that is 'inspired' by an athlete or someone of the sort. They do a lot of this with baseball players, including Rick Ankiel, Luis Gonzalez, Bronson Arroyo, Tom Glavine and Greg Maddux.



As a kid, I was never the overpowering pitcher, so I absolutely loved Maddux. This was the first place we went on the tour, and I couldn't help myself. I splurged, and bought only one bottle of wine on the trip (for $75 I might add): Greg Maddux Wine. It's not autographed, but I had to have it after being there. This is it:


There are some Bronson Arroyo and Luis Gonzalez bottles left with signatures (unfortunately, $275 for a magnum Arroyo autographed bottle was a bit steep for a grad student). But the standard bottles with autographs are the same price as the non-signed ones. There were no Maddux signatures (DAMN!), but I still found this to be pretty cool. The autographed wine-stained bats are neat, too.


It looks like the next wine (a 2007 red wine) will be coming courtesy of Johnny Damon. I believe they try and do something with a World Series winner each year. You can get an autographed bottle here. And they do all sorts of cool packaged things like this Arroyo boxed package if you just can't get enough Arroyo:


All in all, these are fun things to have on the shelf. Ultimately, they're just unique, they scream 'baseball nut', and yet have a sophisticated side. But the real fun comes from going to Hill Estates in Yountville and sitting there doing the tasting, talking to the people there, and checking out all the cool wine-stained stuff. You can get the bottles there, and have a really fun experience. Thanks to them and Beau Wine Tours for showing us a great time.


(GUESS WHO'S HOLDING THE BASEBALL BAT!)

Saturday, April 16, 2011

Trackman Position Needs R Knowledge

Thought some of the R-Blogger readers would be interested in the position linked below. If you're a baseball fan and like working in R, this is a fun company that seems to be getting more and more press. Recently, it was featured in Sports Illustrated and has been covered on ESPN as well.

http://www.workinsports.com/wisquickregapply.asp?referrer=793&idx=64599

I've interfaced a bit with the people at Trackman in the past and they really are excited about the stuff they do. I can't say much more about the data/position as I have signed an NDA with the company. However, I can say the position is recommended for those with R knowledge specifically (which is also indicated in the job description).

Friday, April 8, 2011

Umpire Bias Favoring Catchers At Bat?

Recently, I've been working a lot with umpire data. A lot of this has to do with the nice big sample sizes that it provides for most umpires, which just makes it easier to infer interesting things from the data. Today I thought I would check out something I came across when looking for B-Pro topics over the past couple weeks.

It is relatively well-known that umpires and catchers do their best to keep a good rapport with one another behind the plate. Catchers need to be diplomatic when asking about a call on a given pitch, as umpires may not take well to being called out for an incorrect call. The implications of these interactions may well be important for the catcher’s battery mate standing 60 feet 6 inches away, and the hope is that, if the catcher has any effect on borderline calls, it will be a positive one for his pitcher. This is a difficult thing to measure, so I’ll have to leave this for someone else who has access to players and can survey them.

No, today I’ll be asking a related question, though from a different angle: Do umpires give catchers the benefit of the doubt when they’re at the plate? If catchers are being cordial with the umpire behind the plate, then this could be a result of both team-level and individual-level incentives. If the catcher knows that being nice will improve his experience at bat, then he may have a strong incentive to be nice.

But there’s a cognitive aspect to this from the umpire perspective as well. If an umpire screws up a call for a catcher, then he has to face the guy in a couple of outs right there behind the plate. The rest of the team heads off at least 60 feet away, where the umpire won’t be close enough to hold their hand. Think about how you might talk to someone you can’t stand on the internet. Then think about whether you would say those things to their face. Are the two interactions different? Could this cognitive aspect come into play when making ball-strike calls when catchers are up to bat?

To answer this question, I’ll use Pitch F/X data from 2008 through 2010. I include a dummy variable for whether or not the batter is a regular catcher, as well as variables giving the distance of the pitch from the center of the strike zone, whether or not the umpire made the ‘correct call’ (based on a strike zone covering the width of the plate and running from 1.75 ft. to 3.45 ft. in height), and a few other controls. Using a few different types of analysis, I’ll do my best to tackle the data.
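As a rough illustration of the fixed zone described above, here is a minimal Python sketch. It is not the code behind this post. It assumes the usual Pitch F/X location fields `px` and `pz` (horizontal and vertical position in feet, with `px` measured from the center of the plate), ignores the radius of the ball, and treats the distance variable as straight-line distance to the midpoint of the fixed zone, which is only a guess at how `linear_distance_from_centerpoint` was built. The function names are made up for the example.

```python
import math

# Home plate is 17 inches wide; px is measured in feet from its center.
PLATE_HALF_WIDTH_FT = 17.0 / 12.0 / 2.0
ZONE_BOTTOM_FT = 1.75
ZONE_TOP_FT = 3.45
# Assumption: the "centerpoint" is the vertical midpoint of the fixed zone.
ZONE_CENTER_Z_FT = (ZONE_BOTTOM_FT + ZONE_TOP_FT) / 2.0


def in_fixed_zone(px, pz):
    """True if the pitch location falls inside the fixed rectangular zone."""
    return abs(px) <= PLATE_HALF_WIDTH_FT and ZONE_BOTTOM_FT <= pz <= ZONE_TOP_FT


def correct_call(px, pz, called_strike):
    """True if the umpire's call agrees with the fixed zone."""
    return in_fixed_zone(px, pz) == called_strike


def distance_from_center(px, pz):
    """Straight-line distance (feet) from the pitch to the zone midpoint."""
    return math.hypot(px, pz - ZONE_CENTER_Z_FT)
```

A pitch one foot off the center of the plate at belt height, for instance, is outside this zone, so a ball call on it counts as "correct" here.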

My first step was to run a simple logistic regression on the calls made by the umpire against all batters during this time period. In other words, I’ll be predicting the effect of my variables on the probability that a certain pitch is called a strike, holding constant the location, count, batter/pitcher handedness, inning, and so on. The dependent variable here is whether or not the pitch was called a strike (1=called strike, 0=not called strike), and the data include only calls made by the umpire.
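For readers who have not fit one of these before, the sketch below shows the general shape of a logistic regression of strike calls on distance and a catcher dummy. It is not the model from this post: the data are simulated, the "true" coefficients in `simulate` are invented for the example, and the fit uses a hand-rolled Newton-Raphson routine in plain Python rather than a statistics package, purely so the example is self-contained.

```python
import math
import random


def sigmoid(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logit(rows, y, iters=25):
    """Newton-Raphson fit; each row must already start with a 1 for the intercept."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, x)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def simulate(n, seed=0):
    """Made-up pitches: [1, distance in feet, catcher dummy] and a 0/1 strike call."""
    rng = random.Random(seed)
    true_beta = (4.0, -5.0, -1.0)  # invented values, not estimates from the post
    rows, y = [], []
    for _ in range(n):
        x = [1.0, rng.uniform(0.0, 2.0), 1.0 if rng.random() < 0.2 else 0.0]
        p = sigmoid(sum(b * v for b, v in zip(true_beta, x)))
        rows.append(x)
        y.append(1 if rng.random() < p else 0)
    return rows, y


rows, calls = simulate(3000)
intercept, distance_coef, catcher_coef = fit_logit(rows, calls)
print(intercept, distance_coef, catcher_coef)
```

In the real model the count, outs, and handedness controls would enter as additional columns in each row.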

One thing to consider, however, is that catchers could be coming in a bit shorter than the rest of the batters in the sample. Because I used a fixed top and bottom of the strike zone, pitches at the top and bottom edges could be biasing the catcher effect. The average height on an MLB roster is about 6’ 1” or 6’ 2” (http://espn.go.com/mlb/stats/rosters/_/sort/null/order/false), while the average catcher height came in at just under 6’ 1” last season (http://www.answerbag.com/q_view/2021290). If we assume the difference is about 1 inch between catchers and the rest of the league, we should probably account for this. Within the data set, for a given pitch, the difference between the listed strike zones of catchers and non-catchers is less than half an inch vertically. For the purposes of this preliminary look, I proxy batter height using the listed top and bottom of the zone within the Pitch F/X data. While the provided numbers are extremely noisy and problematic for deciding whether or not a pitch was “within the zone”, they should work well as a rough proxy for the height of the batter on average. This likely won't control enough for height, but running the regression with and without the top and bottom of the zone variables barely changes the catcher variable at all. This could mean one of two things: 1) Height is not an issue or 2) The sz_top and sz_bot variables aren't just noisy, but completely worthless (a very real possibility).

One other thing to check is whether or not catchers just see more pitches within the strike zone defined earlier in this article. It turns out that there is not a statistically significant difference in the number of pitches seen within the fixed zone for catchers compared with other players. This also gives us some slight evidence that the height of catchers isn’t too much of a problem in the model. If catchers were significantly shorter than the rest of the population, we would expect pitchers to adjust and throw pitches within this smaller zone. However, the spread of pitch locations is pretty much the same for catchers and non-catchers. For brevity, I do not include the results of this regression (though they are available upon request).
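The check above was done with a regression. A simpler stand-in that asks the same question is a two-proportion z-test on the share of pitches inside the fixed zone, sketched below in Python. The function name is made up, and the example takes counts as input rather than using any figures from the data.

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

Here `hits_a` and `n_a` would be the in-zone pitches and total called pitches for catchers, and `hits_b` and `n_b` the same counts for everyone else. Unlike the regression, this ignores the count, handedness, and other controls.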

Below is the output from the logistic regression on the probability of a strike call:

Variable                             Estimate     Std. Error   z-value     Sig.
(Intercept)                          6.115965     0.061387     99.629      ***
count.0.0                            0.501851     0.025818     19.438      ***
count.0.1                            -0.085477    0.027324     -3.128      **
count.0.2                            -0.466645    0.034037     -13.71      ***
count.1.0                            0.685788     0.026874     25.519      ***
count.1.1                            0.146811     0.027805     5.28        ***
count.1.2                            -0.192775    0.030828     -6.253      ***
count.2.0                            0.877375     0.02952      29.722      ***
count.2.1                            0.351544     0.029845     11.779      ***
count.2.2                            -0.010799    0.031409     -0.344      0.731
count.3.0                            1.068622     0.033826     31.592      ***
count.3.1                            0.557583     0.03356      16.614      ***
count.3.2                            Base-level
factor(end_outs)=1                   0.516444     0.011563     44.664      ***
factor(end_outs)=2                   0.475019     0.011598     40.956      ***
factor(end_outs)=3                   0.815832     0.012411     65.735      ***
factor(pitcher_throws)=R             0.150937     0.01312      11.505      ***
factor(batter_stand)=R               -0.259115    0.01416      -18.299     ***
linear_distance_from_centerpoint     -7.177881    0.01522      -471.622    ***
catcher                              -0.122086    0.011084     -11.015     ***
pitcher_throws=R & batter_stand=R    -0.121216    0.016309     -7.432      ***

The effect in the regression for the ‘catcher’ dummy variable is statistically significant and larger than I would have expected (some of this could be coming from differences in height, despite my attempts at controlling for this variable). On average, for a pitch at the edge of the zone (normally a 50-50 chance of being called a strike), the odds of a strike call are roughly 12% lower if the batter is a catcher, which is about 3 percentage points off the 50% rate. For those unfamiliar with logistic regression, I won’t go into explaining how this changes as the probability of a strike call otherwise increases or decreases. The short version is that the estimated effects of coefficients in logistic regression can't just be read off the regression table when pitches are closer to a 1 or 0 probability. So with catchers, it's likely that a pitch down the middle is still a strike at very near the same rate, while a pitch 10 feet high is still a ball at very near the same rate as for other batters.
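To make the point about reading logistic coefficients concrete, here is a small Python sketch that applies the catcher coefficient from the table to a few baseline strike probabilities. The baselines of 50%, 90%, and 99% are picked only for illustration, and the function names are made up.

```python
import math

CATCHER_COEF = -0.122086  # 'catcher' estimate from the regression table


def inv_logit(z):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def shifted_probability(base_prob, coef):
    """Strike probability after adding coef to the baseline log-odds."""
    base_logit = math.log(base_prob / (1.0 - base_prob))
    return inv_logit(base_logit + coef)


# The odds ratio is the same everywhere; the probability change is not.
print("odds ratio:", math.exp(CATCHER_COEF))
for base in (0.5, 0.9, 0.99):
    new = shifted_probability(base, CATCHER_COEF)
    print(base, "->", round(new, 4), "change:", round(new - base, 4))
```

The change in probability is largest for a coin-flip pitch and shrinks toward zero as the baseline approaches 0 or 1, which is why the coefficient cannot be read as a single percentage effect.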

This is a pretty interesting contrast, but there could be one other thing confounding the result: I have not controlled for the talent of the batter. We know that catchers are generally not as adept at hitting the ball as those at other positions, if for no other reason than that the top hitting catchers are often moved to another position early on. If umpires are ‘compassionate’ toward players who just aren’t very good hitters, then we could be picking up this effect here. I don’t currently have an answer to this issue, as I do not have individual player performance in my Pitch F/X data at this point. If it is the case that this is an effect of the umpire interacting with the batter’s skill, then it is also an interesting issue to be looked into later on that likely needs to be controlled for in the data. I am in the process of greatly improving the information in my Pitch F/X database, so hopefully I can take a look at this stuff as well.

I took the first model a bit further and followed a technique that J-Doug has used in his fantastic ‘Compassionate Umpire’ articles. For this second model, I used an indicator variable of whether the call was in the batter’s favor (ball within the zone, called a ball), the pitcher’s favor (ball outside the zone, called a strike), or neutral (“correct call”). This is the dependent variable in an ordered logistic model. As the indicator increases (from -1 to 0 to 1), the calls are coming more “in favor” of the batter. This sheds some further light on any increase in the probability that the call will be in the batter’s favor, given that he is a catcher. I again don’t present the full regression output, as it simply confirms the earlier finding; this model also indicated a significant increase in favorable calls for catchers.
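The three-level indicator described above could be coded along these lines. This is only a sketch: the function name is made up, and `in_zone` is assumed to come from the fixed strike zone used earlier in the post.

```python
def call_favor(in_zone, called_strike):
    """Ordered outcome: -1 pitcher's favor, 0 correct call, +1 batter's favor."""
    if in_zone and not called_strike:
        return 1   # pitch in the zone, called a ball
    if not in_zone and called_strike:
        return -1  # pitch outside the zone, called a strike
    return 0       # the call matches the zone
```

Coding it this way keeps the ordering used in the ordered logistic model, where larger values mean calls that favor the batter more.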

So, where is this difference in strike-calling coming from? Well, the contours for the 50% call rate for left-handed and right-handed batters, shown below, give some idea. In the panel on the left, I plotted the 50% contours for RHB that are catchers and non-catchers, while on the right panel, we have left-handed batters (plots are from the umpire’s view). In both panels, you can see that umpires are a bit more lenient with inside pitches for both right and left-handed batting catchers. Right-handed catchers seem to get calls in their favor up in the zone, but this very well could be a result of these catchers being a little shorter than their non-catching counterparts. You can see that for left-handed catchers, the zone is shifted upward a bit. So the ‘height’ factor seems to be relatively ambiguous compared to the inside corner difference, especially considering that the lower limit of the zone for RHB is almost identical for both groups of batters. A word of caution: comparing differences this small on plots like this is not a replacement for more rigorous analysis, but they are interesting to look at once we understand some of what is going on in the data.

This is of course not certain evidence that there is something going on with the catchers at bat, but it seems to point to something interesting. I’d like to look into this phenomenon (if that is what it is) and be a bit more confident about the height of the batters and the possibility of umpires being ‘compassionate’ toward less skilled hitters, rather than catchers themselves. If anyone has batter height available by player_id (the ones included in the Pitch F/X database format from Mike Fast's tutorial), I'd love to be able to include this in my data. That way, I could try and provide a bit more accurate umpire performance estimations as well.

In the end, it very well could be that the closeness of the catcher and the umpire has an effect on the umpire taking the bat out of a catcher’s hands. But replication and improvement are always key in this sort of analysis, and I think they are needed here. I’d love to hear some reactions to the analysis, and am always willing to hear shortcomings of the approach here. I have a hard time coming out and proclaiming a definite bias without better controlling for the height of the batters, but the effect seems to be large enough that at least some of it is real.