Friday, July 29, 2011

Joint Statistical Meetings in Miami

I am headed off to Miami for the 2011 Joint Statistical Meetings on Sunday. I'll be presenting a poster with a fellow graduate student, and I look forward to experiencing a new conference with a very different crowd than the one I normally interact with professionally (though closer to the one I interact with online). If you're attending, stop by the Section on Statistics in Sports contributed poster session and see our poster. The poster investigates Hockey Hall of Fame voting patterns (skaters only) and the possibility of language-based bias. Long story short, we don't find much, but there is more to do, and that does not necessarily mean nothing is happening.

While the meetings cover statistics in all disciplines, there is a lot of sports content. Phil Birnbaum will be presenting some of his findings on race and strike calling (there is also an additional poster on the topic), and Shane Jensen will be giving a roundtable talk on fielding metrics. Check out the full sports program here.

Wednesday, July 20, 2011

Non-Sports Link: Libertarians and Progressives Can be Friends

A great article from the author of the Bleeding Heart Libertarians blog. I'll admit that there isn't a better word to describe my political views than "libertarian". I'm certainly not Milton Friedman or Jeffrey Miron (both of whom I admire and respect greatly), but I can't consider myself "Bleeding Heart" either. Maybe many of my issues are with the extreme, and unfortunately often uninformed, left-leaning folks I had to deal with at a very liberal undergraduate institution. But I often despair that when people think "libertarian" (often generalized to "economists"), they picture someone with no values and little empathy. Matt Zwolinski does a great job of countering this perception, and I especially like the following quote:

"These are my reasons for thinking that progressives should have greater confidence in free markets and civil society to realize their values, and less confidence in government regulation. But even if progressives are not convinced by that claim, I hope they are convinced by another one: namely, that political disagreement does not always, or even usually, imply an irreconcilable conflict of fundamental values. Progressives and libertarians should realize that they share many more values in common than they probably think, and that their different political prescriptions are less the product of an epic battle of good vs. evil and more a function of reasonable disagreement regarding how to prioritize and realize their common goals. Even if disagreement persists, bearing this point in mind should make that disagreement a more civil and productive one."

Libertarianism and moral values are not mutually exclusive. The economic prescriptions of a strictly libertarian viewpoint are an invaluable starting point on which to base policy. Once we have that cost-benefit analysis and an understanding of the efficiency of a free market, we must turn to the values of the society and find the best balance of the two in order to foster both economic and societal growth. As Zwolinski says, "Good intentions, even when they exist, are not enough."


Hat Tip: ECON Jeff Blog (see sidebar)

Thursday, July 14, 2011

Sam Fuld, Bob Carpenter, and Statistical Inference Blog

Here is a quick post responding to a request by Bob Carpenter at one of my favorite nerd blogs: Statistical Modeling, Causal Inference and Social Science. While a lot of the Bayesian theory is out of my league, Dr. Andrew Gelman really makes you think about applied statistical problems in social science.

Anyway, the request was for a quick scatter plot (I'm not going to go nuts and pull out BUGS code for a Bayesian hierarchical model or anything like that here!) of batter performance against the ability to foul balls off in given counts (I could also do base-out states, but I'll keep it simple for now).

Luckily, I had R up and running with my Pitch F/X database already loaded. Of course, a full analysis would require understanding where the fouled-off pitches are thrown (along with their velocity and pitch type), but then it gets a bit complicated. Anyway, here we go. I'll start with a quick table of the average percentage of pitches fouled off in each count.


Count   Foul %   Count   Foul %
0-1     17.61%   0-2     19.20%
1-1     20.46%   1-2     22.44%
2-1     23.30%   2-2     26.00%
3-1     21.48%   3-2     29.91%
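For anyone who wants to build a table like this from their own pitch-level data, here is a minimal Python sketch (not the R code I used). It assumes each pitch is stored as a (balls, strikes, result) tuple with a hypothetical "foul" result label; real Pitch F/X descriptions would need to be mapped onto labels like that first, and whether foul tips count as fouls is a choice you'd have to make.

```python
from collections import defaultdict

# Hypothetical pitch-level records: (balls, strikes, result).
# The result labels ("foul", "ball", "in_play", ...) are assumptions for
# illustration, not actual Pitch F/X field values.
pitches = [
    (0, 1, "foul"),
    (0, 1, "ball"),
    (3, 2, "foul"),
    (3, 2, "in_play"),
    (3, 2, "foul"),
]


def foul_pct_by_count(pitches):
    """Return {"balls-strikes": percent of pitches fouled off} for each count seen."""
    totals = defaultdict(int)
    fouls = defaultdict(int)
    for balls, strikes, result in pitches:
        count = f"{balls}-{strikes}"
        totals[count] += 1
        if result == "foul":
            fouls[count] += 1
    return {count: 100.0 * fouls[count] / totals[count] for count in totals}


for count, pct in sorted(foul_pct_by_count(pitches).items()):
    print(f"{count}: {pct:.2f}%")
```

Note that this pools every pitch in a count, so batters who reach a count more often carry more weight, which is the same bias mentioned below.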

From this, we can glean that guys don't foul the ball much in 3-0 counts. This could be because they see easier pitches to hit and/or because they take the pitch very often; probably a combination of both. Keep in mind that these numbers are also biased: we don't see the same batters the same number of times in each count. Now for foul percentage plotted against wOBA:

If anything, there's a slight downward trend here (consistent with what was found before at Baseball Analysts, which is linked from the post I linked above). And finally, here is foul percentage plotted against wOBA for each count. Here, I removed outliers (defined as player-seasons more than 2 standard deviations above the average foul rate), since these should mostly be players who did not get nearly enough at-bats for their foul rates to mean much. This didn't work perfectly, and there are some obvious anomalies likely due to low plate-appearance totals, but I think we get a decent look at things. Also, the lower censoring (at 0) makes it more difficult to pick up a pattern in the plots. In addition, the plots include player-seasons, not just players, so someone like Pujols will be in here 4 times (2007 through 2010).

It might be instructive to look at these same plots only for pitches swung at (so players aren't penalized for being selective at the plate) and/or only on pitches near the edges of the strike zone (so we're just looking at pitches that the players are fighting off). The analysis here doesn't show too much going on, but that doesn't mean there's nothing there.

Below, I've done the latter, with the same plots as above. I define the edge as at least 8 inches (horizontally) from the center of the plate and/or below 1.8 feet or above 3.3 feet vertically. Of course, you can define the edge in a number of ways. This is rough, quick code, and I didn't have time to get into too much detail today:
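This isn't the quick R code mentioned above. It's a minimal Python sketch of the same edge rule, assuming the usual Pitch F/X `px` (horizontal location, 0 at the center of the plate) and `pz` (height) fields, both in feet. The function name, the result labels, and the example pitches are hypothetical.

```python
# Edge-of-zone rule from the post; px and pz are assumed to be the
# Pitch F/X location fields, measured in feet.
EDGE_HALF_WIDTH_FT = 8 / 12  # 8 inches from the center of the plate
LOW_EDGE_FT = 1.8            # below this height counts as an edge pitch
HIGH_EDGE_FT = 3.3           # above this height counts as an edge pitch


def is_edge_pitch(px: float, pz: float) -> bool:
    """True if the pitch is at least 8 inches off center OR outside 1.8-3.3 ft vertically."""
    return abs(px) >= EDGE_HALF_WIDTH_FT or pz < LOW_EDGE_FT or pz > HIGH_EDGE_FT


# Hypothetical pitches: (px, pz, result)
pitches = [
    (0.05, 2.5, "called_strike"),    # middle of the zone
    (0.75, 2.4, "foul"),             # off the edge horizontally
    (-0.10, 1.6, "foul"),            # below 1.8 ft
    (0.20, 3.5, "swinging_strike"),  # above 3.3 ft
]

edge_pitches = [p for p in pitches if is_edge_pitch(p[0], p[1])]
edge_foul_rate = sum(1 for p in edge_pitches if p[2] == "foul") / len(edge_pitches)
print(len(edge_pitches), round(edge_foul_rate, 3))
```

Because the conditions are joined with "or", pitches far off the plate also count as "edge" here. Adding an outer bound would restrict the set to pitches a batter might actually fight off, but where to put that bound is a further assumption.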

Keep in mind this is only for Pitch F/X data. That means part of 2007 and all of the 2008 through 2010 regular seasons. I try to wait until the end of the season to update my database each year. I imagine this would be more interesting with more years of data (like from Retrosheet, as mentioned in the linked blog post). I think Dan Turkenkopf is going to try this out, as he says in the comments. Perhaps I'll extend this later to swings only as well.
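Restricting to swings mostly comes down to changing the denominator. Here is a minimal Python sketch, assuming pitch-description strings similar to those found in Pitch F/X data. The exact label sets below are hypothetical, and treating foul tips as swings but not fouls is an assumption, not a rule from the post.

```python
from typing import Iterable, Optional

# Hypothetical pitch-description labels. Real Pitch F/X descriptions may use
# different strings, so map your own data onto these sets first.
SWING_DESCRIPTIONS = {
    "Foul",
    "Foul Tip",
    "Swinging Strike",
    "In play, out(s)",
    "In play, no out",
    "In play, run(s)",
}
# Foul tips count as swings but not as fouls here, since a caught foul tip
# is a strike; that choice is an assumption.
FOUL_DESCRIPTIONS = {"Foul"}


def foul_rate_on_swings(descriptions: Iterable[str]) -> Optional[float]:
    """Fouls divided by swings, so balls and called strikes drop out of the denominator."""
    swings = [d for d in descriptions if d in SWING_DESCRIPTIONS]
    if not swings:
        return None  # no swings, so no rate
    fouls = sum(1 for d in swings if d in FOUL_DESCRIPTIONS)
    return fouls / len(swings)


pa = ["Ball", "Foul", "Swinging Strike", "Called Strike", "Foul"]
print(foul_rate_on_swings(pa))
```

With swings as the denominator, a selective hitter who takes a lot of pitches is no longer penalized for those takes.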

Finally, one other thing to look at is whether pitchers really do get frustrated after a long string of foul balls and get burned throwing a pitch down the middle. There is probably some skill involved in fouling pitches off rather than missing them outright, simply because better batters likely make contact more often. But in terms of purposefully trying to foul a pitch off, at least from my own experience playing baseball, I doubt that guys go up there looking to 'spoil' pitches. To foul a pitch off, you have to make sure the ball doesn't meet the bat squarely; otherwise it would go into play. It's hard to believe that this, in and of itself, would be a repeatable skill. And if you try to just nick the ball with the edge of the bat, you've got a good chance of missing it entirely.
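One way to start on the frustration question is to pull out every pitch thrown right after a run of consecutive fouls and ask how often it lands over the middle. Here is a minimal Python sketch. The "foul" label, the streak length `k`, and the "middle" box (within 4 inches of center, 2.2 to 2.9 feet high) are all hypothetical choices, and the plate appearances are toy data.

```python
from typing import List, Optional, Tuple

# (px_ft, pz_ft, result) for one plate appearance, in pitch order.
# The "foul" label and the "middle" box below are hypothetical choices.
Pitch = Tuple[float, float, str]


def is_middle(px: float, pz: float) -> bool:
    """Hypothetical 'down the middle' box: within 4 inches of center, 2.2-2.9 ft high."""
    return abs(px) <= 4 / 12 and 2.2 <= pz <= 2.9


def pitches_after_foul_streak(pa: List[Pitch], k: int) -> List[Pitch]:
    """Pitches thrown immediately after at least k consecutive fouls in one plate appearance."""
    selected = []
    streak = 0  # consecutive fouls immediately before the current pitch
    for pitch in pa:
        if streak >= k:
            selected.append(pitch)
        streak = streak + 1 if pitch[2] == "foul" else 0
    return selected


def middle_share(pitches: List[Pitch]) -> Optional[float]:
    """Fraction of pitches inside the hypothetical middle box (None if no pitches)."""
    if not pitches:
        return None
    return sum(is_middle(px, pz) for px, pz, _ in pitches) / len(pitches)


# Toy plate appearances, invented for illustration only.
plate_appearances = [
    [(0.9, 1.5, "foul"), (0.8, 3.4, "foul"), (0.1, 2.5, "in_play")],
    [(0.0, 2.5, "ball"), (0.5, 2.0, "foul"), (0.7, 1.7, "swinging_strike")],
]
after_streak = [p for pa in plate_appearances for p in pitches_after_foul_streak(pa, k=2)]
all_pitches = [p for pa in plate_appearances for p in pa]
print(middle_share(after_streak), middle_share(all_pitches))
```

Since a (non-bunt) foul with two strikes doesn't add a strike, long foul strings pile up in two-strike counts. A fairer baseline would therefore be pitches in the same counts rather than all pitches, and a sample this small says nothing either way.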

This is by no means a deep analysis, and I didn't do a particularly thorough job of cleaning the data beforehand. Just some fun crosstabs and scatter plots.

Any thoughts from those of you reading this????

Tuesday, July 5, 2011

Forgot to Announce This

Though I'm late on this, I've been in the habit of announcing presentations of things I have been working on recently. At the WEAI conference, I am a co-author on two presentations (for one of which I put together the majority of the analysis). Unfortunately, I was unable to get funding for WEAI because I am attending a number of other conferences this summer, including the Joint Statistical Meetings in Miami at the beginning of August. Anyway, here are the recent presentations (given by Dr. Rodney Fort and Dr. Jason Winfree, respectively). You can get the full Western Economic Association International conference program right here.

Attendance Time Series and Outcome Uncertainty in the NBA, NFL, and NHL
Brian Mills and Rodney Fort

Discrimination Among MLB Umpires
Scott Tainsky, Brian Mills and Jason Winfree

The first paper simply looks at the long-run stationarity of attendance in the three leagues and assesses, at a very simple level, the influence of competitive balance (playoff, game, and consecutive-season uncertainty) on attendance levels. This is part of my dissertation, and there are a number of issues to be dealt with (not least the censoring issue for NFL sellouts). I think this paper might bore most of the readers here, unless you're really into Lagrange Multiplier unit root tests with breakpoints.

I imagine the latter paper would be of more interest to readers here. I can't divulge the entire paper (or much of it, really), but we generally find very little going on in the strike-calling data with respect to umpire race. The data go back to 1996 (I think), and I update the study with some Pitch F/X analysis. There's much to do, though.

In addition to these recent presentations, my fellow graduate student Steve Salaga and I will be presenting on Language-Based Discrimination in NHL Hall of Fame Voting at the Joint Statistical Meetings. There is a whole section on sports statistics there, including a presentation by Shane Jensen on fielding metrics. It sounds like a lot of nerdy fun. For this paper, we implement a technique called Random Forests (spoiler alert: we don't find any evidence of discriminatory behavior in the analysis). This is a parallel analysis to our forthcoming paper on discrimination in MLB Hall of Fame voting in the Journal of Quantitative Analysis in Sports. When I know the issue, I will link it here. If anyone is dying to read it, let me know.

Lastly, I would encourage anyone interested in sports statistics to attend the New England Symposium on Statistics in Sports. For those interested in soccer (fútbol, football), there is a soccer analytics competition being run by StatDNA. The winner gets a trip to the conference to present their paper and a $500 prize. I am currently working on some things with some people you may know, but I won't be mentioning anything until later on. It's been fun.

Okay, off to get some work done. Sorry that I have been somewhat MIA of late; I've been really bogged down with a lot of different projects. Hope to get back to sab-R-metrics soon.