Article at JQAS: Baseball Hall of Fame Voting

The newest issue of the Journal of Quantitative Analysis in Sports was published today, and it features a number of interesting articles. In the spirit of shameless self-promotion, I would like to highlight the following article:

Using Tree Ensembles to Analyze National Baseball Hall of Fame Voting Patterns: An Application to Discrimination in BBWAA Voting
Brian M. Mills and Steven Salaga

The link above should be un-gated. If it is not, please let me know and I can share the article. This is my first first-author academic publication, so go easy on me (and Steve). If you read my recent post about our joint poster at the 2011 Joint Statistical Meetings in Miami, this analysis should sound rather familiar. Please leave any questions or feedback in the comments, or feel free to shoot me an email.

We actually began this work a while back as a class project, and decided to turn it into an academic paper with some guidance and encouragement from our adviser. A paper using the same technique came out last year (Frieman, 2010), which gave us a chance to build on that work by including pitcher predictions and connecting it to the economic literature on discrimination in Hall of Fame voting. Our work differs somewhat from Frieman's, and the differences are explained in the paper. In fact, a (very) preliminary version of the work was on this website a while back; however, after the Frieman paper was published, I got a bit more worried about getting scooped (no foul play there; we just happened to be doing a very similar analysis at the same time).

Of course, R was used exclusively for the analysis. Also, you may note that some familiar names are cited. These include Cy Morong, Bill James, Jayson Stark, Peter Gammons, Tom Verducci, Chris Jaffe and, yes, Tom Tango (related to the Tim Raines site, of course).

If you have criticisms, please present them respectfully and keep in mind that we don't think this analysis (or ANY analysis) is the last word on any issue. Also keep in mind that the future predictions are based only on statistics as of 2009 (without career projections), so they predict future induction under the assumption of retirement after the 2009 season. But it was a lot of fun, and it shows some promising results for using the technique in sports prediction. There is a lot of Hall of Fame voting literature out there, and this is another addition to it. Hopefully we can have a comprehensive model for hockey players soon, too.

