Using Tree Ensembles to Analyze National Baseball Hall of Fame Voting Patterns: An Application to Discrimination in BBWAA Voting
Brian M. Mills and Steven Salaga
We actually began this work a while back as a class project, and decided to turn it into an academic paper with some guidance and encouragement from our adviser. A paper using the same technique came out last year (Frieman, 2010), which gave us a chance to add to it by including pitcher predictions and by extending the work to the economic literature on discrimination in Hall of Fame voting. Our work differs somewhat from Frieman's, as explained in the paper. In fact, a (very) preliminary version of this work was posted on this website a while back; however, after the Frieman paper was published, I was a bit worried about getting scooped even further (no foul play there; we just happened to be doing a very similar analysis at the same time).
Of course, R was used exclusively for the analysis. You may also notice some familiar names among the citations: Cy Morong, Bill James, Jayson Stark, Peter Gammons, Tom Verducci, Chris Jaffe and, yes, Tom Tango (related to the Tim Raines site, of course).
If you have criticisms, please present them respectfully, and keep in mind that we don't think this analysis (or ANY analysis) is the last word on any issue. Also keep in mind that predictions of future induction are based only on statistics through 2009 (without career projections), so they assume each active player retired after the 2009 season. Still, it was a lot of fun, and it shows some promising results for using the technique in sports prediction. There is a lot of Hall of Fame voting literature out there, and this is another addition to it. Hopefully we will soon have a comprehensive model for hockey players, too.