Thursday, September 29, 2011

Crediting the Rise of "Data Science" to Sabermetrics

As a graduate student in Sport Management, Statistics and Economics, I am quite interested in the emerging "Data Scientist" profession. My current programming skills are mostly limited to statistical programming in R, Stata and SPSS (I am trying to dabble more in SAS and Matlab). I wish I had more skills with Python, C, SQL, Perl, Access and the like in order to scrape data myself more efficiently. I can do some basic SQL queries and read a Perl script to understand *what* it's doing, but starting from scratch with these things would require a bit more free time than I have at this point.

I could really become more efficient in my R programming (something I continue to work on), and given the popularity of SAS outside of academia, it would be good to get familiar with advanced programming in it. Unfortunately, I have never had a formal computer programming class. Most of my statistical programming has come from my own fiddling and from learning statistics in classes here at Michigan. Don't get me wrong: I think I have a relatively unique and useful skill set, but there's always lots to learn, and there are many others out there exhibiting skills that I just don't have. Definitions of "data scientist" also often include significant database management ability. I have some skills here, but they are nowhere near those of a formally trained computer scientist or IT/data architect.

Anyway, the point of this post is to direct readers to this presentation by Harlan Harris, who talks about what "data science" really is. Why link it here? Well, on the final page, Harris says the following:

"Sabermetrics was a trigger for widespread growth. Demonstrated wider applicability of stats methods, and drew attention from business."

A pretty strong quote, and one that I do agree with in some sense. Interestingly, sports have been one of the slowest industries to adapt to these changes in technology and in the ability to dig into data. Harris suggests here, I think, that other businesses caught onto sabermetrics before the sports organizations that the analysis was directed toward did. Pretty interesting stuff! I think the combination of open source programming and the rise of blogging was the real culprit here. However, sabermetrics provided talented people with a way to apply data science to something fun and interesting. In this sense, it made it easy to communicate stories about the usefulness of data analysis in everyday business decisions.

So here's my question to those doing analysis with sports data: Would you consider yourself a "data scientist"? And if so, do you feel that full-on "hacking" skills are required to consider oneself as such? Certainly they're a plus, but can two heads (a stat-based person and a Perl-to-SQL scraper) come together and both be data scientists? Leave me something in the comments if you'd like!


  1. Thanks Millsy. You are very right. I also think that other businesses caught onto sabermetrics before those that the analysis was directed toward did. It makes everything very easy to analyze and understand.