Thursday, May 9, 2013

Revisiting Umpire Discrimination: New Paper at JSE

Two colleagues (Scott Tainsky and Jason Winfree) and I have a new paper just posted online at the Journal of Sports Economics.  We revisit the findings of Parsons et al. from 2011 (though, the working version of their paper caught press much earlier than this).  The paper was rather controversial and claimed important influences of umpires on game outcomes based on race.

Our paper uses a different data set and looks to replicate the findings from the original AER paper.  We were able to replicate the original findings from their provided data and code, but found that odd uses of fixed effects are at the root of some of them.  A large majority of the paper looks at the robustness of the results, and uses PITCHf/x data to empirically derive the edge of the strike zone.  At best, the results initially presented in AER are mixed based on our analysis and re-analysis.

One thing to note is that the main interest of the Parsons et al. paper was not baseball.  The point was that detecting discrimination could be influenced by others that impact the performance of those of a given race (i.e. umpires in this context).  This point is still well taken, and makes up the most important contribution.  In fact, this is why the paper was published in the prestigious journal American Economic Review.

The link directly to the paper and abstract is below.  Unfortunately it is gated.  However, I am going to double-check my rights for including a link on my personal page (usually OK, but journals can sometimes be a pain on this issue).  If you have access, feel free to send along questions or comments to my email address or leave them in the comments.  Please keep any comments and criticism constructive.

http://jse.sagepub.com/content/early/2013/05/02/1527002513487740.abstract

Saturday, May 4, 2013

Times Change, Or Why Steroids Don't Ruin Baseball for Me

Just a list of links without commentary other than this: I honestly don't care about the Steroid Debate beyond making clear how stupid it is.

Mantle Corked His Bat (insert asterisk here, right...right?)

Athletes Have Gotten Better, Mostly Without Steroids (imagine that!)

The Hall of Fame is Biased (well, I never!)

Friday, April 12, 2013

Brawling Costs Teams Money

I honestly don't know where to begin with the stupidity involved in this:

http://www.usatoday.com/story/sports/mlb/2013/04/12/quentin-charges-greinke-after-being-hit-by-pitch/2076525/

This idiocy cost the Dodgers a whole lot of money.  If I were running the team, I would at the very least seek legal counsel in order to evaluate the chance of getting some of Greinke's contract dollars back.  Yes, MLB contracts are guaranteed.  But he was injured because he assaulted someone.  Unpaid suspensions happen for PED users, so there must be some way to reconcile this.  Is there any precedent for this sort of thing?  I don't see this behavior as assumed risk on the part of the Dodgers, though I guess one could argue that their supervisors (i.e. Mattingly) could have prevented it.

Greinke claims he did not mean to hit him.  Sure.  The catcher was set up outside and Greinke is a Cy Young contender.  Spots don't get missed like that.

Let's not forget the impact on the individual 2-1 game itself.  San Diego tied it up in that inning.  While it ended up in favor of the Dodgers, that's not something I want my players flirting with after spending over $200 million on them.

Tuesday, March 5, 2013

Refs Complicit in Fighting?

To begin, I'm not much of a hockey fan.  I just don't get it.  That doesn't mean it's not entertaining to you or to the entire population of cold-weather living people, that these guys aren't incredible athletes, or that it has no value.  It just means I don't enjoy watching it.  I watched it plenty as a kid, going to 5-10 Capitals games a year for a number of years.  I think it is an extremely interesting league from the standpoint of my academic interests.  But I never really enjoyed watching.  People say the same to me about baseball.  That's fine, it's not for everyone.  So you can take my following comments with a grain of salt if you like, or as blatant ignorance of what goes on in the sport on the ice.

Despite the idea that hockey has attempted to get rid of fighting, it is obvious to me that this is pure theater by Bettman and the owners.  In fact, given the video below, I suspect there is an explicit instruction to referees to not actually break up the fights until someone hits the ice.  (Hat Tip to Charlie Brown for the video)

http://www.youtube.com/watch?v=KJqN522mkFM&feature=player_embedded

These guys took up their boxing positions with everyone watching, including fellow players and referees, and not a single person bothered to try and separate them or stand between them.  In fact, the referee takes the initiative to pick up the debris (stick, etc.) and get it out of the way for when they throw down.  There is little question in my mind about the referees' complicity in these events on the ice, and I would not be surprised if they were given explicit instructions to let these things play out for the entertainment of the fans.  There is not a real safety concern for the refs or the players in breaking these two up as they are standing 10 feet from each other in their boxing poses.  This one looks almost to the point that it was staged.

McGinn had a broken orbital bone, likely having to do with his face-plant into the ice.  That is not a minor injury.   Not even close.  I know it has been said before, but if this happened in the stands someone would be on their way to prison.  This on-ice fight is no more acceptable to me than the video below, though I imagine there is more outrage there than the hockey fight.  At least in the baseball game, everyone didn't stand around and look the other way for a full 30 seconds while the batter/runner punched the pitcher in the face (Hat Tip to Tangotiger for the video below).

http://www.youtube.com/watch?v=DeKp8e88ZyI&feature=player_embedded

Note that I have the same issue with throwing at batters.  For a long time, I loved Pedro Martinez as a player, but after his many escapades with throwing at batters (not just throwing inside, but his throwing AT them and then talking about it) I no longer had any interest.  I feel the same way about Cole Hamels after the Bryce Harper beaning.  I don't think Selig did enough.  Hamels should have been suspended for the season.

Congress chastises leagues for PED use (particularly baseball for whatever stupid reasons they may have).  But why don't authorities bother with these sorts of incidents, where the league (with questionable antitrust status) is complicit in injuring its employees?  Assumed risk does not include violent assaults in any profession (and I would argue that this even includes boxing and MMA).

Let's make a comparison.  Wikipedia reports that the rate of reported aggravated assaults yearly in Detroit, Michigan is about 0.19% (1,334 assaults per 713,000 or so people).  Detroit isn't exactly a peachy place to live, in terms of crime.  In fact, the shortage of police there is becoming a huge problem.  Some calls take hours before an officer arrives at the scene.  In Los Angeles, a safer city but also a place where violent gang crime has been a serious issue in the past, there are 230 aggravated assaults per 3.84 million people.  That's a rate of .006%.

From Hockey Fight Statistics, in the 2011-2012 season (the lowest fight penalty rate since 06-07), there were 546 fights.  Give or take 700 total NHL players in a given season, we have a rate of about 78%.  That is an aggravated assault rate of roughly 417 times the rate in the city of Detroit.  It is about 13,000 times the rate in Los Angeles.
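
For anyone who wants to check the arithmetic, here is a quick sketch in R.  It only uses the counts quoted above (the 700-player figure is my rough guess, and the "rate" is really fights per player), and it works from unrounded rates, so the multiples can differ slightly from ones worked out by hand with rounded percentages.

```r
# Reported aggravated assaults and populations, as quoted in the post
detroit_rate <- 1334 / 713000      # roughly 0.19% per year
la_rate      <- 230 / 3840000      # roughly 0.006% per year

# 546 fights over an assumed 700 NHL players (a rough figure, not a roster count)
nhl_rate <- 546 / 700              # 0.78 fights per player

# Rates as percentages
round(100 * c(Detroit = detroit_rate, LA = la_rate, NHL = nhl_rate), 4)

# How many times higher is the NHL rate than each city's rate?
vs_detroit <- nhl_rate / detroit_rate
vs_la      <- nhl_rate / la_rate
round(c(vs_Detroit = vs_detroit, vs_LA = vs_la))
```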

Incentives tend to work.  If you are caught breaking people's skulls in Detroit--even given the lack of police force there--you go to jail.  Same goes for LA.  The incentives against fighting in the NHL (and hitting batters in MLB) are laughable, at best.


**Note: Yes, there are probably differences in the severity of crimes that are reported in Detroit and LA, versus all "fight penalties" in hockey.  But even assuming that unreported aggravated assault in these cities is ten times what is reported, and assuming that only a quarter of NHL fights would be up to the standards of aggravated assault, the differences are still astonishing to me.
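
The adjustment in this note is easy to run as well.  This is just a sketch of the two assumptions stated above (ten times as many unreported assaults in the cities, and only a quarter of NHL fights counting as aggravated assault), not an estimate of anything real.

```r
# Same counts as quoted in the post
detroit_reported <- 1334 / 713000
la_reported      <- 230 / 3840000
nhl_fights       <- 546 / 700       # assumes roughly 700 NHL players

# Assumptions from the note, not measured values
detroit_adj <- 10 * detroit_reported   # true city rate is 10x the reported rate
la_adj      <- 10 * la_reported
nhl_adj     <- 0.25 * nhl_fights       # only a quarter of fights count

adjusted <- c(vs_Detroit = nhl_adj / detroit_adj, vs_LA = nhl_adj / la_adj)
round(adjusted, 1)
```

Even under those generous assumptions the NHL figure stays at roughly ten times Detroit's and a few hundred times LA's.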

Tuesday, February 12, 2013

Employment Bias Toward Athleticism

Something I have always suspected happening in labor markets does, in fact, seem to be happening: hiring managers tend to give a premium to those signalling athletic ability or sport participation.  The paper, by Dan-Olof Rooth, looking at these results is linked below:

http://www.sciencedirect.com/science/article/pii/S0927537110001272

I think this is extremely interesting (and mirrors the studies that randomized "African American sounding names" on resumes, finding bias there).  Of course, there are different signals sent with athleticism vs. the sound of names.  A name isn't likely to signal much, maybe skin color, which we all know is not a valid way to exclude someone from a job.

However, athleticism could signal something else: motivation and time management.  My undergraduate thesis (unpublished) found that student athletes felt (self-reported) much better about their time management skills than non-athletes.  This could be a useful signal for someone hiring a prospective employee. 

Secondly, those who participate in sports tend to be more active and have more energy than those who do not.  These would also seem to be desirable skills for an employer. 

Lastly, being athletic could signal motivation or initiative from the person applying.  This is similar to participating in a club or being president of the young business leaders organization at your university.  I don't know that athletics would give an advantage above and beyond something like this, but it would seem to be at least a useful signal about involvement and social skills.  Team sports are social, and can provide opportunity to grow just as other clubs do.

All of these things are difficult to observe in an interview, so using sport participation as an implicit signal can be useful both for the employee to relay this information, and for the employer to get a bit more information about the prospective hire.  Of course, there is also the possibility of overt bias toward playing football or some other sport at a large university that the employer is a fan of.  This would not be a valid way to make a hire, but I suspect it does happen.  There is always a "buddy network" influencing many areas.

This is why I tend to always put on my CV or Resume that I participated in college athletics and currently continue to play softball and golf.  While it says nothing about my skills as an academic researcher (leaving aside the fact that I research sport), I suspect that at worst it will do nothing for me and at best make the employer slightly more interested.

What say you?

Wednesday, January 23, 2013

Data Science as a Fad?

In many ways, I didn't want to give this Forbes article a link, since it derides the idea of using data in the same ways that seemed to create the (admittedly somewhat imaginary) "scouts-vs.-stats" divide.  There is, of course, vast relevance of data science to management.  I think the article is a bit unfair to the self-created discipline, so please keep that in mind.

However, I also think there are some important points to remember.  Data scientists already engulfed in the management and operations of a given industry are invaluable.  However, data scientists with little understanding of the problems and the practical solutions specific to that industry can be dangerous.  I think this is a nice passage:

"Davenport and Patil declare that “Data scientists’ most basic, universal skill is the ability to write code.” With this pronouncement, data science fails the smell test at the very outset. For how many legitimate scientific fields is coding the most fundamental skill? The most fundamental skill for any scientist is of course mastery of a canonical body of knowledge that includes laws, definitions, postulates, theorems, proofs, and descriptions of unsolved problems. Scientists are therefore characterized by mastery of a body of knowledge, not a collection of methods. What is this body of knowledge for data science? Davenport and Patil admit there is none.

The job of scientists is to conduct independent research, contribute to a body of knowledge, and improve professional practice, while adhering to a recognized standard of conduct. Coding is a tool that facilitates some of these objectives, but is a substitute for none of them."

This point rings true in many cases.  I find myself falling into a "methods trap" in my academic work sometimes (though I try to get out of it as quickly as possible).  I know how to use R, though I am not a programmer or database manager.  I know a number of methods from statistics and econometrics.  I can turn these tools into something pretty neat.  But, I sometimes make the mistake of thinking that this is enough for researching some phenomenon.

...Then I try and write the Intro and Discussion for my paper.  Ouch.  This is amazingly difficult without reaffirming that body of knowledge about the problem at hand in the first place.  Methods and very cool visuals communicate answers.  But they are the tool to do so, not the answers themselves.  A point well-taken from the article.


Thursday, January 10, 2013

Factor Analysis with the HOF Voting

A really fun post here, which I ran across at R-Bloggers.  This is a different take from a lot of the stuff I have seen on voting.  Enjoy!

Also, Max Marchi seems to be contributing (non-sports stuff) to R-Bloggers as well. I did not know this until today.

Tuesday, January 8, 2013

Ordering of Series

A new OnlineFirst paper has come out in the Journal of Sports Economics on the ordering of 3-game series by Alex Krumer.  Have not read fully through the paper, but I am interested in seeing what is found.  Figured it would be of some interest to those visiting this site that have access to JSE.

More Power Laws

Looks like someone took DeVany (2007) and extended it to management literature and things outside of baseball (although, they don't cite him).  For a refutation of that initial paper on these power laws in sport, I'll have to link to my co-author, Jason Winfree, and his co-author, John DiNardo (http://www-personal.umich.edu/~jdinardo/lawsofgenius.pdf).

Now, before I go on, I want to be fair to the authors.  They plainly state that they are looking at performance, not innate ability.  It's observational.  I think the question is whether performance is actually what is of interest to researchers when doing management research, or if it is ability (moderated by effort and peers) that management is ultimately interested in.  I tend to think it is the latter, though there are reasons for understanding the former (namely, the link between the two).  So some of my criticisms come from interest in measurement of ability, rather than observed performance data.

I do acknowledge that this likely took plenty of time and effort to go through.  And they do seem to have consulted Wayne Winston on some of the work (noted in the acknowledgements).  Therefore, this post is not saying the authors are lazy, stupid, ignorant, or anything in between.

Let's begin (and note that I'm feeling all Birnbaum-y here).

First, NPR states that this is new research.  It really is not, despite the fact that most of their background citations are from before 1980.  This is an issue that has been discussed at length, but I'll let DiNardo and Winfree do the literature review. 


THE MAIN ISSUE

Issue #1 is that they use claims that everything is normal as the justification for their paper.  But this would seem to be a straw man.  Why would they expect count data (specifically, low counts) bounded at 0 to be a normal distribution?  I'm not sure anyone would try to assume that individual academic publications (with essentially Poisson and lambda = 2 as shown by their tables, perhaps somewhat overdispersed) would be normally distributed, would they?  But they use this to test for normality of performance (actually, this is the case for most of their measurements).

I think a lot of work discussing normal distributions that they seem to be interested in--and the strawman-ish rationale for this paper--probably conflates the Central Limit Theorem with normally distributed populations.  The CLT does not posit that everything is Gaussian, though some have probably said this in their past academic work, and this is often taught incorrectly in introductory statistics courses.  If the authors are using this as the basis for their article, then they seem to be wasting space in what looks to be a good journal (based on impact factor).

So what is the CLT?  Using the mean (average), for example, the CLT posits that the distribution of sample means from random samples of a population will be approximately normal once the samples are reasonably large (assuming the population is not some weird distribution with infinite variance, etc.).  So I'm not sure why they chose to compare individual scores to a normal distribution, rather than the means of a bunch of samples of those individuals.

They should have taken their (admittedly, very awesome) data and done a quick random sampling using R or something.  Take the mean of each sample they take, and then build a distribution of those sample means.  THEN, they should test for normality.  That way, we test the applicability of the CLT to the given data, rather than testing the data to be from a distribution where the CLT won't apply.  I think this gives them a much stronger hypothesis to base their tests on.
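
Here is a minimal sketch of that procedure in R.  I don't have their data, so `raw_counts` is a made-up, overdispersed stand-in (a negative binomial with mean 2), and the sample size and number of samples are arbitrary choices of mine.

```r
set.seed(1)

# Placeholder for the raw individual-level data (hypothetical, NOT the authors' data)
raw_counts <- rnbinom(20000, size = 1, mu = 2)

n_samples   <- 2000   # number of resamples (arbitrary)
sample_size <- 200    # observations per resample (arbitrary)

# Draw random samples from the raw data and keep the mean of each one
sample_means <- replicate(
  n_samples,
  mean(sample(raw_counts, sample_size, replace = TRUE))
)

# Look at the distribution of the sample means, not of the raw scores
par(mfrow = c(1, 2))
hist(sample_means)
qqnorm(sample_means)
qqline(sample_means)

# Formal test; with this many means it is sensitive to small departures
shapiro.test(sample_means)
```

Swap the real data in for `raw_counts` and the rest stays the same.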

But here is the most disappointing part: They don't even test any distributions besides Gaussian and Paretian on the raw individual data.  They should also be testing the Poisson and Negative Binomial (or any number of other distributions), not just Gaussian and Paretian, if the raw data is really what they're interested in.  I imagine that there is some other distribution that fits this data just as well as, or better than, the power law.  Or maybe not, but at least use a reasonable test.  A test only for normality on this type of data, in my opinion, is not a reasonable comparison.  Their test is the equivalent to saying, "Well, Barry Bonds's batting average is closer to .500 than .000, so we can conclude that he is a .500 career hitter."  That kind of logic doesn't fly in my book.
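
To show what I mean by testing other candidates, here is a rough sketch using `fitdistr` from the MASS package and comparing fits by AIC.  The `counts` vector is simulated placeholder data (my assumption, not theirs), and the power law is left out because fitting it properly needs separate tools.

```r
library(MASS)   # for fitdistr()

set.seed(2)

# Placeholder count data, overdispersed on purpose (hypothetical, not the authors' data)
counts <- rnbinom(5000, size = 1.5, mu = 2)

# Fit several candidate distributions by maximum likelihood
fit_pois <- fitdistr(counts, "Poisson")
fit_nb   <- fitdistr(counts, "negative binomial")
fit_norm <- fitdistr(counts, "normal")

# Compare the fits: lower AIC means a better trade-off of fit and complexity
aic <- sapply(list(poisson = fit_pois, negbin = fit_nb, normal = fit_norm), AIC)
sort(aic)
```

With real data the ranking could go any number of ways, which is the point: "not normal" leaves a lot of candidates on the table.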

I truly hope these authors don't think they are refuting the application of the CLT (I don't think they do, but the importance of infinite variance is that it won't apply).  If their implication is that "everything has infinite variance", then I guess the implication is that we can't run any statistical tests.  But they have not provided sufficient evidence for that here.  They did show that the raw data probably aren't normal, but any relatively informed person with an intro statistics course could have told you that, and this seems to be inappropriate for a good journal unless it is full of uninformed papers.

We can use R to show the CLT to be the case for the Poisson with the following (extremely simple) script.  All this does is take 10,000 random Poisson (lambda = 2) draws and calculate their mean, repeated 5,000 times (50 million draws in total).  Note that we don't need 10,000 draws per sample, nor do we need 5,000 samples to show this.  But we have the computing power so why not.  Then we plot it with a histogram and qqplot to see if it looks normal.  The Shapiro-Wilk test is simply a formal way to test the normality (not a test I like to use much, but it exists so why not).

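# Preallocate a vector to hold the 5,000 sample means
distPOIS <- numeric(5000)
for(i in 1:5000) {
    # One sample: 10,000 Poisson (lambda = 2) draws
    sampy <- rpois(10000, 2)
    distPOIS[i] <- mean(sampy)
    }

# Histogram and normal Q-Q plot of the sample means, side by side
par(mfrow=c(1,2))
hist(distPOIS)
qqnorm(distPOIS)

# Formal normality test (shapiro.test accepts at most 5,000 values)
shapiro.test(distPOIS)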


Of course, this assumes the data are Poisson.  Given the variance parameters they have for academic publication (in the tables), there seems to be some overdispersion in some areas and underdispersion in others.  However, they don't present an overall mean and variance for all publications, which by eyeballing looks like it could be pretty close to Poisson (mean=variance).

In the overdispersion case, we could use the negative binomial (or perhaps geometric) and rework our variable.  Of course, it is difficult to operationalize the likelihood of getting into a journal (and this is not the same for each person), number of attempts, etc., so that's why I stuck to Poisson here.

BUT, since they have the raw data, they can just sample from that anyway so we have no reason to bother assuming a distribution.  We simply need to know if it conforms to our statistical tests that are based on the CLT.

Issue #2: One thing that seems to be conflated here is the actual distribution of performance if all people were participating in a given profession with the observed performance of those actually in the profession.  This is a contention with many sabermetricians and the work of DeVany, if I remember correctly.

Anyway, it seems to me that this paper chose some additional biased samples to evaluate.  The distribution of talent itself in any given profession is not likely to be normally distributed, let alone the performance relative to those who selected in.  There is selection into that occupation based on ability, especially so in those highly compensated based on observable performance.  There is also a minimum wage, which keeps us from seeing the far, far left of the distribution in the U.S. even in the lowest skilled jobs.  Nonetheless, even if we could see this, experience has a way of morphing the distribution and job title tends to mean some jump to the next occupation.
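
A toy simulation makes the selection point concrete.  Here I assume (purely for illustration) that ability in the full population is standard normal and that only the top 5% end up in the profession; both choices are mine, not anything from the paper.

```r
set.seed(3)

# Hypothetical population ability, assumed normal for the sake of argument
ability <- rnorm(100000)

# Only those above the 95th percentile select into the profession (arbitrary cutoff)
cutoff   <- quantile(ability, 0.95)
selected <- ability[ability > cutoff]

# Simple moment-based skewness
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# The population is symmetric; the selected group is clearly right-skewed
c(population = skewness(ability), selected = skewness(selected))

par(mfrow = c(1, 2))
hist(ability)
hist(selected)
```

So even a perfectly normal population hands you a lopsided distribution once you only look at those who made it in.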

We also have to remember there is a bare minimum in performance allowed before getting fired (related to the minimum wage).  If we have shirkers, or if there is little chance of promotion, economic theory would predict more employees to hang around doing just enough to get paid.  But they don't really choose these sorts of jobs (and explicitly state that they choose heavily performance-based pay jobs for this reason), so that's a minor quibble.


Issue #3 comes from the operationalization of their variables.  For example, using Academy Award Nominations has a number of problems.  This is similar to using the MVP to measure the distribution in talent in baseball (and these relate to Issue #1 directly).  These are rank-based.  Ranks are messy in this way.  We would have to expect some high random variation across acting performances for a "good" actor and "bad" actor to expect the former to be considered the "best" actor at any given point.  In other words, you could have a perfectly normal distribution of acting performances, and no error in individual performance (completely deterministic), and the same exact person will get every single Academy Award every year.  That seems like a strange way to test for normality.  The distribution of these awards is almost certainly not normal, and we don't need to resort to a power law test to know that.
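
To see how a rank-based award behaves, here is a small sketch.  Skill is assumed normal across 200 hypothetical actors, one award goes to the best performance each year for 50 years, and the only thing I vary is the amount of year-to-year noise (all of these numbers are made up).

```r
set.seed(4)

n_actors <- 200   # hypothetical pool of actors
n_years  <- 50    # one award per year
skill    <- rnorm(n_actors)   # normally distributed skill (assumption)

# Each year the award goes to the highest performance,
# where performance = skill + random noise with the given sd
award_counts <- function(noise_sd) {
  winners <- replicate(
    n_years,
    which.max(skill + rnorm(n_actors, sd = noise_sd))
  )
  tabulate(winners, nbins = n_actors)
}

deterministic <- award_counts(0)   # no noise at all
noisy         <- award_counts(1)   # noise as large as the spread in skill

max(deterministic)        # 50: one actor takes every award
sum(deterministic > 0)    # 1 distinct winner
sum(noisy > 0)            # several distinct winners once noise is added
```

The underlying skill is normal in both runs; the award counts are nowhere near normal in either.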

Also, I'm willing to bet there is a momentum factor with Academy Award nominations, and winning an award puts that person in the eyes of the voters more often.  Therefore, all else equal, they are more likely to win the award again (my guess, though that's an empirical question).  In other words, each successive award is not independent of the other.  So this isn't a variable I would use to gauge performance in the first place.


Issue #4 is that they're using relative performance as a measure (touched on in #3).  This is an abstraction that, admittedly, could be off due to my limited expertise in the subject.  But it's not something we think about much, so I am open to comments on this.

In something like baseball, performance outcomes are invariably based on relative skill.  They are not piecemeal (but the Schmidt & Hunter (1983) paper they cite as part of their rationale actually does test piecemeal work!).  In this way, we can think of two variables.  The first is batter skill.  The second is pitcher skill.  These two skills are independent of one another.  The performance, however, is not independent of either of these skills.  We may be able to say that performance outcomes of batters are independent of other batters, so let's do that to simplify.

Even if we do, we cannot ignore the structure of the variable of measured baseball performance in MLB.  If we have two random variables of Batter Skill (X) and Pitcher Skill (Y) that we assume, innately, are normally distributed (and independent), then the observed outcome is not X, it is Z.

The problem with Z, if calculated as a ratio of two normal random variables for example (PLEASE SEE***), is that its distribution is no longer normal.  The ratio of two independent zero-mean normals is Cauchy distributed, which is heavy-tailed and prone to outliers just based on how we operationalized it, and with non-zero means it is something messier still.  But this is in measured outcomes--based on sample selection bias to boot--not in ability.  Perhaps some strange structure of Z is driving some of the result, but I'm not sure this is all that useful.

***Keep in mind this is an over-simplification of the performance measure.  It is likely something more complicated than Z = X/Y, which means it might be some other distribution.  But beyond what I have stated here, I don't have the expertise to comment.  And my interpretation here could also be incorrect.  The point is simply that, depending on how you define your performance variable, you could be creating something unwieldy.  Perhaps that is an important lesson, but not the one they try to get at in the paper.
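
For the curious, the over-simplified Z = X/Y case is easy to simulate.  This sketch assumes both skills are independent standard normals (zero mean, which is what makes the ratio Cauchy); that is an assumption for illustration, not a claim about real batters and pitchers.

```r
set.seed(5)

n <- 100000
x <- rnorm(n)   # batter skill, assumed standard normal
y <- rnorm(n)   # pitcher skill, assumed standard normal and independent of x
z <- x / y      # the simplified performance measure Z = X / Y

# The middle of the two distributions looks similar; the tails do not
quantile(x, c(0.001, 0.5, 0.999))
quantile(z, c(0.001, 0.5, 0.999))

# Share of values beyond +/- 10: zero for the normal input,
# but several percent for the ratio
share_x <- mean(abs(x) > 10)
share_z <- mean(abs(z) > 10)
c(normal = share_x, ratio = share_z, cauchy_theory = 2 * pcauchy(-10))
```

Neither input has an outlier problem, but the constructed variable does.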


Issue #5 is that with actors and actresses, the independent skill level is, again, not measured.  In fact, performance itself is not independent here.  Better actors/actresses are more likely to be paired with better writers and better directors.  When they are judged on their performance, there is an additive or multiplicative effect.  A great actor in a poorly written movie with a terrible director is much less likely to receive acclaim than a great actor in a masterfully written and well-directed movie.  So, you get this power distribution stemming from measuring this outcome, not from measuring ability.  These high skill people tend to cluster together to make outcomes different from the skill distribution.  Let's not forget that there are lots of starving actor wannabes that are probably terrible, when most of us decided long ago we wouldn't bother being an actor because we suck at it (again, selection bias here).  That is not to say that someone out there who is not an actor couldn't act better than the starving actor moonlighting as a bartender.  We just don't observe their acting performance, and they don't team up with other talented people in the biz.


Issue #6: They use EPL Yellow Cards as a measure for negative performance.  Those who are fans of soccer know that yellow cards can occur from strategic behavior.

They also use MLB career errors (by individual player) without accounting for play time as far as I can tell.  This is a big time "huh?" moment in my mind.  Even if the outcomes of this strange variable follow this distribution, it doesn't mean they're unexpectedly worse than everyone.  It means that they're awfully good at something else to keep them around to make those errors (i.e. an excellent hitter).  It is likely that many players could fill the "error void" in the distribution had they only been better hitters.  I haven't read in too much detail about all of the measures here, but this stuck out to me.
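
A quick sketch of why playing time matters: give every player the exact same error rate per fielding chance and let only career length vary.  The 3% rate and the lognormal spread of career chances are invented numbers for illustration.

```r
set.seed(6)

n_players  <- 2000
error_rate <- 0.03   # identical for every player (assumption)

# Hypothetical career fielding chances, heavily right-skewed:
# a few long careers and many short ones
chances <- round(rlnorm(n_players, meanlog = 6, sdlog = 1.2))

# Career errors given identical fielding ability
errors <- rbinom(n_players, size = chances, prob = error_rate)

summary(errors)                      # long right tail in raw career errors
summary(errors / pmax(chances, 1))   # errors per chance cluster near 3%
cor(errors, chances)                 # career errors mostly track playing time
```

The "worst" fielders by career errors here are simply the ones who played the longest.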

Issue #7: The authors explicitly note their "ambitious goal" to refute the idea that performance is normal (assuming that claim is still up in the air to begin with).  But they then argue that the ambitious goal is reachable just because they have so much data.  This is a fallacy many people fall into.  More data is generally very good to have.  But if you're not running the most useful tests on that data, then it may as well be small data.


I am sure there is more here, but I've used up enough time.  Seems to me that this is another attempt at a "sexy" paper, rather than one that actually tests the distribution of the data.  If they had done all this and at least tested against other possible distributions of the data, then I would probably say "interesting".  But the leap from "not normal" to "power law" is a tough one to swallow when there is nothing about the in-between. Certainly, z-scores (apparently their use in performance data) can be useful for non-normal distributions without infinite variance.  So why not make this clear?


Friday, January 4, 2013

Pitch Recognition and Neuroscience

My wife is a behavioral neuroscientist/biopsychologist (yes she is way smarter than me) and she ran across this neat paper that she forwarded my way.  I thought it would be of interest to those who still visit this website.  I will try to give some thoughts later, though I don't know much about neuroscience having only an undergraduate degree in psychology.