Monday, November 30, 2009

Arbitrage Failure: When the Sleeper Bubble Bursts

I have a new article up over at Fantasy Ball Junkie taking a look at risk, sleepers, and how we see similar re-evaluation of risk in arbitrage markets (for example, in merger arbitrage). The conclusion is the same basic conclusion most people come to when discussing sleepers: BE CAREFUL WHEN YOU REACH!

In general, it's a very basic extrapolation of a really neat presentation I attended by Columbia Business School Professor Daniel Beunza. Beunza investigates topics in the quantitative side of economic sociology and its interaction with technology. The basis for my extension to fantasy can be found at his site here. Given Dr. Beunza is way smarter than I am, I'm sure the article somewhat minces much of the intellectual content he provides in his work. However, the main issue is intact when evaluating risk with fantasy baseball sleeper picks.

Below is a very simple picture of how risk assessment is changed using a collective "implied risk". The distributions don't have to be normal (and likely aren't when someone uses their biased judgement on a sleeper). The important thing to see is that much of the area under Curve 1 is ignored (to the left of Curve 3) when a sleeper is touted excessively.

All in all, I enjoy finding real-world corollaries to fantasy sports. Some of them can be a stretch, but oftentimes they can be informative and support ideas presented about the structure and processes involved in the game.

Wednesday, November 25, 2009

Trying to Keep Up

It's been one of those weeks and the Holidays sure aren't helping get things done. I have much of a new post written up here...hopefully a bit less brash than the one that seemed to get a bunch of attention. Also, we should have a new Fantasy Ball Junkie post up soon. Here are a couple of links:

Pizza Cutter describes how thinking about baseball and sabermetrics influenced how he thought about his academic work in Psychology. A fun story. My feeling is that using interesting topics, especially sports ones, would help the nitty-gritty of statistics become less daunting, even for an undergraduate-level class. In fact, I remember a recent NSF Grant awarded to some professors studying just that.

Phil Birnbaum has more on aging over at Sabermetric Research Blog. I'm curious to hear the next post, as the aging question is an interesting one to me. Each of the methods I've seen used has bias problems--which are unavoidable for the most part--and I think it's still a fairly fruitful line of sports research (especially when thinking about heterogeneous aging across "types" of players).

The Sports Economist has an interesting link about negative intangible externalities from sports. It's an interesting take on the issue, though Rod Fort feels the article is a bit one-sided (comments section). I guess we should all remember that people are "reading and quoting".

That's all for now. Apparently I'm supposed to hang out with my family.

Tuesday, November 17, 2009

"Fantasy" Announcement

I mentioned the other day that I had a forthcoming announcement to make here. Well here it goes.

I'm now writing a column at The Fantasy Ball Junkie that incorporates simple economic thinking into fantasy sports, league rules and outcomes, along with other things that can cause problems when leagues are designed and run. The first post of "Weird Science" is up here. I essentially argue that the common auction format used in fantasy sports rewards craftiness, rather than preparation and evaluation. While I think both are fun, the basis for fantasy was originally to reward the top evaluator and roster builder. My main interest lies in the fact that FAAB auctions are run differently. There are a lot of interesting corollaries in real-life auctions that I could go into, but I may save that for another time. The article lays out the basic argument, and proposes something new for people to think about when designing their league.

We hope to have the column over there be semi-regular (weekly commitments are tough for me given my other obligations--a.k.a. slave duties to Michigan faculty). I think it will be fun. I'll be sure to link there at this site whenever a new column is up. I'd also suggest checking out the other content there. It's a great resource to have when it comes to strategy and there are some really smart people running the site.

ADDENDUM: Be sure to read the comments section, as there are two very important corrections to the article. The presentation of the ideas was unclear, and I made a mistake by not going through the editing process as carefully as I should have. They're easy fixes though, and are up in the comments section.

Monday, November 16, 2009

Balance in the NFL Going to Sh*t?

Over the past few weeks (and especially after the beatdown that New England laid upon the Titans) I've been hearing commentators, colleagues, and friends claim the NFL just isn't very interesting this year. The claim is that the competitive balance of the league, or the evenness between the teams, has taken a huge downturn and there must be something wrong with the league's construction. My first thought is that the NFL has long been generally thought of as the most balanced league of North America's Big 4 and any change it would see would most likely be downward (when you're at a peak, the only way to go is down). My second thought was that, yes, there do seem to be a large number of blowouts this season. But looking at the standings, I just don't see that clearly. There are only 4 teams with more than 6 wins thus far in the season, and yesterday the awful Redskins beat the not-so-long-ago-unbeaten Denver Broncos.

Now that we're at a point where everyone has played 9 games (except for the Browns and Ravens tonight, but for the sake of saying this year is unbalanced, I'm assuming the Ravens win) I decided I'd check to see if there has been a significant change in balance this season. I'll just use a simple Ratio of Standard Deviations (RSD--a measure popularized by Roger Noll and Gerald Scully). This measure is the standard deviation of winning percentages (SD) divided by the "idealized" standard deviation in a perfectly balanced league (ISD). I originally defined a perfectly balanced league as one where all teams have a record of 0.500; that is a stupid mistake on my part, as noted in the addendum below...especially since I work with this data a lot. The reason for the denominator is to control for the number of teams and number of games played in each season to make it comparable across sports or seasons in which these variables change. The lower the RSD, the better the balance. From 2001 to 2008, the RSD was as follows:

2001: 1.628
2002: 1.321
2003: 1.535
2004: 1.540
2005: 1.694
2006: 1.447
2007: 1.661
2008: 1.658

And at the 9-game point in the 2009 season, we see: 1.448

So, at this point in the season, it looks as though we're on the better end of the balance spectrum (if you believe that more balanced means a better league--which, of course, is always up for debate). Keep in mind that RSD is not the only way to measure balance. However, it works as an interesting quick check on the distribution of wins around the league. It very well could be that there is a huge gap in talent between the top and bottom teams, where the best teams pummel the worst. We've seen some of this, but we've also seen the Redskins beat the Broncos and the Raiders beat the Eagles.
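For anyone who wants to run the same quick check, here is a minimal sketch of the RSD calculation in Python. A few assumptions to flag: the idealized standard deviation is taken to be 0.5 / sqrt(games), which is the usual Noll-Scully form but is not spelled out in this post; the population standard deviation is used; ties are ignored; and the records in the example are made up, not real NFL data. The function names are hypothetical.

```python
import math
from statistics import pstdev


def idealized_sd(games):
    """SD of winning percentage if every game were a coin flip.

    Assumes the standard form 0.5 / sqrt(games).
    """
    return 0.5 / math.sqrt(games)


def rsd(win_pcts, games):
    """Ratio of the actual SD of winning percentages to the idealized SD."""
    return pstdev(win_pcts) / idealized_sd(games)


# Hypothetical 8-team league after 9 games (made-up records)
wins = [8, 7, 6, 5, 4, 3, 2, 1]
win_pcts = [w / 9 for w in wins]
print(round(rsd(win_pcts, 9), 3))
```

Under these assumptions a value near 1 means the spread of records looks like what coin flips alone would produce, and larger values mean a wider spread than chance alone would give.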

To be honest, I enjoy a demolition every now and then. Watching the spectacle of Tom Brady throwing 6 TD passes in one quarter is a lot of fun for me. It would probably get boring if there weren't teams that could challenge the Pats, Colts and Saints, but I don't think there's any lack of competition in the NFL as the Saints' last few games have shown. I'm not ready to conclude that the NFL system is now completely broken, so let's not rush to fix it.

ADDENDUM: Guy points out a problem with my explanation. The ideal league isn't one with all .500 teams, but one where each team has a 50% chance of beating any other team. He makes some points about the difficulty at the extremes for balance, which are good to keep in mind. The correction for the model in terms of season length can overcorrect for short seasons like 9 games, making the RSD look more balanced than the league actually is. Given this, I still don't think balance is a hugely significant problem in the NFL this season as some commentators have tried to point out.
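As a rough sketch of where the idealized standard deviation comes from under Guy's definition (every team has a 50% chance in every game), treat a team's win total as a binomial draw. The symbols G (games played) and W (a team's wins) are notation assumed here for illustration, ties are ignored, and games are treated as independent:

```latex
W \sim \mathrm{Binomial}(G,\ 0.5)
\quad\Longrightarrow\quad
\mathrm{Var}\!\left(\frac{W}{G}\right) = \frac{0.5 \times 0.5}{G},
\qquad
\mathrm{ISD} = \frac{0.5}{\sqrt{G}},
\qquad
\mathrm{RSD} = \frac{\mathrm{SD}}{\mathrm{ISD}}
```

With G = 9 this gives an ISD of about 0.167, compared with 0.125 for a full 16-game season, so the same spread of winning percentages produces a lower RSD at the 9-game mark than it would over a full season.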

Thanks to for the data on past NFL season RSD. You can find the data for this post, as well as numerous other sports business data files by clicking the link on the sidebar.

Friday, November 13, 2009

Another Link and an Interview

Today, I'm at home sick so I have a little time to sit here at the computer. I briefly mentioned a problem I have with the general "Sabermetric Community" when I posted about the argument on replacement players at The Book Blog (that actually stemmed from an unfounded accusation that a bunch of "stupid economists" wrote a flawed paper on Factor Analysis--not really an Econometric technique). However, I think JC Bradbury does a much better job than I in his interview at Chop n' Change.

Bradbury sums up my thoughts on interactions with this group of people pretty well. The general pattern on many sites is to simply ignore or misrepresent any sort of perceived conflicting view (even if that view isn't actually conflicting with anything). Now, I am not here to state that these people aren't intelligent, or that it doesn't happen to some extent on both sides of the issue. To the contrary, many of them are very smart people, but with an unfortunate arrogance that I don't understand. In my discussions with top sports economists, any inconvenient truth presented seems to simply be ignored or, as Bradbury puts it, "chastised without heeding the point."

An example is that of my previous post on discrimination in the NHL. While Phil Birnbaum claims that the book is making "premature accusations", the phenomenon of this discrimination has been documented and studied for more than 20 years in the sports economics literature (if that interests you, see the citations in my previous post). My mention of these papers--supplemented by a sarcastic yet friendly post by sports economist Rodney Fort about making sure to be well read on a subject before heavily criticizing it--went unheard for the rest of the thread. The conversation continued as if this was truly a new problem.

This isn't an isolated incident. I recently read an article over at the Harvard Sports Analysis Collective that essentially looked at a time series of competitive balance. While I think this website is a great learning tool for Harvard students, it amazes me that their resident Harvard statistician allowed this article to be posted. There are a couple of reasons for this. The first is simply that taking the standard deviation of wins is problematic when comparing across years. The number of teams and games has changed dramatically over this time, making it very difficult to compare across seasons. In addition, the competitive balance change has been well-documented by Rodney Fort and Young Hoon Lee in a series of papers from 2005 to now (and probably will continue). That DOES NOT mean that further analysis is inappropriate. To the contrary, more inspection is needed. However, presenting work with no reference or understanding of the problems is troublesome. Finally, allowing these students to take others' work on the internet as a given isn't something we would want going on at an institution like Harvard. In fact, the last thing we want is for Harvard graduates and students to participate in what Bradbury calls a "groupthink attitude".

Finally, Bradbury mentions an article at The Book Blog that completely abuses a model developed by John Hakes and Skip Sauer. I had in fact read the article by Tango and was appalled at the misuse of the model myself. I began writing a response explaining the difficulty with extrapolating a regression outside of a sample, but decided it would simply fall upon deaf ears. At this point, I just don't bother. It seems that others that do not look upon Tango as some sort of cult leader have given up as well.

I am extremely excited about a forthcoming special issue of The Journal of Sports Economics in which some of the most vocal, and most knowledgeable, economists in the field of sport discuss many of these problems in depth. Hopefully self-proclaimed "subject matter experts" will take some of the implications in the issue to heart. However, my expectation is that none of them will bother to read it.

The current state of The Book Blog reminds me of sitting in MBA economics classes at the Business School here at Michigan. Without understanding how models are simplified in order to explain expectations of markets, arrogant students consistently chastise the professor because of a single example they ran into at work. The professor's response is always, "Well, of course there are anomalies, but on average X happens" as if he was waiting for it. This statement just doesn't get through to people for some reason, despite its generality. The thing that blows my mind about this entire problem is that Sabermetricians believe they are critical thinkers. They are people who, for years, ran into others ignoring their opinions or criticizing their silly 'statistics' based on small sample sizes. Yet, the arrogance continues to blind minds to the fact that economics and Sabermetric study are so interrelated that ignoring the basic economic principles can be counterproductive in progressing the science.

While I continue to post things on this blog, I want to reiterate that what I post here IS NOT something that should be taken as science. Most of what I write is a general brain dump, or interesting tidbits and extensions using projects from my statistics classes. I hope to have a monthly disclaimer to ensure this is understood, and that fostering discussion and well-read arguments is part of my intention. My ideas on this site are not to start an online pissing match, or to out-do anyone else. Please see my Introduction as to how I think about the things I write. I try to write with the utmost care, but can make mistakes. I hope they are pointed out in a manner conducive to discussion.

So let's all take one from Rodney Fort's book, as he says, "Let's all READ MORE."

ADDENDUM: Here's the Rosenthal article...where he supports the idea of sabermetrics and claims their findings have greatly enhanced our understanding...yet gets heavily criticized elsewhere on the internet.

Wednesday, November 11, 2009

Forthcoming News and a Fun Link

I have some interesting news in the works for my site that may redirect its focus (and its 2 readers) somewhere else. It's been busy lately and keeping a site up myself has become difficult (at least with any meaningful material). I'll leave that for later, though.

Over at The Sports Economist, Brad Humphreys has an amusing article about the early season NCAA Basketball "tournament". Click here to go over there.

Friday, November 6, 2009

Better Models than RPI

The other day, when clicking through the Center for Statistical Consulting here at U of M, I came across one member of the group (a current PhD student here) who had a link to his site where he lists a large amount of research he's been involved in. His name is Brady West, and the reason I'm posting the link here is that he has a couple of publications in the Journal of Quantitative Analysis in Sports.

One study develops a model that, apparently, far outperforms that of the RPI system currently used in choosing the 64 (65) teams in the March Madness tournament for NCAA Basketball. West advocates that those in charge should use his model to pick the teams. After just a quick look over his data (and also knowing some of the problems with RPI), I would have to agree with him. Of course, that's leaving aside any anti-trusty type things that constantly go on with NCAA, the BCS, and probably March Madness.

His other study looks at predicting teams winning college bowl games. I think he's still working on that one, but the preliminary analysis published in JQAS is interesting. West seems to know his stuff (he's an alum of the graduate program in Statistics here as well) and it's nice to know there are statisticians at U of M that may be willing to work with sports and maybe at some point with the Sport Management Department.

Anyway, you can find his website here, which also includes his data and Excel files.

Tuesday, November 3, 2009

Quick Post on Curious Poll Results

I was jetting around the internet this morning checking up on the latest sports news and came across a poll at CBS Sports that asks the following question:

How will the World Series End?

A. Yankees in 6 (46%)

B. Yankees in 7 (11%)

C. Phillies in 7 (44%)

The thing that really interested me here was the breakdown of the voting. It was really intriguing to me that so few people thought the Yankees would win in 7 (I personally voted for Yankees in 6 with no allegiance to either team).

At first glance, we might think, "Why do so many people think that if the Phillies win tomorrow night, they have a 4x larger chance of winning the whole thing than the Yankees?" But let's break this down a little differently and rephrase:

Who will win the World Series?

A. Yankees (57%)

B. Phillies (44%)

I know those don't add up to 100%, but it's just a rounding error. Anyway, it looks like the majority thinks the Yankees will win the World Series. That's expected, given where it's at right now. The problem with our first glance is this: we're not conditioning the Yankees win on having to play a Game 7, while the Phillies MUST play Game 7 to win.
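Here's the arithmetic behind the two readings of the poll, as a minimal sketch in Python. The percentages are the ones quoted above; the function names are made up for illustration. The first function collapses the answers into "who wins the Series?", and the second reproduces the misleading first-glance comparison that looks only at the "in 7" answers.

```python
# Poll shares as reported (percent); they sum to 101 because of rounding.
poll = {
    ("Yankees", 6): 46,
    ("Yankees", 7): 11,
    ("Phillies", 7): 44,
}


def team_totals(poll):
    """Collapse the outcome-level poll into 'who wins the Series?'."""
    totals = {}
    for (team, _games), share in poll.items():
        totals[team] = totals.get(team, 0) + share
    return totals


def naive_game7_split(poll):
    """First-glance reading: compare only the 'in 7' answers to each other."""
    in7 = {team: share for (team, games), share in poll.items() if games == 7}
    total = sum(in7.values())
    return {team: share / total for team, share in in7.items()}


totals = team_totals(poll)
naive = naive_game7_split(poll)
print(totals)  # overall split between the two teams
print(naive)   # split among 'in 7' voters only
```

The naive split only describes people who already expected a Game 7. It says nothing about how the "Yankees in 6" voters would vote if a Game 7 actually happened, which is why it overstates the Phillies' support.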

What the original poll is telling us is not that people think the Phillies have 4x the odds of winning that the Yankees do, given there's a Game 7. If we re-run the poll after tomorrow night's game, assuming the Phillies won, we'd probably see something more like the second poll. Those that think the Yankees will win in 6 games will just vote for the Yankees in Game 7 (mostly). Why is this interesting?

Well, it relates back to the way that the Olympic voting took place and how the votes that are split between two choices become skewed when there is multi-part voting. Apparently, many voters had Chicago and Rio as their top 2 choices, but that actually was bad for Chicago. JC Bradbury has an interesting post on it here.

Anyway, nothing groundbreaking in my post. It's a pretty well-understood phenomenon. I hadn't posted anything for a while, and needed a brain dump. I just thought it was interesting, given my first impressions of the poll.

On another note, I'm fiddling with positional adjustments for the HOF Batters to see if I can improve my prediction model from before. I'm also working on an article discussing the optimal auction structure for fantasy baseball leagues. Those should be up here shortly.