<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6731958142156026312</id><updated>2012-02-10T10:21:22.286-05:00</updated><category term='Prizes'/><category term='NHL'/><category term='Measurement Error'/><category term='Tennis'/><category term='Auctions'/><category term='Big 12'/><category term='Replacement Player'/><category term='Simulation'/><category term='BIS'/><category term='Economics'/><category term='Stadium'/><category term='Externalities'/><category term='Data Science'/><category term='Helmets'/><category term='sab-R-metrics'/><category term='Women'/><category term='Announcement'/><category term='Movie'/><category term='Sillyness'/><category term='NBA'/><category term='Story'/><category term='College'/><category term='Brewers'/><category term='GAM'/><category term='Charity'/><category term='Addiction'/><category term='PSL'/><category term='Guest Post'/><category term='Marketing'/><category term='Blogs'/><category term='Collective Bargaining'/><category term='MLB'/><category term='Fan Preference'/><category term='Fail'/><category term='Jose Lima'/><category term='Baseball Prospectus'/><category term='ESPN'/><category term='World Series'/><category term='Salary'/><category term='Catcher'/><category term='Data Issues'/><category term='Hall of Fame'/><category term='Bill James'/><category term='Fantays'/><category term='Rules'/><category term='Strategy'/><category term='Strike Zones'/><category term='Vacation'/><category term='Presentations'/><category term='Graphical Analysis'/><category term='Antitrust'/><category term='Wine Tasting'/><category term='Congratulations'/><category term='Basketball'/><category 
term='Color'/><category term='Baseball'/><category term='Fantasy Baseball'/><category term='Sport Management'/><category term='FBJ'/><category term='Footballl'/><category term='Non-Parametrics'/><category term='NFL'/><category term='Drafts'/><category term='Redskins'/><category term='Fielding'/><category term='MLB Draft'/><category term='Heat Maps'/><category term='JQAS'/><category term='Polls'/><category term='Education'/><category term='Incentives'/><category term='Subsidies'/><category term='Freakonomics'/><category term='R-project'/><category term='Media'/><category term='Baseball Cards'/><category term='Gambling'/><category term='Hockey'/><category term='Help'/><category term='Arbitrage'/><category term='NCAA'/><category term='Discrimination'/><category term='Loess'/><category term='Technology'/><category term='Academic'/><category term='Sabermetrics'/><category term='Collectibles'/><category term='Pitch F/X'/><category term='Clutch'/><category term='Michigan'/><category term='Statistics'/><category term='VGAM'/><category term='Stadiums'/><category term='Neyer'/><category term='Random Forest'/><category term='Philosophy'/><category term='Libertarianism'/><category term='Thanks'/><category term='Democracy'/><category term='Regression'/><category term='Competitions'/><category term='Programming'/><category term='Finance'/><category term='Fanatsy Football'/><category term='Politics'/><category term='JSM'/><category term='Soccer'/><category term='Tournaments'/><category term='Self-Indulgence'/><category term='Steroids'/><category term='Crosspost'/><category term='Umpires'/><category term='General'/><category term='League Rules'/><category term='Conference'/><category term='Food'/><category term='Links'/><category term='BBWAA'/><category term='Demand'/><category term='Link'/><category term='Realignment'/><category term='Attendance'/><category term='Law'/><category term='Animation'/><category term='Networks'/><category term='R-Project Links'/><category 
term='Strasburg'/><category term='Parity'/><category term='Olympics'/><category term='Arguments'/><category term='Internet'/><category term='Marlins'/><category term='Predictions'/><category term='Draft'/><category term='Visualizations'/><category term='Real Life'/><category term='Kernel Smoothing'/><category term='Fun'/><category term='Blogging'/><category term='Terrible'/><category term='Interdisciplinary'/><category term='Business'/><category term='RSN'/><category term='Tickets'/><category term='Myths'/><category term='Fantasy'/><category term='Black History'/><category term='Data'/><category term='Friday'/><category term='Competitive Balance'/><category term='Autographs'/><category term='Validation'/><category term='Valuation'/><category term='Little League'/><category term='Update'/><category term='Attribution'/><category term='Hot Stove'/><category term='Stupidity'/><category term='Keeper'/><category term='Non-Sports'/><category term='Sports'/><category term='Football'/><category term='Books'/><title type='text'>The Prince of Slides</title><subtitle type='html'>Sports. Statistics.  Economics.  
And a naive graduate student pretending to know more than nothing about something.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default?start-index=101&amp;max-results=100'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>173</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-5022200879912412415</id><published>2012-02-10T10:10:00.006-05:00</published><updated>2012-02-10T10:21:22.294-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Hall of Fame'/><category scheme='http://www.blogger.com/atom/ns#' term='Sabermetrics'/><category scheme='http://www.blogger.com/atom/ns#' term='JQAS'/><category scheme='http://www.blogger.com/atom/ns#' term='Bill James'/><category scheme='http://www.blogger.com/atom/ns#' term='Arguments'/><title type='text'>Bill James on Dwight Evans</title><content type='html'>&lt;a href="http://www.grantland.com/story/_/id/7555836/an-open-letter-mlb-hall-fame-dwight-evans-rightful-place-cooperstown"&gt;Bill James has an article up at 
Grantland&lt;/a&gt; lobbying for Dwight Evans being voted into the Hall of Fame.  He's right, and not just from a sabermetric standpoint.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://sitemaker.umich.edu/millsbrian/files/published_jqas_version__mills_and_salaga__2011_.pdf"&gt;My paper at JQAS with Steve Salaga&lt;/a&gt;&lt;a href="http://sitemaker.umich.edu/millsbrian/files/published_jqas_version__mills_and_salaga__2011_.pdf"&gt; &lt;/a&gt;argues exactly the same point (actually for both Darrell and Dwight Evans, but Darrell isn't eligible any longer).  However, we take a non-saber look at things and use the traditional statistics that we all know BBWAA voters love.  Of course, we run a rather complex technique, but we still use these basic statistics.  What do we find?&lt;br /&gt;&lt;br /&gt;Well, &lt;span style="font-weight: bold;"&gt;even by the BBWAA's own criteria&lt;/span&gt; (based on their past voting behavior), Dwight Evans should be voted in.  When we compare him to others in our analysis, he comes out ahead of Mark McGwire (sans-steroids, he's a sure thing), Barry Larkin (fielding is a weak point of our study), Joe Carter, and Dave Parker.  And just behind Mike Piazza.&lt;br /&gt;&lt;br /&gt;That means that--given how voters have behaved before--Evans should be in.  If voters' own preferences toward traditional statistics say he should be in (without even accounting for Evans' great fielding!).  And if Bill James's sabermetric thoughts think he should be in.  
Then why the hell isn't he in?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Hat Tip: Tango&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-5022200879912412415?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/5022200879912412415/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2012/02/bill-james-on-dwight-evans.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/5022200879912412415'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/5022200879912412415'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2012/02/bill-james-on-dwight-evans.html' title='Bill James on Dwight Evans'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-6582770963275010210</id><published>2011-12-01T12:17:00.002-05:00</published><updated>2011-12-01T12:21:37.600-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Link'/><category scheme='http://www.blogger.com/atom/ns#' term='Books'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><category scheme='http://www.blogger.com/atom/ns#' term='Umpires'/><title type='text'>A Book on Umpire Performance</title><content type='html'>&lt;a 
href="http://books.google.com/books?id=Gyd0dYw0GIMC&amp;amp;pg=PA36&amp;amp;lpg=PA36&amp;amp;dq=Major+League+Umpires%27+Performance,+2007-2010:+A+Comprehensive+Statistical+Review&amp;amp;source=bl&amp;amp;ots=vn_A_bm9BV&amp;amp;sig=44CBKVptqJ6fUvI0r8Ax1YkVzOU&amp;amp;hl=en&amp;amp;ei=4rXXTqfBMIyRgQfkmtWFDw&amp;amp;sa=X&amp;amp;oi=book_result&amp;amp;ct=result&amp;amp;resnum=5&amp;amp;ved=0CEAQ6AEwBA#v=onepage&amp;amp;q&amp;amp;f=false"&gt;I ran across this today when trying to find a paper on Google scholar&lt;/a&gt;.  While it sounds interesting, there is no use of pitch location data as far as I can tell.  Mostly, it seems to report ball and strike percentages and ultimate game outcomes, with a profile for each umpire.  Certainly interesting to see, though I provided much of this for free in one of my previous posts (with pitch location information included).  For those, see &lt;a href="http://princeofslides.blogspot.com/2011/04/umpire-call-database.html"&gt;HERE&lt;/a&gt; and &lt;a href="http://princeofslides.blogspot.com/2011/03/umpire-strike-zones.html"&gt;HERE&lt;/a&gt; and &lt;a href="http://princeofslides.blogspot.com/2011/03/umpire-strike-zones-in-2010.html"&gt;HERE&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Anyway, thought some visitors may be interested.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-6582770963275010210?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/6582770963275010210/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/12/book-on-umpire-performance.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/6582770963275010210'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/6731958142156026312/posts/default/6582770963275010210'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/12/book-on-umpire-performance.html' title='A Book on Umpire Performance'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-2811967246908106814</id><published>2011-11-30T15:25:00.013-05:00</published><updated>2011-12-02T09:23:56.174-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Collective Bargaining'/><category scheme='http://www.blogger.com/atom/ns#' term='Economics'/><category scheme='http://www.blogger.com/atom/ns#' term='MLB'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><title type='text'>New MLB CBA: Owners win!</title><content type='html'>I haven't had much to say about the MLB CBA that was recently agreed upon.  Really, the whole thing seems a bit silly, and I'm not totally sure how some of the things will play out.  Whatever the result, I don't think it matters nearly as much as some seem to be howling about.  What should be howled about is the screw job on the young players, not so much the implications for professional baseball.&lt;br /&gt;&lt;br /&gt;It will probably give slightly more incentive for players to go to college, I guess.  But that depends on the return to going to college, as they'll be subject to negotiations with only a single team the next time they're drafted.  If so, then so what?  College players generally make it to MLB a bit quicker, so they may still arrive at the same time (or just slightly later than) they would have otherwise.  
But ultimately, those players will end up in MLB.  How many multi-sport players are we really talking about?  I can't imagine this is significant at all to the total talent in the league.  Here are a few thoughts.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;International Players&lt;/span&gt;&lt;br /&gt;The idea that large market teams don't invest heavily in international players, and that this cap will only affect the small market teams, seems ridiculous to me.  If anything, I think the cap gives smaller teams more incentive to invest in international talent and training.&lt;br /&gt;&lt;br /&gt;Before, teams really had to worry about investing heavily in training only to watch international players sign with a team that had more money.  Now, the return to training is higher (lower bonuses) and the probability that the player will sign with the team that trained them is higher (the market has fewer huge offers).&lt;br /&gt;&lt;br /&gt;This could have the effect of redistributing talent across all teams, certainly.  But this, just like before, is a free agent market for these players.  Yes, with more uncertainty and less leverage for the players, but a competitive market nonetheless.  Before making a final judgment here, I would have to see the distribution of international spending by team.  I suspect it is not highly negatively correlated with market size, meaning the effect is just cheaper talent.&lt;br /&gt;&lt;br /&gt;This restriction probably helps all owners, unless it creates a situation in which big market teams decide to only target 'sure thing' players, and allocate all of their cap to a single player trained by another team.  Not sure how reasonable a worry that is, but I'd be interested in someone enlightening me on the data here.  
Again, the losers here are the international players, who get smaller bonuses to feed their families.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Slotting in the Draft&lt;/span&gt;&lt;br /&gt;First, I think it's a bit silly that if a pick isn't signed, those dollars can't go toward signing another player.  This is pretty classic "screw the guys with no representation" at work.  The owners' interest in the slotting is pretty obvious: they get cheap talent even cheaper.  The veterans, I think, were a bit misguided on this one.  They probably assume that the money not going to the draft picks will be reallocated to them in the free agent market.  As Lee Corso would say, "Not so fast my friend!"  &lt;span style="font-style: italic;"&gt;(As a note, this thought was sparked by a question from Sky Kalkman on Twitter, so I'm going to try and lay it out fully here.)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Outside of the impact on the likelihood of signing picks, the slotting really has no consequences in the free agent market (if they don't sign them, they might want to spend some money to replace this talent they would have otherwise had).  Assuming that the large majority of draft picks are signed (they really don't have a more lucrative option, do they?), the teams are getting the same amount of talent they had before, but for less money.  So there's a large surplus in the draft.  Surplus has to go somewhere, right!?!&lt;br /&gt;&lt;br /&gt;Right!...into the owners' pockets (or into another, more lucrative investment).  Major League Baseball has no requirement on the percentage of revenues that must go toward MLB salaries, outside of the minimum salary requirements to individual players.  Even if it did, I'm not sure that this would cover rookie signing bonuses.  
But let's assume it did have this, and it included both draft bonuses and salaries.  Then teams would have to make up that spending somewhere else.  Depending on the stipulations of the minimums on payroll and bonuses, it very well could go into the free agent market.  But this isn't the case.  The teams have the same talent they did before, but for cheaper.  Would buying more talent make sense?&lt;br /&gt;&lt;br /&gt;It could, under certain conditions.  But I don't think these hold.  First, it would require that their marginal revenue be above the marginal cost.  In other words, teams spend up to some point where these two are equal.  But they're already doing this, under the assumption of profit maximization.&lt;br /&gt;&lt;br /&gt;Unless the marginal revenue for an additional win increases or the &lt;span style="font-weight: bold;"&gt;marginal &lt;/span&gt;cost decreases, there isn't an incentive to spend more.  Yes, the total cost (average cost of a win) decreased for them due to the slotting on the whole.  But, in the competitive market for free agents, there is no reason to believe that the cost for one more win has gone down (in fact, if you believe more money is being reallocated there, it would increase!).  As for marginal revenue, why would that increase?  If the CBA increased interest in the sport as a whole, then maybe it would a tiny bit.  But I doubt that's the case.&lt;br /&gt;&lt;br /&gt;Here's an example.  Let's say the Rays spent $15 million on rookies last year.  At first glance, a veteran might say "Hey, this year they only have to spend $2 million.  Then they'll spend the other $13 million on us! Woohoo!"&lt;br /&gt;&lt;br /&gt;But--taking into account the uncertainty of draft and veteran talent in the future (let's just talk in expected values) and signing all picks--they have the same talent they did last year.  To buy more talent, they have to enter the competitive free agent market.  
Last year, let's say an additional expected win for Tampa would increase revenues by $3.9 million (no reason to think this changed from last year due to the new rules).  Similarly, the market price for an additional win from a free agent is $4 million (and why would this decrease now?).  It isn't rational to spend $4 million for the additional $3.9 million in revenues.  Sure, they could get from 88 to 89 expected wins for cheaper than they could last year.  But it would decrease overall profits to reallocate that money under these new rules.  Therefore, it's not likely that they'll go into the free agent market and spend that money by choice.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Now, there is an exception to this. &lt;/span&gt; If, under collective bargaining, the agreement was to slot draft picks only if the minimum salary was increased, then some goes to the players.  It is bargaining after all, and we can't assume that all the vets are stupid.  In fact, the minimum MLB and 40-man roster minimum salaries were increased by about 16% each.  But this minimum still does not go into calculating the marginal cost of an additional win.  In, say, a WAR equation you just add it to the total for each player: Salary = $414,000*1.16 + $B*WAR.  Therefore, it's a new fixed cost of operation in MLB (a new intercept for the equation).  The amount you pay for marginal wins is independent of the minimum salary in this case.  The total additional cost to owners is ($480,000-$414,000) * 25 players * 30 teams = $49,500,000 plus the 40-man increase ($78,250-$67,300)*15*30 = $4,927,500.  Include guys getting sent up and down, and we'll put it at a cool $55 million.&lt;br /&gt;&lt;br /&gt;According to Jim Callis of Baseball America, teams spent $192 million on bonuses in the first 10 Rounds of the 2010 MLB draft (I think I'm reporting this correctly).  That means that the owners are left with a surplus of $192 - $55 = $137 million total.  
So, here both the owners (+$137 million) and players (+$55 million) gain from the new agreement.  Other than the agreed-upon increase in minimum salary, there isn't any reason to believe that owners will reallocate this money saved in the draft to free agents.&lt;br /&gt;&lt;br /&gt;Certainly these results change under revenue or win maximization.  But for North American sports, profit maximization is usually assumed to best describe owners.&lt;br /&gt;&lt;br /&gt;As always, I welcome (and enjoy) any thoughts and criticisms or clarifications about things I've misunderstood here.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;ADDENDUM: I realized that I didn't subtract from the $192 million the allocation that still goes to the draft picks.  Whoops!  The conclusion should be the same, just with a smaller share than $137 million for the owners.  Assuming they spend $2 million per team on average, they'd be looking at more like a $77 million surplus in their pockets.  At $3 million, you're at $47 million, and so on.  Previously, they were at a $6.4 million average total per team in the first 10 Rounds.  Not sure what the exact slotting will be, but if we assume it cuts this in half, then we're near the $50 million mark and the players and owners may have split this up near 50-50.  
This likely means that, in terms of percentage increase, the lower-level players (Craig Counsells of the world) are sitting pretty.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-2811967246908106814?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/2811967246908106814/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/11/new-mlb-cba-owners-win.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/2811967246908106814'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/2811967246908106814'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/11/new-mlb-cba-owners-win.html' title='New MLB CBA: Owners win!'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-4442408275286178125</id><published>2011-11-28T12:43:00.004-05:00</published><updated>2011-11-28T13:11:21.716-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='PSL'/><category scheme='http://www.blogger.com/atom/ns#' term='Economics'/><category scheme='http://www.blogger.com/atom/ns#' term='Links'/><category scheme='http://www.blogger.com/atom/ns#' term='Fanatsy Football'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Freakonomics'/><title type='text'>Soared in Value?  Probably Not As Much as Indicated</title><content type='html'>&lt;a href="http://www.freakonomics.com/2011/11/28/finally-an-investment-worth-making/#comment-275122"&gt;Freakonomics reports&lt;/a&gt; on this &lt;a href="http://www.post-gazette.com/pg/11331/1192861-53.stm"&gt;Pittsburgh Post-Gazette &lt;/a&gt;article claiming that Steelers personal seat licenses have increased by as much as 1,400 percent.  While I won't deny that they have probably increased in value, this is an overstatement.  Here's why.&lt;br /&gt;&lt;br /&gt;They compare the current secondary market prices to those posted by the Steelers in 2001.  The first mistake here is not understanding that most teams price to sell out (in the inelastic portion of demand).  This is also likely true with PSLs.  There are a few reasons for this, one being that they maximize profits not just on ticket sales, but also on concessions and memorabilia within the stadium.  Secondly, there is also some economic theory that places like sports venues and restaurants keep prices low, as their product depends on other people also liking it and consuming it (and they have a fixed supply).  There could also be a backlash if the team gets lots of public funding and then turns around and soaks up lots of consumer surplus with PSLs.  They want to keep some good will, though this is hard to show in practice.  Whatever the reason, it invalidates the direct comparison of Steelers' prices for PSLs and the prices on the secondary market.&lt;br /&gt;&lt;br /&gt;In 2001, when the Steelers sold their PSLs for Heinz Field, it is VERY likely that the market would have paid significantly more than what they were going for.  One could verify this by looking at 2001 sales of PSLs on the secondary market.  While this still made them a great investment for the savvy ticket seller, it means that the demand for PSLs likely did not increase fourteen-fold in the past 10 years.  
One would have to compare the secondary market for these then to the secondary market for them now.&lt;br /&gt;&lt;br /&gt;The second thing they didn't do (at least they make no indication of it) was adjust for inflation.  This again makes the numeric comparison invalid.  2011 dollars are worth less than 2001 dollars.  Let's use 2010 dollars as an example.  They're worth about 81% of what 2001 dollars were.  So you have to first discount this amount before making a real comparison of value or changes in demand.&lt;br /&gt;&lt;br /&gt;All in all, it's likely that the Steelers have seen some great financial return on their success over the past 10 years.  However, it's just not nearly as large as the Pittsburgh Post-Gazette seems to want to indicate.  I guess those two losses to the Ravens this year left them looking for something positive to report on (*wink* *wink*).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-4442408275286178125?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/4442408275286178125/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/11/soared-in-value-probably-not-as-much-as.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/4442408275286178125'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/4442408275286178125'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/11/soared-in-value-probably-not-as-much-as.html' title='Soared in Value?  
Probably Not As Much as Indicated'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-3162286753800920922</id><published>2011-10-29T11:17:00.019-04:00</published><updated>2011-10-31T12:08:03.745-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Color'/><category scheme='http://www.blogger.com/atom/ns#' term='Non-Parametrics'/><category scheme='http://www.blogger.com/atom/ns#' term='Sabermetrics'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><category scheme='http://www.blogger.com/atom/ns#' term='Heat Maps'/><title type='text'>Maximizing Sabermetric Visual Content: Smooth Comparisons and Leveraging Color</title><content type='html'>A recent post by Mike Fast got me thinking a bit more about color.  For most, thoughts about color generally become a secondary interest.  But I am here to tell you they should be a primary concern in your statistical presentations.  This is especially true when analyzing the strike zone.&lt;br /&gt;&lt;br /&gt;Before you begin reading this, &lt;span style="font-weight: bold;"&gt;please read&lt;/span&gt; &lt;a href="http://www.baseballprospectus.com/article.php?articleid=15363"&gt;Mike's excellent post over at Baseball Prospectus&lt;/a&gt;.  
Then, go ahead and read &lt;a href="http://praiseball.wordpress.com/2011/03/24/i-wish-dave-allen-was-never-born/"&gt;this article at Praiseball Bospectus&lt;/a&gt; (linked not because of its title--I am very glad Dave Allen was born--but because it really does highlight some issues with things you'll find around the net).&lt;br /&gt;&lt;br /&gt;Okay, now that you have read that, here are my additional comments.  First, heat maps should be approached with caution.  This is true whether you are smoothing or simply breaking up the zone into smaller areas.  Mike covers this well, but I will take it a little further with smoothing.&lt;br /&gt;&lt;br /&gt;When you use a smoothing technique, you really need to understand what it is doing.  I'm not going to fully describe loess techniques (or smoothing splines, or kernel density functions, etc.).  There are plenty of resources online.  Oftentimes the degree of smoothing is up to the researcher.  However, it is almost always the case in baseball analysis that we want to compare one smoothed representation or heat map to another one.  This is where things get tricky.  You'll need to make sure you are not undersmoothing (too wiggly) or oversmoothing (not wiggly enough).&lt;br /&gt;&lt;br /&gt;Sample size is of course the first issue.  If you are going to present BABIP by pitch location for a single batter or pitcher, you are likely going to need to regress the data a lot.  Pitch data are extremely noisy.  Secondly, you really need to account for the fact that the batter CHOSE to swing at those pitches.  The pitches that a batter swings at are a distribution nested within all pitches thrown.  Then, the pitches that result in contact are yet another subset of this distribution.  Sometimes, this is no big deal.  Other times, it is an extremely big deal.&lt;br /&gt;&lt;br /&gt;Let's ignore the second issue above and focus first on sample size and comparisons.  
Next, let's restrict ourselves to evaluating the likelihood of a hit on contacted pitches.  Let's say one batter has made contact with about 1600 pitches, while another has made contact with about 250 pitches in our sample.  Let's just ignore the 'regression to the mean' issue here as well.  You know what, let's make it the same batter in both cases, with the second being a random sample of the first bunch.  If we use exactly the same smoothing parameters for each (with no restriction for the distribution being binomial, which technically it should be--&lt;a href="http://princeofslides.blogspot.com/2010/12/rethinking-loess-for-binomial-response.html"&gt;more thoughts on this issue here&lt;/a&gt;) we will get the following (extremely rough, and somewhat ugly--keep in mind I am not regressing here, just showing what happens with the different sample sizes) comparison below:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-1WOmTAyNhuU/Tqwzv1N1e_I/AAAAAAAAAYE/SiFzoiXZXtc/s1600/BABIPall.png"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 356px; height: 400px;" src="http://3.bp.blogspot.com/-1WOmTAyNhuU/Tqwzv1N1e_I/AAAAAAAAAYE/SiFzoiXZXtc/s400/BABIPall.png" alt="" id="BLOGGER_PHOTO_ID_5668962927784590322" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/-1tRgkUJzNlI/Tqwz7NVx_yI/AAAAAAAAAYQ/MZR2XoC7QQo/s1600/BABIPsub.png"&gt;&lt;img style="float: right; margin: 0pt 0pt 10px 10px; cursor: pointer; width: 356px; height: 400px;" src="http://4.bp.blogspot.com/-1tRgkUJzNlI/Tqwz7NVx_yI/AAAAAAAAAYQ/MZR2XoC7QQo/s400/BABIPsub.png" alt="" id="BLOGGER_PHOTO_ID_5668963123238928162" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Because I have not restricted the data to be between 0 and 
1, just assume the white splotches are where the probability of a hit on a ball in play is 0% (i.e. white==really really cold zone--I will leave aside the VERY IMPORTANT issue of ensuring the same color scaling on the sidebar for another post!).  You can see above that, even though we're looking at the same player, the maps are very different.  There are likely many problems here, as we would not expect pitches low and down the middle (remember, this is all within the strike zone) to be almost a 0% chance of a hit.  Why?  Well, the player above is Albert Pujols.  Plus, when we look at the full data on balls in play, we see that the probability is closer to what we would expect (though, according to this data, still a cold zone).&lt;br /&gt;&lt;br /&gt;You can also see that one plot shows a hot zone on the outer half, while the subset shows hot zones up and in as well as at the bottom of the zone.  This is a result of having very little data in these areas, and it is ultimately overweighted with the given smoothing parameter.  If Pujols gets one hit out of two pitches at the knees, it reports his BABIP to be .500 if we do not smooth enough or weight it along with other player data.  Of course, we wouldn't expect him to have a .500 BABIP in the future on these pitches.  Throw him 1000 pitches there to swing at, and he is really not likely to get 500 hits.&lt;br /&gt;&lt;br /&gt;So, with the same smoothing parameters, these plots really are not comparable to one another.&lt;br /&gt;&lt;br /&gt;Now, we could reconsider the smoothing parameter for the smaller data set (probably a good idea!).  However, the problem is that we don't know at what point of smoothing we're overfitting or underfitting.  You can imagine the problem is much more difficult when we are comparing two players against one another.&lt;br /&gt;&lt;br /&gt;One way to attack this issue is through using a generalized cross-validation technique (this can be done with the "mgcv" package in R).  
Using this method, I have found that we need a large sample size of pitches.  The method really breaks down for the small subset; however, it not only allows for a binomial representation of the data (rather than smoothing it with an assumed Gaussian distribution), but also lets us optimize the smoothing parameter so that we can compare across different sample sizes and distributions of pitches and BABIP.&lt;br /&gt;&lt;br /&gt;Okay, I could go on for a loooooong time and get really "mathy" with the considerations I mention above.  However, I'll just point everyone toward the book on Generalized Additive Modeling by Simon Wood (2006).  It is honestly one of the best resources I have ever come across in statistics, but to implement this with Pitch F/X you generally need a pretty large data set.  One needs to be careful and be sure to fully understand all of the options that can be implemented.  Using this method with the right type of data, you can ultimately create something like this (a strike zone map):&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/-tdgAOe7be6g/Tqw0sxpLoxI/AAAAAAAAAYc/hkaLUe5kMu0/s1600/OthersHeatMap.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 356px; height: 400px;" src="http://1.bp.blogspot.com/-tdgAOe7be6g/Tqw0sxpLoxI/AAAAAAAAAYc/hkaLUe5kMu0/s400/OthersHeatMap.png" alt="" id="BLOGGER_PHOTO_ID_5668963974797566738" border="0" /&gt;&lt;/a&gt;Before I get too far off on a tangent, let's return to the initial point of this post: COLOR.  For this, I'll stick with strike zone maps.&lt;br /&gt;&lt;br /&gt;The first question is: Why in the hell would we want to use color anyway?&lt;br /&gt;The answer is: It can help to communicate muddy scatterplots more easily.&lt;br /&gt;&lt;br /&gt;For example, below we have three scatterplots: Called Balls, Called Strikes, and the two combined.  
It is easy to tell where the definite strikes and definite balls are, but when we overlap the two plots, the strike call likelihood becomes nearly uninterpretable at the edges.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/-OPQHEO3u7PI/Tqw8tSTlYHI/AAAAAAAAAZA/xO72kjiBaaI/s1600/scatters.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 178px;" src="http://1.bp.blogspot.com/-OPQHEO3u7PI/Tqw8tSTlYHI/AAAAAAAAAZA/xO72kjiBaaI/s400/scatters.png" alt="" id="BLOGGER_PHOTO_ID_5668972779658371186" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Another consideration--noted by J-Doug--is color blindness.  The Green-to-Red plots for BABIP are likely a poor choice (as are the scatter plots shown above!).  Many people (about 8% of males) have difficulty distinguishing greens from reds.  So using these within the same image is a bad idea.  One way to evaluate your colors is to see if they are interpretable in black and white.  Let's check out the strike zone plot in black and white:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/-jVPloIWAU60/Tqw4berq_pI/AAAAAAAAAYo/QJuzWPYLcjk/s1600/blackandwhite.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 356px; height: 400px;" src="http://2.bp.blogspot.com/-jVPloIWAU60/Tqw4berq_pI/AAAAAAAAAYo/QJuzWPYLcjk/s400/blackandwhite.png" alt="" id="BLOGGER_PHOTO_ID_5668968075696471698" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;With the colors I use, it seems that for someone with complete color blindness, I have failed this test.  However, with some knowledge of a strike zone, this person would be able to understand that the dark within the zone is high strike probability, while the dark outside the zone is low strike probability.  They are also able to find that spot where the strike probabilities are changing the most (but this likely isn't satisfactory).  
Which brings me to my next consideration...&lt;br /&gt;&lt;br /&gt;Color is an important factor of your visual depending on &lt;span style="font-weight: bold; font-style: italic;"&gt;what you want to highlight&lt;/span&gt; to the reader.  In the first heat maps, we may want to be able to read the smoothing across the strike zone at a very granular level.  However, for the strike zone map above, we may be more interested in the place where the likelihood of a pitch being called a ball becomes higher than the likelihood of the pitch being called a strike (here, the yellowish-whitish band).&lt;br /&gt;&lt;br /&gt;When considering interest in the gradual changes across a heatmap, I find it a good idea to use a single color.  This way, there is not this "breakpoint" from red-to-blue or from green-to-yellow.  The same color gets lighter and lighter as you go.  Below I have an example of using the same color for determining densities of called strike locations (i.e. where they are thrown least or most).  
Here, I use a "red-to-white" palette and then switch it to "white-to-red".&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-XUhWGJmuSa8/Tqw9p3WJbQI/AAAAAAAAAZU/wPfX1nBdVPw/s1600/rectMethod.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 400px;" src="http://3.bp.blogspot.com/-XUhWGJmuSa8/Tqw9p3WJbQI/AAAAAAAAAZU/wPfX1nBdVPw/s400/rectMethod.png" alt="" id="BLOGGER_PHOTO_ID_5668973820393385218" border="0" /&gt;&lt;/a&gt;&lt;a href="http://4.bp.blogspot.com/-DTKu0fHCUHI/Tqw9ppavSHI/AAAAAAAAAZM/LV5GpsQ8CwA/s1600/strikelocation.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 400px;" src="http://4.bp.blogspot.com/-DTKu0fHCUHI/Tqw9ppavSHI/AAAAAAAAAZM/LV5GpsQ8CwA/s400/strikelocation.png" alt="" id="BLOGGER_PHOTO_ID_5668973816654547058" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;I would love to get comments on others' opinions about and experiences with using color.  I have not gone too in-depth, but I hope to follow up with a number of examples of color use for the same image and how this can allow highlighting certain areas of a visual.  
Also, with feedback, we can try and develop a consensus on what the optimal choices are for the majority.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-3162286753800920922?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/3162286753800920922/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/10/maximizing-sabermetric-visual-content.html#comment-form' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/3162286753800920922'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/3162286753800920922'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/10/maximizing-sabermetric-visual-content.html' title='Maximizing Sabermetric Visual Content: Smooth Comparisons and Leveraging Color'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-1WOmTAyNhuU/Tqwzv1N1e_I/AAAAAAAAAYE/SiFzoiXZXtc/s72-c/BABIPall.png' height='72' width='72'/><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-4374383911438695134</id><published>2011-10-27T21:01:00.004-04:00</published><updated>2011-10-27T21:26:31.186-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Hall of Fame'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Academic'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><title type='text'>Article at JQAS: Baseball Hall of Fame Voting</title><content type='html'>The newest issue of the &lt;a href="http://www.bepress.com/jqas/"&gt;Journal of Quantitative Analysis in Sports&lt;/a&gt; has been published today, and it features a number of interesting articles.  In honor of shameless self-promotion, I would like to highlight the following article:&lt;br /&gt;&lt;p&gt;&lt;a href="http://www.bepress.com/jqas/vol7/iss4/12" title="Using Tree Ensembles to Analyze National Baseball Hall of Fame Voting Patterns:  An Application to Discrimination in BBWAA Voting"&gt;Using  Tree Ensembles to Analyze National Baseball Hall of Fame Voting  Patterns:  An Application to Discrimination in BBWAA Voting&lt;/a&gt;&lt;br /&gt;&lt;span class="auth"&gt;Brian M. Mills and Steven Salaga&lt;/span&gt;&lt;/p&gt;The link above should be un-gated.  If it is not, please let me know and I can share the article.  This is my first, first-author academic publication so go easy on me (and Steve).  If you read my recent post about our&lt;a href="http://sitemaker.umich.edu/millsbrian/files/hockey_hall_of_fame_poster_presentation.pdf"&gt; joint poster at the 2011 Joint Statistical Meetings in Miami&lt;/a&gt;, this analysis should sound rather familiar.  Please place questions or feedback in the comments if you have them, or feel free to shoot me an email.&lt;br /&gt;&lt;br /&gt;We began this work a while back actually as a class project, and decided to turn it into an academic paper with some guidance and encouragement from our adviser.  A paper using the same technique came out last year (Frieman, 2010) which gave us a chance to add to this work by including pitcher predictions and extending the work to the economic literature on discrimination in Hall of Fame voting.  
Our work differs somewhat from Frieman, and this is explained within the paper.  In fact, a (very) preliminary version of the work was on this website a while back; however, after the Frieman paper was published, I was worried a bit about getting scooped even more so (no foul play there--just happened to be doing a very similar analysis at the same time).&lt;br /&gt;&lt;br /&gt;Of course, R was used exclusively for the analysis.  Also, you may note that some familiar names are cited.  These include Cy Morong, Bill James, Jayson Stark, Peter Gammons, Tom Verducci, Chris Jaffe and, yes, Tom Tango (related to the Tim Raines site, of course).&lt;br /&gt;&lt;br /&gt;If you have criticisms, please present them respectfully and keep in mind that we don't think this analysis (or ANY analysis) is the last word on any issue.  Also keep in mind that future predictions are based only on statistics as of 2009 (without career projections).  So they predict future induction under the assumption of retirement after the 2009 season.  But it was a lot of fun and it shows some promising results for using the technique in sports prediction.  There is a lot of Hall of Fame voting literature out there, and this is another addition to it.  
Hopefully we can have a comprehensive model of hockey players soon now, too.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-4374383911438695134?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/4374383911438695134/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/10/article-at-jqas-baseball-hall-of-fame.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/4374383911438695134'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/4374383911438695134'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/10/article-at-jqas-baseball-hall-of-fame.html' title='Article at JQAS: Baseball Hall of Fame Voting'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-2748082902418853344</id><published>2011-10-25T10:11:00.004-04:00</published><updated>2011-10-25T10:26:21.327-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Programming'/><category scheme='http://www.blogger.com/atom/ns#' term='Sabermetrics'/><category scheme='http://www.blogger.com/atom/ns#' term='R-project'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='sab-R-metrics'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Links'/><title type='text'>Sabermetrics Meets R Meetup</title><content type='html'>I just ran across &lt;a href="http://bigcomputing.blogspot.com/2011/10/some-great-r-users-meetups-are-coming.html"&gt;this post&lt;/a&gt; at Big Computing.  On November 14th, there will be an R User meet-up in Washington, DC (Tyson's Corner) led by Mike Driscoll about using R for sabermetric analysis (&lt;a href="http://www.meetup.com/R-users-DC/events/37862562/"&gt;linked here&lt;/a&gt;).  I will actually be home in Maryland for a couple weeks, and likely in DC on that Monday so there's a good chance that I will try and stop by this meet-up.  If anyone else is in the area and would like to come by, let me know.  I always enjoy meeting fellow statistics/sports dorks.  I imagine this will be a great extension to the tutorials that I have had here, coming from someone with much more expertise in statistics and statistical programming than I.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Hat Tip: Kirk Mettler&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-2748082902418853344?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/2748082902418853344/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/10/sabermetrics-meets-r-meetup.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/2748082902418853344'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/2748082902418853344'/><link rel='alternate' type='text/html' 
href='http://princeofslides.blogspot.com/2011/10/sabermetrics-meets-r-meetup.html' title='Sabermetrics Meets R Meetup'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-151791233100685163</id><published>2011-10-13T11:21:00.004-04:00</published><updated>2011-10-13T11:49:43.592-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Basketball'/><category scheme='http://www.blogger.com/atom/ns#' term='Sillyness'/><category scheme='http://www.blogger.com/atom/ns#' term='NCAA'/><category scheme='http://www.blogger.com/atom/ns#' term='Realignment'/><title type='text'>Insane Musings on Realignment</title><content type='html'>I was back in Maryland this past weekend for a wedding and to visit my fiancee's family.  Her father is a massive G-Town fan and graduate, and has been on the admissions board and academic advisory committee there.  He drives 2 hours each way to go to all of the basketball games.  I get blasted for not screaming and cheering when I go.  But it's all in good fun.&lt;br /&gt;&lt;br /&gt;He's disgusted by the inability of the Big East to hang on to its big-name schools in recent years, and worries that Georgetown is going to have difficulty recruiting without the big-name FBS schools in the conference.&lt;br /&gt;&lt;br /&gt;This got me thinking: football is definitely a big winner, but there are a lot of basketball fans out there, too.  
Smaller alumni bases make it difficult to estimate a television contract, but I would not be surprised to see basketball-only schools (and perhaps Notre Dame non-football) realigning to form their own national basketball mid-major powerhouse conference.  There are endless possibilities, but I see the following fitting together nicely in a conference like this (or, you could also just realign to a Catholic basketball conference with many of them):&lt;br /&gt;&lt;br /&gt;Georgetown&lt;br /&gt;Notre Dame&lt;br /&gt;Villanova&lt;br /&gt;St. John's&lt;br /&gt;Providence&lt;br /&gt;Gonzaga&lt;br /&gt;DePaul&lt;br /&gt;Xavier&lt;br /&gt;Marquette&lt;br /&gt;Butler&lt;br /&gt;St. Mary's&lt;br /&gt;Temple&lt;br /&gt;Duquesne&lt;br /&gt;Old Dominion&lt;br /&gt;Creighton&lt;br /&gt;Memphis&lt;br /&gt;&lt;br /&gt;And possibly:&lt;br /&gt;Davidson&lt;br /&gt;Seton Hall&lt;br /&gt;George Washington&lt;br /&gt;George Mason&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Richmond (at the suggestion of Brian in the comments)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Obviously, this depends on whether or not schools like UConn, Louisville and West Virginia have enough clout to pull in significant conference revenue on the basketball side (perhaps Basketball and Football get some kind of package deal for the conference?).  But I wouldn't be surprised to see something like this happen.  Realigning so that there is still high quality competition within the conference could help all schools there recruit.  Notre Dame would likely be joining a BCS level conference.  Georgetown would obviously be the big wild card here on whether or not something like this happens, and they may have a lot of pride, not wanting to stray from the BCS type schools.  I really don't know.  I think it would be fun to watch, though.&lt;br /&gt;&lt;br /&gt;Then again, maybe (probably) it's a silly idea.  
What say you?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-151791233100685163?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/151791233100685163/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/10/insane-musings-on-realignment.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/151791233100685163'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/151791233100685163'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/10/insane-musings-on-realignment.html' title='Insane Musings on Realignment'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-2849575519464148751</id><published>2011-09-29T12:03:00.006-04:00</published><updated>2011-09-29T12:27:19.276-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Sabermetrics'/><category scheme='http://www.blogger.com/atom/ns#' term='Data Science'/><category scheme='http://www.blogger.com/atom/ns#' term='Links'/><title type='text'>Crediting the Rise of "Data Science" to Sabermetrics</title><content type='html'>As a graduate student in Sport Management, Statistics and Economics I am quite interested in the emerging "Data Scientist" profession.  
My current skills in programming are mostly limited to statistical programming in R, Stata and SPSS (I am trying to begin dabbling in SAS and Matlab more).  I wish I had more skills with Python, C, SQL, Perl, Access and the like in order to scrape data myself more efficiently.  I can do some basic SQL queries and read Perl script to understand *what* it's doing, but starting from scratch with these things would require a bit more free time than I have right now.&lt;br /&gt;&lt;br /&gt;I could really become more efficient in my R programming (something I continue to work on) and given the popularity of SAS outside of academia, it would be good to get familiar with advanced programming here.  Unfortunately, I have never had a formal computer programming class.  Most of the statistical programming has come from my own fiddling and learning statistics in classes here at Michigan.  Don't get me wrong. I think I have a relatively unique and useful skill set, but there's always lots to learn and there are many other places exhibiting skills that I just don't have.  And definitions of "data scientist" often include significant database management ability.  I have some skills here, but they are not anywhere near those of a formally trained computer scientist or IT/data architect.&lt;br /&gt;&lt;br /&gt;Anyway, the point of this post is to redirect readers to &lt;a href="https://docs.google.com/present/view?id=0AXaXKp9bt6OXZGd4YzlnYmRfNThjMmo4dm5yaA&amp;amp;hl=en_US"&gt;this presentation by Harlan Harris&lt;/a&gt; who talks about what "data science" really is.  Why link it here?  Well on the final page, Harris says the following:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;"Sabermetrics was a trigger for widespread growth. Demonstrated wider  applicability of stats methods, and drew attention from business."&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A pretty strong quote, and one that I do agree with in some sense.  
Interestingly, sports have been one of the slowest to adapt to these changes in technology and ability to get into data.  Harris suggests here, I think, that other businesses caught onto sabermetrics before those that the analysis was directed toward did.  Pretty interesting stuff!  I think the combination of open source programming and rise of blogging was the real culprit here.  However, sabermetrics provided talented people with a way to apply data science to something fun and interesting.  In this sense, it made it easy to communicate stories about the usefulness of data analysis in everyday business decisions.&lt;br /&gt;&lt;br /&gt;So here's my question to those doing analysis with sports data: Would you consider yourself a "data scientist"?  And if so, do you feel that full-on "hacking" skills are required to consider oneself as such?  Certainly they're a plus, but can two heads (a stat-based person and a Perl-to-SQL scraper) come together and both be data scientists?  Leave me something in the comments if you'd like!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-2849575519464148751?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/2849575519464148751/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/09/crediting-rise-of-data-science-to.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/2849575519464148751'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/2849575519464148751'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/09/crediting-rise-of-data-science-to.html' title='Crediting the 
Rise of &quot;Data Science&quot; to Sabermetrics'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-189001403865082663</id><published>2011-09-23T11:52:00.003-04:00</published><updated>2011-09-23T12:11:54.455-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Sports'/><category scheme='http://www.blogger.com/atom/ns#' term='Link'/><category scheme='http://www.blogger.com/atom/ns#' term='Economics'/><category scheme='http://www.blogger.com/atom/ns#' term='Academic'/><title type='text'>IJSF Sports Economics Research Rankings</title><content type='html'>&lt;a href="http://ideas.repec.org/a/jsf/intjsf/v6y2011i3p222-244.html"&gt;A recent paper by Jose Manuel Sanchez Santos and Pablo Castellanos Garcia in the International Journal of Sport Finance&lt;/a&gt; puts forth rankings of Sports Economics papers and Sports Economists.  They create an index for this ranking (please refer to the paper if you're interested).  Of course, there are lots of familiar names on there, but what I wanted to highlight here was the dominance (in a self-interested light, of course) of the University of Michigan Sport Management Program in the field of Sports Economics, Sport Finance and Development.  Based on the rankings, we have the #1 (Stefan Szymanski), #3 (Rodney Fort), #27 (Mark Rosentraub) and #57 (Jason Winfree) academic sports economists in the world.  They are all within the department.  Quite a powerhouse we have here :-)&lt;br /&gt;&lt;br /&gt;The University of Alberta comes in with Brad Humphreys (#4) and Dan Mason (#7), but they are technically in different departments there.  
I have had the pleasure of meeting Dr. Mason as well as another ranked economist in the paper, Joel Maxcy (who is now at Temple).  I am happy to say that I have had some email contact with both Young Hoon Lee (who has helped me immensely in the econometrics programming in my dissertation) and JC Bradbury.&lt;br /&gt;&lt;br /&gt;Other familiar names abound on the list, and I look forward to meeting #21 Andrew Zimbalist in November when he comes to speak about Title IX.  These rankings are always a fun exercise, but aren't necessarily any sort of be-all and end-all for identifying the 'best' researchers out there.  However, I think there is little doubt that this is a headquarters for sports economics.  The professors listed above are all very different, which gives us great diversity as well. &lt;br /&gt;&lt;br /&gt;I have benefitted immensely from the structure of the department here at Michigan (as well as other departments).  Much of this was luck, as I arrived at the right time when serious evolution of the faculty and program was taking place.  There is no doubt that--for the quantitatively and economically inclined sport fan--this is the place to be.  For those interested in other aspects of Sport Management, we have some pretty powerful faculty as well.  
It's really been quite a thrill to bump elbows with many of those on this list, and it's been an honor to study here in the department for going on 5 years!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-189001403865082663?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/189001403865082663/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/09/ijsf-sports-economics-research-rankings.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/189001403865082663'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/189001403865082663'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/09/ijsf-sports-economics-research-rankings.html' title='IJSF Sports Economics Research Rankings'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-3531283497736869129</id><published>2011-09-09T16:00:00.006-04:00</published><updated>2011-09-09T16:13:52.033-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Fun'/><category scheme='http://www.blogger.com/atom/ns#' term='Fail'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><category scheme='http://www.blogger.com/atom/ns#' term='Autographs'/><title type='text'>Fail Post: 
Failure in Baseball Knowledge</title><content type='html'>A couple of weeks ago on the plane back to Ann Arbor, I decided to open up Sky Mall and found the following:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-gO6vv85VkXA/Tmpw2ajVS2I/AAAAAAAAAXo/iIoAwa2t1Ec/s1600/Steiner%2BAd.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 299px; height: 400px;" src="http://3.bp.blogspot.com/-gO6vv85VkXA/Tmpw2ajVS2I/AAAAAAAAAXo/iIoAwa2t1Ec/s400/Steiner%2BAd.jpg" alt="" id="BLOGGER_PHOTO_ID_5650452762632473442" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;I actually laughed out loud on the plane.  Let's treat this as a Highlights Magazine game where you circle all the things wrong with this picture.  You'd think that a well-known company like Steiner could do a little more research before putting this joke of an ad in a magazine.&lt;br /&gt;&lt;br /&gt;Let's begin with the heading for this area: Future Stars.  No complaints about Troy Tulowitzki as a player, and Austin Jackson is reasonable.  But Tulo isn't a star of the future; he's a star now.  Chase Headley pushes the limit of naming someone a "Future Superstar".  But I could live with that.&lt;br /&gt;&lt;br /&gt;Answer key below:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/-mJ7hzD1GYYQ/TmpzMeTU3UI/AAAAAAAAAXw/uMh-xGuO50s/s1600/Steiner%2BAd.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 299px; height: 400px;" src="http://2.bp.blogspot.com/-mJ7hzD1GYYQ/TmpzMeTU3UI/AAAAAAAAAXw/uMh-xGuO50s/s400/Steiner%2BAd.jpg" alt="" id="BLOGGER_PHOTO_ID_5650455340619455810" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Rick Porcello, Tigers Ace?  Nearly 37-year-old RA Dickey a future star?  Jeff Francoeur, future star and ultimate clutch hitter?  
Hmmmm.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-3531283497736869129?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/3531283497736869129/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/09/fail-post-failure-in-baseball-knowledge.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/3531283497736869129'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/3531283497736869129'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/09/fail-post-failure-in-baseball-knowledge.html' title='Fail Post: Failure in Baseball Knowledge'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-gO6vv85VkXA/Tmpw2ajVS2I/AAAAAAAAAXo/iIoAwa2t1Ec/s72-c/Steiner%2BAd.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-6887326362670246075</id><published>2011-09-07T13:37:00.003-04:00</published><updated>2011-09-07T13:46:42.966-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='R-project'/><category scheme='http://www.blogger.com/atom/ns#' term='VGAM'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Soccer'/><category scheme='http://www.blogger.com/atom/ns#' term='Links'/><title type='text'>Link to StatDNA Guest Post</title><content type='html'>The post is officially up on the StatDNA blog.  &lt;a href="http://blog.statdna.com/post/2011/09/07/Creating-Wins-Research-competition-Guest-Post-1.aspx"&gt;Go check it out&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;As I said in my previous post, this is a very rough and preliminary model.  This is why my work was not any sort of formal entry, just some fun with some great data.&lt;br /&gt;&lt;br /&gt;I used a Vector Generalized Additive Proportional Odds Model to evaluate the change in win probability for each event listed in the StatDNA data, given the spatial location and time left in the game (as well as the score).  Things turned out pretty well for this rough version and the WPA rankings are pretty close to what the EA Sports Index reports at the EPL website.  Because I haven't finished the model, I won't release all of the players' WPA from last year.  However, I do mention that players expected to be near the top of the list are there.&lt;br /&gt;&lt;br /&gt;The most interesting players to me were Wayne Rooney--who finished lower than one might expect--and the up-and-coming goalie Tim Krul.  Given that I'm more of a baseball guy, I was pretty happy with the way these things turned out.  A lot of people love Krul, and this analysis seems to support that love.&lt;br /&gt;&lt;br /&gt;Anyway, go check it out over there.  Below are some fun visualizations which you may find similar to my umpire heat maps or Fangraphs Win Expectancy graphs (&lt;a href="http://blog.statdna.com/post/2011/09/07/Creating-Wins-Research-competition-Guest-Post-1.aspx"&gt;which you'll find at the link as well&lt;/a&gt;).  All in all it was a lot of fun, and I'd like to thank StatDNA for letting me get dirty with the data.  
If you are interested in soccer, I'd definitely suggest checking them out!&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/-adzmlispKc8/TmethhOSiRI/AAAAAAAAAXQ/EId7C1dKTyE/s1600/EventActivityandShots.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 282px;" src="http://2.bp.blogspot.com/-adzmlispKc8/TmethhOSiRI/AAAAAAAAAXQ/EId7C1dKTyE/s400/EventActivityandShots.png" alt="" id="BLOGGER_PHOTO_ID_5649675048925169938" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-bRY1WAQvfF0/Tmeth8UOAwI/AAAAAAAAAXY/tkvtWTS5oW8/s1600/Chelsea%2Bx%2BSunderland%2BWinExpChart.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 200px;" src="http://3.bp.blogspot.com/-bRY1WAQvfF0/Tmeth8UOAwI/AAAAAAAAAXY/tkvtWTS5oW8/s400/Chelsea%2Bx%2BSunderland%2BWinExpChart.png" alt="" id="BLOGGER_PHOTO_ID_5649675056197796610" border="0" /&gt;&lt;/a&gt;&lt;a href="http://2.bp.blogspot.com/-adzmlispKc8/TmethhOSiRI/AAAAAAAAAXQ/EId7C1dKTyE/s1600/EventActivityandShots.png"&gt;&lt;br /&gt;&lt;/a&gt;&lt;a href="http://1.bp.blogspot.com/-PiBcUJfUm_I/Tmeth_Y0L8I/AAAAAAAAAXg/-oW5didvmMY/s1600/Arsenal%2Bx%2BTottenham%2BHotspur%2BWinExpChart.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 200px;" src="http://1.bp.blogspot.com/-PiBcUJfUm_I/Tmeth_Y0L8I/AAAAAAAAAXg/-oW5didvmMY/s400/Arsenal%2Bx%2BTottenham%2BHotspur%2BWinExpChart.png" alt="" id="BLOGGER_PHOTO_ID_5649675057022382018" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-6887326362670246075?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://princeofslides.blogspot.com/feeds/6887326362670246075/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/09/link-to-statdna-guest-post.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/6887326362670246075'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/6887326362670246075'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/09/link-to-statdna-guest-post.html' title='Link to StatDNA Guest Post'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-adzmlispKc8/TmethhOSiRI/AAAAAAAAAXQ/EId7C1dKTyE/s72-c/EventActivityandShots.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-3756475915644079451</id><published>2011-09-01T13:35:00.004-04:00</published><updated>2011-09-01T13:44:44.906-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Competitions'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Guest Post'/><category scheme='http://www.blogger.com/atom/ns#' term='Soccer'/><category scheme='http://www.blogger.com/atom/ns#' term='Links'/><title type='text'>Forthcoming Guest Post at StatDNA</title><content type='html'>For those few of you that frequent this blog, you've probably noticed a scarce amount of posting lately.  
I've been working on a number of things, including finishing my dissertation.  My adviser tells me I need to learn how to say "No" when people ask me about working on new projects, but as of yet I have not learned this well enough.  Unfortunately, this has meant saying "No" a bit more to blogging.&lt;br /&gt;&lt;br /&gt;Nevertheless, one of the projects I was working on had to do with the &lt;a href="http://blog.statdna.com/post/2011/03/04/statdna-soccer-analytics-research-competition.aspx"&gt;StatDNA competition advertised here&lt;/a&gt;.  Dave Allen and I had planned on having some fun and putting some things together (along with some possible guidance from Soccernomics author and new Michigan Sport Management arrival, Stefan Szymanski), but alas all of us were a bit crunched on time.&lt;br /&gt;&lt;br /&gt;Because of that, I wrote up a simpler blog post on some fiddling I had been doing with the StatDNA data (which is pretty awesome).  While it did not qualify as a contest entry, the StatDNA blog will be posting it along with the contest entrants.  I'll wait for them to post, but as a preview it is the beginning of developing a sort of Wins Created metric while accounting for the spatial location of events in the game.&lt;br /&gt;&lt;br /&gt;There is still &lt;span style="font-weight: bold;"&gt;much &lt;/span&gt;work to do--and this was only the preliminary model--but I found it a lot of fun and &lt;a href="http://blog.statdna.com/post/2011/09/01/StatDNA-research-competition-winner-announced.aspx"&gt;Jaeson Rosenfeld found it interesting enough to include on the blog&lt;/a&gt;.  Once it is officially posted, I will be sure to link things here.  Congratulations to the winner, Sarah Rudd, and her paper titled "Modeling Possessions in Soccer Using Markov Chains"...a paper that is likely way over my head.  
I look forward to reading it, though!&lt;span style="font-size: small; font-family: arial, helvetica, sans-serif;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-3756475915644079451?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/3756475915644079451/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/09/forthcoming-guest-post-at-statdna.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/3756475915644079451'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/3756475915644079451'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/09/forthcoming-guest-post-at-statdna.html' title='Forthcoming Guest Post at StatDNA'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-5599112087537644588</id><published>2011-08-09T16:42:00.006-04:00</published><updated>2011-08-09T17:05:41.078-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Hall of Fame'/><category scheme='http://www.blogger.com/atom/ns#' term='Random Forest'/><category scheme='http://www.blogger.com/atom/ns#' term='Hockey'/><title type='text'>Clarifications About the JSM Poster</title><content type='html'>David Smith--from Revolutions--referred me to 
a &lt;a href="http://www.reddit.com/r/hockey/comments/jdu2x/statistical_evidence_that_theo_fleury_alexander/"&gt;criticism at Reddit&lt;/a&gt; regarding the poster my fellow grad student and I presented at JSM last week.  This comment made me want to clarify what we are attempting in the analysis.&lt;br /&gt;&lt;br /&gt;We &lt;span style="font-weight: bold;"&gt;ARE NOT&lt;/span&gt; attempting to find the &lt;span style="font-style: italic;"&gt;most deserving&lt;/span&gt; players or the &lt;span style="font-style: italic;"&gt;best &lt;/span&gt;players.  We are attempting to use &lt;span style="font-style: italic;"&gt;simple &lt;/span&gt;statistics to model the voting behavior and decision rules of those making the induction decisions.  Many involved in baseball would argue that WAR is the best measure of overall player performance.  I'd likely agree.  But how many BBWAA voters make inductions based on that statistic (at least prior to, say, 2005)?&lt;br /&gt;&lt;br /&gt;This is the idea we are presenting here: Hall of Fame voters are simplistic in nature when it comes to their voting.  That doesn't mean they won't change, but it means that they will vote based on the information they have available.  This likely includes Goals and Assists.  We include Plus-Minus, but find it to be essentially useless in classification, which is probably a good thing: it shows that our model is making the decision rules correctly for this metric.&lt;br /&gt;&lt;br /&gt;Now, I do think the thought about normalizing things like goals and assists is a valid one.  It is something we are working on, but in baseball we have generally found that aggregate milestones are most predictive of Hall induction.  For example, using ERA+ did not improve upon the model with ERA.  I'm not saying that it's the best way to go, but it seems to be the way the decision rules are made.  I will double check this version of the model for hockey, of course.
&lt;br /&gt;&lt;br /&gt;Lastly, there was concern over including All-Star games in the analysis.  Because there are other reasons for voting a player into the Hall--for example "integrity" is used specifically in the baseball induction requirements--the ASG totals are included in order to control for the popularity and general well-liked-ness (is that a word) of a player.  We do not include it simply because we think it's a great measure of the best players.  And there is certainly noise when it comes to ASG participation.  The same goes for Stanley Cup Wins.  But a player like Phil Rizzuto almost surely was inducted into the baseball HOF thanks to his appearance on so many World Series teams.  It seems that some players are voted in  based on their prominence in the media and on good teams.  Again, I make no judgement as to whether or not that's the correct way to go.&lt;br /&gt;&lt;br /&gt;I hope this clears up any confusion.  Hopefully we will have a working version of the paper out in the coming months.
&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-5599112087537644588?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/5599112087537644588/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/08/before-calling-someone-idiot-please.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/5599112087537644588'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/5599112087537644588'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/08/before-calling-someone-idiot-please.html' title='Clarifications About the JSM Poster'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-5857629251277807240</id><published>2011-08-08T16:11:00.004-04:00</published><updated>2011-08-08T16:18:53.442-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='NHL'/><category scheme='http://www.blogger.com/atom/ns#' term='Hockey'/><category scheme='http://www.blogger.com/atom/ns#' term='Attendance'/><category scheme='http://www.blogger.com/atom/ns#' term='Data'/><category scheme='http://www.blogger.com/atom/ns#' term='Help'/><title type='text'>Request for Data (NHL Attendance)</title><content type='html'>This is a pleading, begging request for some 
help in collecting some data.  I am working on a project looking at franchise-level hockey attendance for a chapter of my dissertation but for the life of me can't find certain years for certain teams.  If anyone has the data below, I would be forever grateful to have your assistance.  I need &lt;span style="font-weight: bold;"&gt;season-level attendance data by franchise&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;I will even give you a mention in the acknowledgements of my dissertation so that you can live forever in print in the dusty U of M Kinesiology dissertation library!&lt;br /&gt;&lt;br /&gt;Anyway, below is what is needed.  If you have anything, please let me know (bmmillsy AT umich DOT edu):&lt;br /&gt;&lt;br /&gt;Boston Bruins: 1967-1971&lt;br /&gt;&lt;br /&gt;Chicago Blackhawks: 1967-1972 and 1975-1983&lt;br /&gt;&lt;br /&gt;Montreal Canadiens: 1967-1972 and 1975-1988&lt;br /&gt;&lt;br /&gt;New York Rangers: 1967-1972 and 1975-1988&lt;br /&gt;&lt;br /&gt;Toronto Maple Leafs: 1967-1972 and 1975-1987&lt;br /&gt;&lt;br /&gt;And if you happen to run across any attendance data from before 1963, I would welcome that too, but it's not totally necessary (it's just always nice to have extra data).  If anyone knows WHY these data are missing from just about everywhere possible, I'd also be interested in hearing that.&lt;br /&gt;&lt;br /&gt;Thanks!
&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-5857629251277807240?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/5857629251277807240/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/08/request-for-data-nhl-attendance.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/5857629251277807240'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/5857629251277807240'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/08/request-for-data-nhl-attendance.html' title='Request for Data (NHL Attendance)'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-1918174310769526970</id><published>2011-08-05T10:51:00.010-04:00</published><updated>2011-08-05T12:11:13.930-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Hall of Fame'/><category scheme='http://www.blogger.com/atom/ns#' term='Thanks'/><category scheme='http://www.blogger.com/atom/ns#' term='R-project'/><category scheme='http://www.blogger.com/atom/ns#' term='Random Forest'/><category scheme='http://www.blogger.com/atom/ns#' term='Hockey'/><category scheme='http://www.blogger.com/atom/ns#' term='Conference'/><title type='text'>More on JSM</title><content 
type='html'>While my time at the 2011 Joint Statistical Meetings was short--I unfortunately missed some presentations I would have liked to attend--it was a great experience.  The collection of academics and professionals is very different from the other conferences that I have attended (like Sport Management and Tourism conferences) and the interest in the methods themselves at JSM really forced me to be on my toes.&lt;br /&gt;&lt;br /&gt;While there, I got the chance to put some faces with the names I have seen around the blogosphere.  It was a pleasure to meet both Phil Birnbaum--of &lt;a href="http://sabermetricresearch.blogspot.com/"&gt;Sabermetric Research Blog&lt;/a&gt;--and David Smith--VP of Revolution Analytics Marketing and author of the &lt;a href="http://blog.revolutionanalytics.com/"&gt;Revolutions Blog&lt;/a&gt;.  David asked about sharing my poster (joint with fellow graduate student, Steve Salaga) investigating Hockey Hall of Fame Induction using the R package "randomForest".  While 'machine learning' can sound intimidating to some, Random Forests are actually quite a simple method for bootstrapping classification trees and allowing for random variable selection and a hold-out sample for each tree so that over-fitting is kept to a minimum.  And what better way to implement it than with sports data!?!&lt;br /&gt;&lt;br /&gt;**********&lt;br /&gt;As a side note, this is not the first time we have implemented randomForest for sports data.  Steve and I have a forthcoming paper in the Journal of Quantitative Analysis in Sports identifying patterns in BBWAA voting for the Baseball Hall of Fame.  Our paper is similar to a recent work by Frieman (2011) in the same journal, but we add pitchers and a discussion of exclusions based on race.  
As a whole, there does not seem to be any negative effect of being a minority when it comes to BBWAA voting--at least according to the method we use.&lt;br /&gt;**********&lt;br /&gt;&lt;br /&gt;So back to the Hockey Hall of Fame.  For both this poster and the baseball paper, it is important to note that we are not attempting to gauge who &lt;span style="font-style: italic;"&gt;should &lt;/span&gt;be in the Hall of Fame based on their performance as a player.  Rather, we are attempting to gauge how well each player aligns with the &lt;span style="font-style: italic;"&gt;views &lt;/span&gt;of the Hall of Fame Voting Committee and whether or not they were 'snubbed' &lt;span style="font-weight: bold; font-style: italic;"&gt;based on how the committee would be predicted to vote&lt;/span&gt;.  If the committee is terrible at gauging the best players, then our model will be as well.  We are simply interested in the voting behavior and committee preferences, and not who the best players really are.  This is an important distinction in attempting to find any exclusions based on qualitative variables like race or language, rather than attempting to rank the best players in the game.&lt;br /&gt;&lt;br /&gt;We only include simple statistics--as we predict committee members to focus on these mostly--and goalies are not included in the analysis.  Unfortunately, statistics for goalies are few and far between and the NHL has not kept Save Percentage for long enough to include in any worthwhile prediction model for goalies.  Therefore, only skaters are included.  We separate forwards and defensemen, but the only significant difference is the importance of Assists (it is higher for defensemen).&lt;br /&gt;&lt;br /&gt;For example, classifying baseball player inductions on WAR or Win Shares gives us who probably should be the guys in the Hall based on their on-field performance.  However, BBWAA voters do not necessarily use this metric when voting.  
Therefore, we want to train our data to what BBWAA voters &lt;span style="font-style: italic;"&gt;do &lt;/span&gt;pay attention to.  The same goes for hockey.  The most important statistics for classifying players are what you would expect, and they are also presented using the Random Forest's "Variable Importance" metric.&lt;br /&gt;&lt;br /&gt;This also allowed us to qualitatively evaluate the decision rule boundaries built by the forest and assess the possibility of certain players being discriminated against based on language.  There is a line of (conflicting) economic literature--mostly in the 1980s and 1990s--that has made claims of language-based discrimination in the labor market for hockey, so we found the Hall of Fame voting to be another good test of this.  Long story short, however, there does not seem to be anything systematic going on.  But we leave that up to the reader, as we present each of the players near the boundaries of the decision rules from the forest.&lt;br /&gt;&lt;br /&gt;For those interested in the full analysis, you'll have to wait for the paper.  As always, there are further considerations for this sort of investigation, not the least of which include testing the RF algorithm against other classification techniques (like neural networks, discriminant analysis, simple classification trees, and others).  We'll have to address those as well as other great comments from those that stopped by at the conference.  However, a detailed summary of the current version is in &lt;a href="http://sitemaker.umich.edu/millsbrian/files/hockey_hall_of_fame_poster_presentation.pdf"&gt;THIS POSTER&lt;/a&gt; that we presented at JSM.&lt;br /&gt;&lt;br /&gt;Thanks to all of those who stopped by.  
The conference was a great experience and I hope to return next year!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-1918174310769526970?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/1918174310769526970/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/08/more-on-jsm.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/1918174310769526970'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/1918174310769526970'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/08/more-on-jsm.html' title='More on JSM'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-5584803758844036452</id><published>2011-07-29T09:30:00.003-04:00</published><updated>2011-07-29T09:37:41.113-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Hall of Fame'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Hockey'/><category scheme='http://www.blogger.com/atom/ns#' term='Conference'/><category scheme='http://www.blogger.com/atom/ns#' term='JSM'/><category scheme='http://www.blogger.com/atom/ns#' term='Links'/><title type='text'>Joint Statistical Meetings in 
Miami</title><content type='html'>I am headed off to Miami for the 2011 Joint Statistical Meetings on Sunday.  I'll have a poster to present with a fellow graduate student and look forward to experiencing a new conference with a very different bunch than I normally interact with professionally (though, closer to those I interact with online).  If you're going to be attending, stop by the &lt;a href="http://www.amstat.org/meetings/jsm/2011/onlineprogram/AbstractDetails.cfm?abstractid=302190"&gt;Section in Sports Contributed Poster Session and see our poster&lt;/a&gt;.  The poster investigates Hockey Hall of Fame voting patterns (skaters only) and the possibility of language-based bias.  Long story short is that we don't find much, but there is more to do and that does not necessarily mean nothing is happening.&lt;br /&gt;&lt;br /&gt;While the meetings are for statistics in all disciplines, there is a lot on sports there. &lt;a href="http://www.amstat.org/meetings/jsm/2011/onlineprogram/AbstractDetails.cfm?abstractid=303468"&gt; Phil Birnbaum will be presenting some of his findings with respect to race and strike calling&lt;/a&gt; (and there is an additional poster on the topic) and &lt;a href="http://www.amstat.org/meetings/jsm/2011/onlineprogram/AbstractDetails.cfm?abstractid=303439"&gt;Shane Jensen will be giving a roundtable talk on fielding metrics&lt;/a&gt;.  
Check out the full sports program here.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-5584803758844036452?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/5584803758844036452/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/07/joint-statistical-meetings-in-miami.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/5584803758844036452'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/5584803758844036452'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/07/joint-statistical-meetings-in-miami.html' title='Joint Statistical Meetings in Miami'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-8562921117849576981</id><published>2011-07-20T13:10:00.007-04:00</published><updated>2011-07-20T13:26:27.551-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Philosophy'/><category scheme='http://www.blogger.com/atom/ns#' term='Libertarianism'/><category scheme='http://www.blogger.com/atom/ns#' term='Economics'/><category scheme='http://www.blogger.com/atom/ns#' term='Politics'/><title type='text'>Non-Sports Link: Libertarians and Progressives Can be Friends</title><content type='html'>&lt;a 
href="http://dailycaller.com/2011/07/08/seven-reasons-progressives-should-be-more-libertarian/"&gt;A great article from the author of the Bleeding Heart Libertarians blog.&lt;/a&gt;  I'll admit that there isn't a better word to describe my political views than "libertarian".  I'm certainly not Milton Friedman or &lt;a href="http://jeffreymiron.com/"&gt;Jeffrey Miron&lt;/a&gt;--both of whom I admire and respect greatly--but I can't consider myself "Bleeding Heart" either.  Maybe many of my issues are with the extreme--and unfortunately often uninformed--left-swinging folks that I had to deal with at a very liberal undergraduate institution.  But I often despair that when people think of a libertarian (a label often generalized to "Economists"), they picture someone with no values and little empathy.  Matt Zwolinski does a great job of communicating this, and I especially like the following quote:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(0, 0, 153);"&gt;"&lt;/span&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;These are my reasons for thinking that progressives should have greater  confidence in free markets and civil society to realize their values,  and less confidence in&lt;/span&gt;&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt; government regulation. But even if progressives are not convinced by that claim, I  hope they are convinced by another one: namely, that political  disagreement does not always, or even usually, imply an irreconcilable  conflict of fundamental values. Progressives and libertarians should  realize that they share many more values in common than they probably  think, and that their different political prescriptions are less the  product of an epic battle of good vs. evil and more a function of  reasonable disagreement regarding how to prioritize and realize their  common goals. Even if disagreement persists, bearing this point in mind  should make that disagreement a more civil and productive one.&lt;/span&gt;&lt;span style="color: rgb(0, 0, 153);"&gt;"&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Libertarianism and moral values are not mutually exclusive.  The economic prescriptions of a strictly libertarian viewpoint are an invaluable starting point on which to base policy.  Once we have that cost-benefit analysis and an understanding of the efficiency of a free market, one must turn to the values of the society and find the best balance of both in order to foster both economic and societal growth.  As Zwolinski says, "&lt;span style="font-style: italic; color: rgb(0, 0, 153);"&gt;Good intentions, even when they exist, are not enough&lt;/span&gt;." 
&lt;div style="overflow: hidden; color: rgb(0, 0, 0); background-color: transparent; text-align: left; text-decoration: none; border: medium none;"&gt;&lt;br /&gt;Link: http://dailycaller.com/2011/07/08/seven-reasons-progressives-should-be-more-libertarian/&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Hat Tip: ECON Jeff Blog (see sidebar)&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-8562921117849576981?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/8562921117849576981/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/07/non-sports-link-libertarians-and.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/8562921117849576981'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/8562921117849576981'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/07/non-sports-link-libertarians-and.html' title='Non-Sports Link: Libertarians and Progressives Can be Friends'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-4861209639267035321</id><published>2011-07-14T11:41:00.020-04:00</published><updated>2011-07-14T14:49:41.955-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='R-project'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><category scheme='http://www.blogger.com/atom/ns#' term='Pitch F/X'/><title type='text'>Sam Fuld, Bob Carpenter, and Statistical Inference Blog</title><content type='html'>Here is a quick post responding to &lt;a href="http://www.stat.columbia.edu/%7Ecook/movabletype/archives/2011/07/super_sam_fuld.html"&gt;a request by Bob Carpenter at one of my favorite nerd blogs: Statistical Modeling, Causal Inference and Social Science&lt;/a&gt;.  While a lot of the Bayesian theory is out of my league, Dr. Gelman really makes you think about some applied statistical problems in social science.&lt;br /&gt;&lt;br /&gt;Anyway, the request was for a quick scatter plot (I'm not going to go nuts and pull out Bugs code for some Bayesian Hierarchical Model or anything like that here!) of batter performance and ability to foul balls off in given counts (I could also do base-out states, but I'll keep it simple for now).&lt;br /&gt;&lt;br /&gt;Luckily, I had R up and running with my Pitch F/X database already in.  Of course, a full analysis would require understanding where the pitches are thrown that are being fouled off (along with velocity and pitch type), but then it gets a bit complicated.  Anyway, here we go.  
I'll start with a quick table of averages for percentage of pitches fouled off in each count (please excuse the awful table formatting here).&lt;br /&gt;&lt;br /&gt;&lt;table border="0" cellpadding="0" cellspacing="0" width="150"&gt;&lt;col style="width: 38pt;" width="50" span="3"&gt;  &lt;tbody&gt;&lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl66" style="height: 15pt; width: 38pt; text-align: left; font-weight: bold;" width="50" height="20"&gt;0-0&lt;/td&gt;   &lt;td style="vertical-align: top;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td style="vertical-align: top;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td class="xl66" style="border-left: medium none; width: 38pt; text-align: left; font-weight: bold;" width="50"&gt;0-1&lt;/td&gt;   &lt;td class="xl66" style="border-left: medium none; width: 38pt; text-align: left; font-weight: bold;" width="50"&gt;0-2&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt; border-top: medium none; text-align: left;" height="20"&gt;10.37%&lt;/td&gt;   &lt;td style="vertical-align: top;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td style="vertical-align: top;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td class="xl67" style="border-top: medium none; border-left: medium none; text-align: left;"&gt;17.61%&lt;/td&gt;   &lt;td class="xl67" style="border-top: medium none; border-left: medium none; text-align: left;"&gt;19.20%&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl66" style="height: 15pt; border-top: medium none; text-align: left; font-weight: bold;" height="20"&gt;1-0&lt;/td&gt;   &lt;td style="vertical-align: top;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td style="vertical-align: top;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td class="xl66" style="border-top: medium none; border-left: medium none; text-align: left; font-weight: bold;"&gt;1-1&lt;/td&gt;   &lt;td class="xl66" style="border-top: medium none; border-left: medium none; text-align: left; font-weight: 
bold;"&gt;1-2&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt; border-top: medium none; text-align: left;" height="20"&gt;15.55%&lt;/td&gt;   &lt;td style="vertical-align: top;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td style="vertical-align: top;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td class="xl67" style="border-top: medium none; border-left: medium none; text-align: left;"&gt;20.46%&lt;/td&gt;   &lt;td class="xl67" style="border-top: medium none; border-left: medium none; text-align: left;"&gt;22.44%&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl66" style="height: 15pt; border-top: medium none; text-align: left; font-weight: bold;" height="20"&gt;2-0&lt;/td&gt;   &lt;td style="vertical-align: top;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td style="vertical-align: top;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td class="xl66" style="border-top: medium none; border-left: medium none; text-align: left; font-weight: bold;"&gt;2-1&lt;/td&gt;   &lt;td class="xl66" style="border-top: medium none; border-left: medium none; text-align: left; font-weight: bold;"&gt;2-2&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt; border-top: medium none; text-align: left;" height="20"&gt;15.39%&lt;/td&gt;   &lt;td style="vertical-align: top;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td style="vertical-align: top;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td class="xl67" style="border-top: medium none; border-left: medium none; text-align: left;"&gt;23.30%&lt;/td&gt;   &lt;td class="xl67" style="border-top: medium none; border-left: medium none; text-align: left;"&gt;26.00%&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl66" style="height: 15pt; border-top: medium none; text-align: left; font-weight: bold;" height="20"&gt;3-0&lt;/td&gt;   &lt;td style="vertical-align: top;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td style="vertical-align: 
top;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td class="xl66" style="border-top: medium none; border-left: medium none; text-align: left; font-weight: bold;"&gt;3-1&lt;/td&gt;   &lt;td class="xl66" style="border-top: medium none; border-left: medium none; text-align: left; font-weight: bold;"&gt;3-2&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt; border-top: medium none; text-align: left;" height="20"&gt;2.41%&lt;/td&gt;   &lt;td style="vertical-align: top;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td style="vertical-align: top;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;td class="xl67" style="border-top: medium none; border-left: medium none; text-align: left;"&gt;21.48%&lt;/td&gt;   &lt;td class="xl67" style="border-top: medium none; border-left: medium none; text-align: left;"&gt;29.91%&lt;/td&gt;  &lt;/tr&gt; &lt;/tbody&gt;&lt;/table&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;From this, we can glean that guys don't foul the ball much in 3-0 counts.  This could be because they see easier pitches to hit and/or they're taking the pitch very often.  Probably a combination of both.  Keep in mind that these numbers are also biased.  We don't see the same batters the same number of times in these different counts.  Now for foul percent plotted against wOBA:&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/-VeqrYSwVpQA/Th8tUHfLrtI/AAAAAAAAAW4/Dq3A34tfTzk/s1600/playerwOBA.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 400px;" src="http://1.bp.blogspot.com/-VeqrYSwVpQA/Th8tUHfLrtI/AAAAAAAAAW4/Dq3A34tfTzk/s400/playerwOBA.png" alt="" id="BLOGGER_PHOTO_ID_5629267882866880210" border="0" /&gt;&lt;/a&gt;If anything, there's a slight downward trend here (as found before at Baseball Analysts, linked at the previous link).  
And finally, foul percentage plotted against wOBA for each count.  Here, I removed outliers (well, outliers defined as more than 2 standard deviations above the average foul rate), as they should make up most of the players who did not get nearly enough at bats for the foul rates to matter.  This didn't work perfectly and there are some obvious anomalies likely due to low plate-appearances, but I think we get a decent look at things.  Also, the lower censoring (at 0) makes it more difficult to pick up a pattern in the plots.  In addition, the plot includes player-seasons, not just players.   So someone like Pujols will be in here 4 times (2007 through 2010):&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/-Ol5-WJh5zQ8/Th85GX9as3I/AAAAAAAAAXA/u9wd47Ucn2o/s1600/FoulsCountwOBA.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 400px;" src="http://1.bp.blogspot.com/-Ol5-WJh5zQ8/Th85GX9as3I/AAAAAAAAAXA/u9wd47Ucn2o/s400/FoulsCountwOBA.png" alt="" id="BLOGGER_PHOTO_ID_5629280840910025586" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;It might be instructive to look at these same plots only for pitches swung at (so players aren't penalized for being selective at the plate) and/or only on pitches near the edges of the strike zone (so we're just looking at pitches that the players are fighting off).  The analysis here doesn't show too much going on, but that doesn't mean there's nothing there.&lt;br /&gt;&lt;br /&gt;Below, I've done the latter, with the same plots from above.  I define the edge as 8 inches from the center of the plate and/or below 1.8 feet or above 3.3 feet vertically.  Of course, you can define the edge in a number of ways.  
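&lt;br /&gt;&lt;br /&gt;For anyone who wants to try a similar filter, here is a minimal sketch of that edge definition in R.  It is not the code behind the plots: it uses a small made-up data frame, assumes horizontal and vertical location columns named 'px' and 'pz' measured in feet (so 8 inches is 8/12), and reads "8 inches from the center" as "at least 8 inches":&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##made-up pitch locations in feet (the column names px and pz are assumptions)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;toy &amp;lt;- data.frame(px=c(0.1, -0.9, 0.3, 0.7), pz=c(2.5, 2.4, 1.5, 3.5))&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##flag a pitch as on the edge if it is wide, low, or high&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;toy$edge &amp;lt;- abs(toy$px) &amp;gt;= 8/12 | toy$pz &amp;lt; 1.8 | toy$pz &amp;gt; 3.3&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##keep only the edge pitches&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;toy[toy$edge, ]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;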
This is rough, quick code and I didn't have time to get into too much detail today:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/-KCuKK2El9wk/Th85GoboqAI/AAAAAAAAAXI/t3Q-jCqMjNc/s1600/FoulsCountwOBAEdge.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 400px;" src="http://4.bp.blogspot.com/-KCuKK2El9wk/Th85GoboqAI/AAAAAAAAAXI/t3Q-jCqMjNc/s400/FoulsCountwOBAEdge.png" alt="" id="BLOGGER_PHOTO_ID_5629280845331736578" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Keep in mind this is only for Pitch F/X data.  That means some of 2007, and all of the 2008 through 2010 regular seasons.  I try to wait until the end of the season to update my database each year.  I imagine this would be more interesting with even more years of data (like from Retrosheet, as mentioned in the linked blog post).  I think Dan Turkenkopf is going to try this out, as he says in the comments.  Perhaps I'll extend this later on to swings only as well.&lt;br /&gt;&lt;br /&gt;Finally, one other thing to look at is whether pitchers really do get frustrated after a long string of foul balls and get burned throwing a pitch down the middle.  There is probably a skill somewhere between fouling pitches off and flat out missing those pitches just because a better batter likely makes contact more often.  But in terms of purposefully trying to foul a pitch off--at least from my own experience playing baseball--I have doubts that guys go up there looking to 'spoil' pitches.  To foul a pitch off, you have to make sure it doesn't hit the bat directly, otherwise it would go into play.  Hard to believe that in and of itself would be a repeatable skill.  To just edge the bat to the ball, you've got a good chance of missing it, too.&lt;br /&gt;&lt;br /&gt;This is by no means a deep analysis, and I didn't do any sort of fantastic job at cleaning it up beforehand.  
Just some fun crosstabs and scatter plots.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Any thoughts from those of you reading this????&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-4861209639267035321?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/4861209639267035321/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/07/sam-fuld-bob-carpenter-and-statistical.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/4861209639267035321'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/4861209639267035321'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/07/sam-fuld-bob-carpenter-and-statistical.html' title='Sam Fuld, Bob Carpenter, and Statistical Inference Blog'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-VeqrYSwVpQA/Th8tUHfLrtI/AAAAAAAAAW4/Dq3A34tfTzk/s72-c/playerwOBA.png' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-9177733265747260652</id><published>2011-07-05T16:22:00.003-04:00</published><updated>2011-07-05T16:39:59.010-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='NFL'/><category scheme='http://www.blogger.com/atom/ns#' term='Presentations'/><category scheme='http://www.blogger.com/atom/ns#' term='NBA'/><category scheme='http://www.blogger.com/atom/ns#' term='Economics'/><category scheme='http://www.blogger.com/atom/ns#' term='NHL'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><category scheme='http://www.blogger.com/atom/ns#' term='Conference'/><category scheme='http://www.blogger.com/atom/ns#' term='Pitch F/X'/><title type='text'>Forgot to Announce This</title><content type='html'>Though I'm late on this, I've been in the habit of announcing presentations of things I have been working on recently.  At the WEAI conference, I am a co-author on two presentations (one of which I have put together the majority of the analysis).  Unfortunately, I was unable to get funding for WEAI because I am attending a bunch of other conferences this summer, including the Joint Statistical Meetings in Miami at the beginning of August.  Anyway, here are some recent presentations (they were given by Dr. Rodney Fort and Dr. Jason Winfree, respectively).  
You can get the full Western Economic Association International conference program &lt;a href="http://www.weai.org/Content/Files/Prelim_Schedule_5-27-11.pdf"&gt;right here.&lt;br /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Attendance Time Series and Outcome Uncertainty in the NBA, NFL, and NHL&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;Brian Mills and Rodney Fort&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Discrimination Among MLB Umpires&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;Scott Tainsky, Brian Mills and Jason Winfree&lt;br /&gt;&lt;br /&gt;The first paper simply looks at the long-run stationarity of attendance in the three leagues and assesses--at a very simple level--the influence of competitive balance (playoff, game and consecutive season uncertainty) on these attendance levels.  This is part of my dissertation, and there are a number of issues to be dealt with (not the least being the censoring issue for NFL sellouts).  I think this paper might bore most of the readers here--unless you're really into Lagrange Multiplier statistics for a unit root with breakpoints.&lt;br /&gt;&lt;br /&gt;I imagine that the latter paper would be of more interest to those here.  I can't divulge the entire paper (or much of it really), but we tend to find that there is very little going on in the strike-calling data with respect to umpire race.  The data go back through 1996 (I think), and I update the study with some Pitch F/X analysis.  
There's much to do, though.&lt;br /&gt;&lt;br /&gt;In addition to these recent presentations, my fellow graduate student Steve Salaga and I will be presenting on &lt;span style="font-style: italic;font-size:130%;" &gt;&lt;a href="http://www.amstat.org/meetings/jsm/2011/onlineprogram/AbstractDetails.cfm?abstractid=302190"&gt;&lt;span style="font-weight: bold;"&gt;Language-Based Discrimination in NHL Hall of Fame Voting&lt;/span&gt;&lt;/a&gt;&lt;/span&gt; at the &lt;a href="http://www.amstat.org/meetings/jsm/2011/index.cfm?fuseaction=confinfo"&gt;Joint Statistical Meetings&lt;/a&gt;.  There is a &lt;a href="http://www.amstat.org/meetings/jsm/2011/onlineprogram/MainSearchResults.cfm"&gt;whole section on sports statistics&lt;/a&gt; there, with a presentation  by &lt;a href="http://www-stat.wharton.upenn.edu/%7Estjensen/"&gt;Shane Jensen&lt;/a&gt; on fielding metrics.  It sounds like a lot of nerdy  fun.  For this paper, we implement a technique called Random Forests (spoiler alert, we don't find any evidence in the analysis of discriminatory behavior).  This is a parallel analysis to our forthcoming paper on MLB Hall Voting Discrimination in the Journal of Quantitative Analysis in Sports.  When I know the issue, I will link it here.  If anyone is dying to read it, let me know.&lt;br /&gt;&lt;br /&gt;Lastly, I would encourage anyone interested in sports statistics to attend the &lt;a href="http://www.amstat.org/chapters/boston/nessis11/abstracts.html"&gt;New England Symposium on Statistics in Sports&lt;/a&gt;.  For those interested in soccer (futball, football), there is a soccer analytics competition being run by StatDNA.  The winner gets a trip to the conference to present their paper and a $500 prize.  I am currently working on some things with some people you may know, but I won't be mentioning anything until later on.  It's been fun.&lt;br /&gt;&lt;br /&gt;Okay, off to get some work done.  Sorry that I have been somewhat MIA of late.  
Been really bogged down with a lot of different projects.  Hope to get back to sab-R-metrics soon.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-9177733265747260652?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/9177733265747260652/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/07/forgot-to-announce-this.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/9177733265747260652'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/9177733265747260652'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/07/forgot-to-announce-this.html' title='Forgot to Announce This'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-8775574660613869837</id><published>2011-06-22T10:29:00.010-04:00</published><updated>2011-06-22T11:10:04.848-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Programming'/><category scheme='http://www.blogger.com/atom/ns#' term='R-project'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><category scheme='http://www.blogger.com/atom/ns#' term='sab-R-metrics'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Pitch F/X'/><title type='text'>sab-R-metrics: Merging Data Sets</title><content type='html'>I am finally back from Greece and recovered from jet lag.  Fortunately, I did not get tear gassed while in Athens, though there were riot police everywhere the whole time we visited.  Today, I'm going to start getting my feet wet again with a shorter sab-R-metrics post to assure everyone I'm not too MIA.&lt;br /&gt;&lt;br /&gt;Often times we have lots of data in different files that we want to link together.  If you have the information in an SQL database, there are ways to match things up using R.  However, I am no database management wizard and prefer to be able to look at my data in a full table format.  Unfortunately, this causes problems when I want to make sure to have player names linked to the player ids in my Pitch F/X data.  The issue is that the F/X data may have multiple instances or rows with the same player, while the player information file only has player ids and player names once (one per row).  Doing this manually can take forever (sometimes almost literally), and we need a quick way to import player names to the correct rows.  Pitch F/X tools like Joe Lefkowitz's already do this for you; however, if you have your own F/X database--or any other data with player ids that you would like to merge some data into--this tutorial should come in handy.&lt;br /&gt;&lt;br /&gt;Luckily, R has a nice function, '&lt;span style="color: rgb(153, 0, 0);"&gt;merge()&lt;/span&gt;', which allows for easy merging of files.  While I used to use SPSS to do this, once I found the R version I'll never go back.  The SPSS version is pretty handy, but extremely slow for large files and the software is outrageously expensive.&lt;br /&gt;&lt;br /&gt;First, I want you to &lt;a href="http://sitemaker.umich.edu/millsbrian/files/pitchesmerging.csv"&gt;download a file of 5,000 pitches here&lt;/a&gt;.  
Once you have it in the correct place, load it into R and take a look at it.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;#set working directory&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;setwd("c:/Users/Millsy/Documents/My Dropbox/Blog Stuff/sab-R-metrics")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##load pitch file&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;pitches &amp;lt;- read.csv(file="PitchesMerging.csv", h=T)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;head(pitches)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;As you can see, there are no player names in this file.  While you could go through and add them in manually--say in Excel or something like that--this would take way too long.  To get an idea of the number of names to be imported just for this small pitch file, use the following code:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##give an idea of the amount of work that manually merging would take&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;length(pitches[,1])&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;length(unique(pitches$batter_id))&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;length(unique(pitches$pitcher_id))&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The first line of code above tells us the number of rows in the data set--or the length of the first column in the data.  This comes in handy to make sure R loaded the number of rows you expected to see.  The second line of code again uses the '&lt;span style="color: rgb(153, 0, 0);"&gt;length()&lt;/span&gt;' function, but adds a new function we have not seen yet: '&lt;span style="color: rgb(153, 0, 0);"&gt;unique()&lt;/span&gt;'.  Together, they tell us how many different/unique batter ids there are in the data set.  
The third line of code does the same for pitcher ids.  You can also use the '&lt;span style="color: rgb(153, 0, 0);"&gt;unique()&lt;/span&gt;' function on its own, and R will print each of the player ids within the data file (you could also assign this list or vector as an object using the assignment operator '&lt;span style="color: rgb(153, 0, 0);"&gt;&amp;lt;-&lt;/span&gt;').  Unique will come in handy when we get into more advanced "for loops" later on.&lt;br /&gt;&lt;br /&gt;As you can see, there are 286 unique batter ids and 113 unique pitcher ids.  In addition, there are many repeats, as there are 5,000 observations in the data file.  Doing this manually would take forever.  Luckily, I have a file with the player ids, the player names, player height and weight, player birth dates, and the first year played in pro ball, MLB, and the last year played in MLB.  We'll use R to easily merge this into our pitch file so that we can have player names and account for height and age of the player in our analyses using the pitch data.&lt;br /&gt;&lt;br /&gt;First, go ahead and &lt;a href="http://sitemaker.umich.edu/millsbrian/files/detailedplayers.csv"&gt;download the file with player names and some other information here&lt;/a&gt;.  Stick that into the same directory as the previous file and load it into R.  As always, take a look at the file to make sure it loaded correctly:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##load player information file&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;players &amp;lt;- read.csv(file="detailedplayers.csv", h=T)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;head(players)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Before doing any merging, we'll have to adjust some things with this file.  For the '&lt;span style="color: rgb(153, 0, 0);"&gt;merge()&lt;/span&gt;' function to work, you have to choose a variable that is contained in BOTH data sets to merge on.  
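&lt;br /&gt;&lt;br /&gt;A quick way to see which column names two data sets share is to intersect their column names.  The sketch below uses two made-up data frames (the "_toy" objects and their columns are invented for illustration), not the files from this tutorial:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##toy data frames, invented for illustration&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;pitches_toy &amp;lt;- data.frame(batter_id=c(10, 10, 20), pitch_type=c("FF", "CU", "SL"))&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;players_toy &amp;lt;- data.frame(id=c(10, 20), last=c("Smith", "Jones"))&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##column names found in both; an empty result means there is no shared key name yet&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;intersect(colnames(pitches_toy), colnames(players_toy))&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;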
For our purposes, we'll use the id of the player.  Unfortunately, the name of the variable is different in each file.  This is an easy fix.  While we're at it, it is probably a good idea to discriminate between the batter and pitcher names and information in the file, since both will be displayed in each row.  So first things first...let's rename the variables.  For this, we'll use another new function, '&lt;span style="color: rgb(153, 0, 0);"&gt;colnames()&lt;/span&gt;'.  The following code should rename everything the way we want, and we'll start by merging the new data for &lt;span style="font-weight: bold;"&gt;batters&lt;/span&gt;.  Be sure not to omit the names of any columns or you will get an error:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##rename columns for batters&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;colnames(players) &amp;lt;- c("batter_id", "b_first", "b_last", "b_height", "b_weight", "b_birth_year", "b_pro_played_first", "b_mlb_played_first", "b_mlb_played_last")&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;head(players)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Always check to be sure things went correctly.  There is a related option in the '&lt;span style="color: rgb(153, 0, 0);"&gt;merge()&lt;/span&gt;' function, "&lt;span style="color: rgb(153, 0, 0);"&gt;suffixes=&lt;/span&gt;", which automatically appends a suffix to any non-key column names that appear in both data sets.  On data sets with a large number of columns, this can save you time.  But I found this to be a good time to introduce the "&lt;span style="color: rgb(153, 0, 0);"&gt;colnames()&lt;/span&gt;" function.&lt;br /&gt;&lt;br /&gt;Now we have two files with a similar variable to match on.  It's time to use the '&lt;span style="color: rgb(153, 0, 0);"&gt;merge()&lt;/span&gt;' function.  
The merge function asks first for an '&lt;span style="color: rgb(153, 0, 0);"&gt;x&lt;/span&gt;' data set (the first one), and then a '&lt;span style="color: rgb(153, 0, 0);"&gt;y&lt;/span&gt;' data set (the second one).  It is important to remember what order you place them in the function, as you will also need to tell R that you want to keep all of the original pitches in this new merged data.  To save space in R--once I know things are working right--I simply reassign the merged data set as the original name '&lt;span style="color: rgb(153, 0, 0);"&gt;pitches&lt;/span&gt;'.&lt;br /&gt;&lt;br /&gt;To ensure that R makes a data set using all the pitches in the file, we want to use the option "&lt;span style="color: rgb(153, 0, 0);"&gt;all.x=T&lt;/span&gt;" if the pitch data are the first data set, or "&lt;span style="color: rgb(153, 0, 0);"&gt;all.y=T&lt;/span&gt;" if they are the second.  This will tell R that the players data are just a lookup table for the pitch data, while we keep all the pitch data intact in the new merged table.  Finally, we need to tell R which variable to match on using &lt;span style="color: rgb(153, 0, 0);"&gt;by="batter_id"&lt;/span&gt;.  Be sure to put the variable name in quotes.  The following code should do this for us:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##do merge for batters&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;pitches &amp;lt;- merge(pitches, players, by="batter_id", all.x=T)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;head(pitches)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Notice that it puts the "&lt;span style="color: rgb(153, 0, 0);"&gt;batter_id&lt;/span&gt;" variable in the first column of this new data set.  That's okay, and you can always restructure your data if this bothers you.  Now let's do the same for the pitchers in the pitch data.  
Don't forget to rename the variables in your player information table so that they don't overwrite the batter information, and also so that it matches on pitcher id, rather than batter id:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##rename columns for pitchers&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;colnames(players) &amp;lt;- c("pitcher_id", "p_first", "p_last", "p_height", "p_weight", "p_birth_year", "p_pro_played_first",&lt;/span&gt;&lt;span style="color: rgb(51, 102, 255);"&gt; "p_mlb_played_first", "p_mlb_played_last")&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;head(players)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##do merge for pitchers&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;pitches &amp;lt;- merge(pitches, players, by="pitcher_id", all.x=T)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;head(pitches)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now, looking at the data, my first row has the 69 inch, 180 pound Dustin Pedroia against  a lanky 72 inch, 160 pound Miguel Bautista.  For this pitch, Pedroia gets a hit.  You can even double check that the players are correct by looking at the "&lt;span style="color: rgb(153, 0, 0);"&gt;ab_des&lt;/span&gt;" column, which gives a full description of what happened in the at bat.  Sure enough, it says, "Dustin Pedroia singles on a line drive to left fielder Ryan Langerhans.    J.   Drew to 2nd.".  Things seem to have gone well here.  Now, you can save the new file so you don't have to worry about merging again with the following code (note that the data set to write goes first):&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##write new table&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;write.csv(pitches, file="mergedpitches.csv", row.names=F)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Hopefully this will help out some of those looking to merge data together.  
A lot of this is needed when working with the different data sets (pitch f/x, Retrosheet, Baseball Reference, etc.) around the web.  You'll need a full mapping of all player ids.  I got mine from the Universal ID Project, &lt;a href="http://www.insidethebook.com/ee/index.php/site/comments/the_universal_player_id_and_biographical_data_project/"&gt;here is a link at The Book Blog&lt;/a&gt; for last year's version (I can't find the most recent link). &lt;br /&gt;&lt;br /&gt;In the end, R's functionality here is better than that of any other program I have come across.  You always need to double check the data to make sure there aren't any bugs.  This is especially true with even larger data.  Ultimately, this can make life in R and baseball analytics about a million times easier--just be careful.  There are a few things I didn't go over here (like having it automatically sort when merging), so you can always check out how to use the function yourself with the R command "&lt;span style="color: rgb(153, 0, 0);"&gt;help(merge)&lt;/span&gt;".  
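&lt;br /&gt;&lt;br /&gt;For the curious, here is a minimal sketch of a couple of those options.  It uses a made-up toy data set rather than the files from this post: the objects '&lt;span style="color: rgb(153, 0, 0);"&gt;toy_pitches&lt;/span&gt;' and '&lt;span style="color: rgb(153, 0, 0);"&gt;toy_players&lt;/span&gt;' and the columns "id" and "last" are invented purely for illustration.  I am assuming the "&lt;span style="color: rgb(153, 0, 0);"&gt;by.x=&lt;/span&gt;", "&lt;span style="color: rgb(153, 0, 0);"&gt;by.y=&lt;/span&gt;", and "&lt;span style="color: rgb(153, 0, 0);"&gt;sort=&lt;/span&gt;" options described in the help file, which should let you match on id columns with different names (so the key does not need renaming) and skip the sorting on the id:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##toy data (hypothetical, not the files from this post)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;toy_pitches &amp;lt;- data.frame(batter_id=c(2, 1, 2, 3), pitch_type=c("FF", "CU", "SL", "FF"))&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;toy_players &amp;lt;- data.frame(id=c(1, 2), last=c("Smith", "Jones"))&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##match batter_id in x to id in y, keep all pitches, do not sort on the id&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;toy_merged &amp;lt;- merge(toy_pitches, toy_players, by.x="batter_id", by.y="id", all.x=T, sort=F)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;toy_merged&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##sanity check: the merge should not add or drop pitches&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;nrow(toy_merged) == nrow(toy_pitches)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##count pitches whose batter was not found in the player table&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;sum(is.na(toy_merged$last))&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If the player table had repeated ids, the row count check would come back FALSE, which is a quick way to catch a bad id mapping before it causes trouble in an analysis.&lt;br /&gt;&lt;br /&gt;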
Hope this helps!&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Pretty R Code:&lt;/span&gt;&lt;br /&gt;&lt;div style="overflow:auto;"&gt;&lt;div class="geshifilter"&gt;&lt;pre class="r geshifilter-R" style="font-family:monospace;"&gt;&lt;span style="color: #666666; font-style: italic;"&gt;#############################&lt;/span&gt; &lt;span style="color: #666666; font-style: italic;"&gt;################Sidetrack for Merging of Data Tables&lt;/span&gt; &lt;span style="color: #666666; font-style: italic;"&gt;#############################&lt;/span&gt;   &lt;span style="color: #666666; font-style: italic;"&gt;#set working directory&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/setwd"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;setwd&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: #0000ff;"&gt;"c:/Users/Millsy/Documents/My Dropbox/Blog Stuff/sab-R-metrics"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;   &lt;span style="color: #666666; font-style: italic;"&gt;##load pitch file&lt;/span&gt; pitches &lt;span style=""&gt;&amp;lt;-&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/utils/read.csv"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;read.csv&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/base/file"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;file&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: #0000ff;"&gt;"PitchesMerging.csv"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; h=T&lt;span style="color: #009900;"&gt;)&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/utils/head"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;head&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;pitches&lt;span style="color: #009900;"&gt;)&lt;/span&gt;   &lt;span style="color: #666666; font-style: italic;"&gt;##give an idea of the amount of work that 
manually merging would take&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/length"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;length&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;pitches&lt;span style="color: #009900;"&gt;[&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt;&lt;span style="color: #cc66cc;"&gt;1&lt;/span&gt;&lt;span style="color: #009900;"&gt;]&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/length"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;length&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/base/unique"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;unique&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;pitches&lt;span style=""&gt;$&lt;/span&gt;batter_id&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/length"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;length&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/base/unique"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;unique&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;pitches&lt;span style=""&gt;$&lt;/span&gt;pitcher_id&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;   &lt;span style="color: #666666; font-style: italic;"&gt;##load player information file&lt;/span&gt; players &lt;span style=""&gt;&amp;lt;-&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/utils/read.csv"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;read.csv&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/base/file"&gt;&lt;span style="color: #003399; font-weight: 
bold;"&gt;file&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: #0000ff;"&gt;"detailedplayers.csv"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; h=T&lt;span style="color: #009900;"&gt;)&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/utils/head"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;head&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;players&lt;span style="color: #009900;"&gt;)&lt;/span&gt;   &lt;span style="color: #666666; font-style: italic;"&gt;##rename columns for batters&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/colnames"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;colnames&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;players&lt;span style="color: #009900;"&gt;)&lt;/span&gt; &lt;span style=""&gt;&amp;lt;-&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/c"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;c&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: #0000ff;"&gt;"batter_id"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #0000ff;"&gt;"b_first"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #0000ff;"&gt;"b_last"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #0000ff;"&gt;"b_height"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #0000ff;"&gt;"b_weight"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #0000ff;"&gt;"b_birth_year"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #0000ff;"&gt;"b_pro_played_first"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt;  &lt;span style="color: #0000ff;"&gt;"b_mlb_played_first"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: 
#0000ff;"&gt;"b_mlb_played_last"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/utils/head"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;head&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;players&lt;span style="color: #009900;"&gt;)&lt;/span&gt;   &lt;span style="color: #666666; font-style: italic;"&gt;##do merge for batters&lt;/span&gt; pitches &lt;span style=""&gt;&amp;lt;-&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/merge"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;merge&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;pitches&lt;span style="color: #339933;"&gt;,&lt;/span&gt; players&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/by"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;by&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: #0000ff;"&gt;"batter_id"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; all.x=T&lt;span style="color: #009900;"&gt;)&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/utils/head"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;head&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;pitches&lt;span style="color: #009900;"&gt;)&lt;/span&gt;   &lt;span style="color: #666666; font-style: italic;"&gt;##rename columns for pitchers&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/colnames"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;colnames&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;players&lt;span style="color: #009900;"&gt;)&lt;/span&gt; &lt;span style=""&gt;&amp;lt;-&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/c"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;c&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: #0000ff;"&gt;"pitcher_id"&lt;/span&gt;&lt;span style="color: 
#339933;"&gt;,&lt;/span&gt; &lt;span style="color: #0000ff;"&gt;"p_first"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #0000ff;"&gt;"p_last"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #0000ff;"&gt;"p_height"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #0000ff;"&gt;"p_weight"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #0000ff;"&gt;"p_birth_year"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #0000ff;"&gt;"p_pro_played_first"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt;  &lt;span style="color: #0000ff;"&gt;"p_mlb_played_first"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #0000ff;"&gt;"p_mlb_played_last"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/utils/head"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;head&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;players&lt;span style="color: #009900;"&gt;)&lt;/span&gt;   &lt;span style="color: #666666; font-style: italic;"&gt;##do merge for pitchers&lt;/span&gt; pitches &lt;span style=""&gt;&amp;lt;-&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/merge"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;merge&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;pitches&lt;span style="color: #339933;"&gt;,&lt;/span&gt; players&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/by"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;by&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: #0000ff;"&gt;"pitcher_id"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; all.x=T&lt;span style="color: #009900;"&gt;)&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/utils/head"&gt;&lt;span 
style="color: #003399; font-weight: bold;"&gt;head&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;pitches&lt;span style="color: #009900;"&gt;)&lt;/span&gt;     &lt;span style="color: #666666; font-style: italic;"&gt;##write new table&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/utils/write.csv"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;write.csv&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/base/file"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;file&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: #0000ff;"&gt;"mergedpitches.csv"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/row.names"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;row.names&lt;/span&gt;&lt;/a&gt;=F&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-8775574660613869837?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/8775574660613869837/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/06/sab-r-metrics-merging-data-sets.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/8775574660613869837'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/8775574660613869837'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/06/sab-r-metrics-merging-data-sets.html' title='sab-R-metrics: Merging Data 
Sets'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-2060819899561538949</id><published>2011-06-07T11:57:00.004-04:00</published><updated>2011-06-07T12:02:50.657-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Presentations'/><category scheme='http://www.blogger.com/atom/ns#' term='Conference'/><title type='text'>Off to Greece</title><content type='html'>After being back in the U.S. for two days, I'm headed off to Athens tomorrow for another conference.  It is rather small, but I couldn't pass up the chance for some practice at presenting publicly and, well, going to Greece!  Though, I was relatively impressed with the Richmond Street nightlife in London, Ontario.&lt;br /&gt;&lt;br /&gt;If you're in the area (not likely, but I know there are some international readers here), stop by.  It will be at the St. George Lycabettus Hotel in Athens (they really know how to do it in Greece!).  The conference is put on by ATINER and the general topic is Tourism.  I am again presenting with Dr. Mark Rosentraub.  
Below is the title of the presentation:&lt;br /&gt;&lt;br /&gt;&lt;!--[if gte mso 9]&gt;&lt;xml&gt;  &lt;w:worddocument&gt;   &lt;w:view&gt;Normal&lt;/w:View&gt;   &lt;w:zoom&gt;0&lt;/w:Zoom&gt;   &lt;w:trackmoves/&gt;   &lt;w:trackformatting/&gt;   &lt;w:punctuationkerning/&gt;   &lt;w:validateagainstschemas/&gt;   &lt;w:saveifxmlinvalid&gt;false&lt;/w:SaveIfXMLInvalid&gt;   &lt;w:ignoremixedcontent&gt;false&lt;/w:IgnoreMixedContent&gt;   &lt;w:alwaysshowplaceholdertext&gt;false&lt;/w:AlwaysShowPlaceholderText&gt;   &lt;w:donotpromoteqf/&gt;   &lt;w:lidthemeother&gt;EN-US&lt;/w:LidThemeOther&gt;   &lt;w:lidthemeasian&gt;X-NONE&lt;/w:LidThemeAsian&gt;   &lt;w:lidthemecomplexscript&gt;X-NONE&lt;/w:LidThemeComplexScript&gt;   &lt;w:compatibility&gt;    &lt;w:breakwrappedtables/&gt;    &lt;w:snaptogridincell/&gt;    &lt;w:wraptextwithpunct/&gt;    &lt;w:useasianbreakrules/&gt;    &lt;w:dontgrowautofit/&gt;    &lt;w:splitpgbreakandparamark/&gt;    &lt;w:dontvertaligncellwithsp/&gt;    &lt;w:dontbreakconstrainedforcedtables/&gt;    &lt;w:dontvertalignintxbx/&gt;    &lt;w:word11kerningpairs/&gt;    &lt;w:cachedcolbalance/&gt;   &lt;/w:Compatibility&gt;   &lt;w:browserlevel&gt;MicrosoftInternetExplorer4&lt;/w:BrowserLevel&gt;   &lt;m:mathpr&gt;    &lt;m:mathfont val="Cambria Math"&gt;    &lt;m:brkbin val="before"&gt;    &lt;m:brkbinsub val="&amp;#45;-"&gt;    &lt;m:smallfrac val="off"&gt;    &lt;m:dispdef/&gt;    &lt;m:lmargin val="0"&gt;    &lt;m:rmargin val="0"&gt;    &lt;m:defjc val="centerGroup"&gt;    &lt;m:wrapindent val="1440"&gt;    &lt;m:intlim val="subSup"&gt;    &lt;m:narylim val="undOvr"&gt;   &lt;/m:mathPr&gt;&lt;/w:WordDocument&gt; &lt;/xml&gt;&lt;![endif]--&gt;&lt;!--[if gte mso 9]&gt;&lt;xml&gt;  &lt;w:latentstyles deflockedstate="false" defunhidewhenused="true" defsemihidden="true" defqformat="false" defpriority="99" latentstylecount="267"&gt;   &lt;w:lsdexception locked="false" priority="0" semihidden="false" unhidewhenused="false" qformat="true" 
name="Normal"&gt;   &lt;w:lsdexception locked="false" priority="9" semihidden="false" unhidewhenused="false" qformat="true" name="heading 1"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 2"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 3"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 4"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 5"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 6"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 7"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 8"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 9"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 1"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 2"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 3"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 4"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 5"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 6"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 7"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 8"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 9"&gt;   &lt;w:lsdexception locked="false" priority="35" qformat="true" name="caption"&gt;   &lt;w:lsdexception locked="false" priority="10" semihidden="false" unhidewhenused="false" qformat="true" name="Title"&gt;   &lt;w:lsdexception locked="false" priority="1" name="Default Paragraph Font"&gt;   &lt;w:lsdexception locked="false" priority="11" semihidden="false" unhidewhenused="false" qformat="true" name="Subtitle"&gt;   &lt;w:lsdexception locked="false" priority="22" semihidden="false" unhidewhenused="false" qformat="true" name="Strong"&gt;   &lt;w:lsdexception 
locked="false" priority="20" semihidden="false" unhidewhenused="false" qformat="true" name="Emphasis"&gt;   &lt;w:lsdexception locked="false" priority="59" semihidden="false" unhidewhenused="false" name="Table Grid"&gt;   &lt;w:lsdexception locked="false" unhidewhenused="false" name="Placeholder Text"&gt;   &lt;w:lsdexception locked="false" priority="1" semihidden="false" unhidewhenused="false" qformat="true" name="No Spacing"&gt;   &lt;w:lsdexception locked="false" priority="60" semihidden="false" unhidewhenused="false" name="Light Shading"&gt;   &lt;w:lsdexception locked="false" priority="61" semihidden="false" unhidewhenused="false" name="Light List"&gt;   &lt;w:lsdexception locked="false" priority="62" semihidden="false" unhidewhenused="false" name="Light Grid"&gt;   &lt;w:lsdexception locked="false" priority="63" semihidden="false" unhidewhenused="false" name="Medium Shading 1"&gt;   &lt;w:lsdexception locked="false" priority="64" semihidden="false" unhidewhenused="false" name="Medium Shading 2"&gt;   &lt;w:lsdexception locked="false" priority="65" semihidden="false" unhidewhenused="false" name="Medium List 1"&gt;   &lt;w:lsdexception locked="false" priority="66" semihidden="false" unhidewhenused="false" name="Medium List 2"&gt;   &lt;w:lsdexception locked="false" priority="67" semihidden="false" unhidewhenused="false" name="Medium Grid 1"&gt;   &lt;w:lsdexception locked="false" priority="68" semihidden="false" unhidewhenused="false" name="Medium Grid 2"&gt;   &lt;w:lsdexception locked="false" priority="69" semihidden="false" unhidewhenused="false" name="Medium Grid 3"&gt;   &lt;w:lsdexception locked="false" priority="70" semihidden="false" unhidewhenused="false" name="Dark List"&gt;   &lt;w:lsdexception locked="false" priority="71" semihidden="false" unhidewhenused="false" name="Colorful Shading"&gt;   &lt;w:lsdexception locked="false" priority="72" semihidden="false" unhidewhenused="false" name="Colorful List"&gt;   &lt;w:lsdexception locked="false" 
priority="73" semihidden="false" unhidewhenused="false" name="Colorful Grid"&gt;   &lt;w:lsdexception locked="false" priority="60" semihidden="false" unhidewhenused="false" name="Light Shading Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="61" semihidden="false" unhidewhenused="false" name="Light List Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="62" semihidden="false" unhidewhenused="false" name="Light Grid Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="63" semihidden="false" unhidewhenused="false" name="Medium Shading 1 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="64" semihidden="false" unhidewhenused="false" name="Medium Shading 2 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="65" semihidden="false" unhidewhenused="false" name="Medium List 1 Accent 1"&gt;   &lt;w:lsdexception locked="false" unhidewhenused="false" name="Revision"&gt;   &lt;w:lsdexception locked="false" priority="34" semihidden="false" unhidewhenused="false" qformat="true" name="List Paragraph"&gt;   &lt;w:lsdexception locked="false" priority="29" semihidden="false" unhidewhenused="false" qformat="true" name="Quote"&gt;   &lt;w:lsdexception locked="false" priority="30" semihidden="false" unhidewhenused="false" qformat="true" name="Intense Quote"&gt;   &lt;w:lsdexception locked="false" priority="66" semihidden="false" unhidewhenused="false" name="Medium List 2 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="67" semihidden="false" unhidewhenused="false" name="Medium Grid 1 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="68" semihidden="false" unhidewhenused="false" name="Medium Grid 2 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="69" semihidden="false" unhidewhenused="false" name="Medium Grid 3 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="70" semihidden="false" unhidewhenused="false" name="Dark List Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="71" semihidden="false" 
unhidewhenused="false" name="Colorful Shading Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="72" semihidden="false" unhidewhenused="false" name="Colorful List Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="73" semihidden="false" unhidewhenused="false" name="Colorful Grid Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="60" semihidden="false" unhidewhenused="false" name="Light Shading Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="61" semihidden="false" unhidewhenused="false" name="Light List Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="62" semihidden="false" unhidewhenused="false" name="Light Grid Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="63" semihidden="false" unhidewhenused="false" name="Medium Shading 1 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="64" semihidden="false" unhidewhenused="false" name="Medium Shading 2 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="65" semihidden="false" unhidewhenused="false" name="Medium List 1 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="66" semihidden="false" unhidewhenused="false" name="Medium List 2 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="67" semihidden="false" unhidewhenused="false" name="Medium Grid 1 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="68" semihidden="false" unhidewhenused="false" name="Medium Grid 2 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="69" semihidden="false" unhidewhenused="false" name="Medium Grid 3 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="70" semihidden="false" unhidewhenused="false" name="Dark List Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="71" semihidden="false" unhidewhenused="false" name="Colorful Shading Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="72" semihidden="false" unhidewhenused="false" name="Colorful List Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="73" 
semihidden="false" unhidewhenused="false" name="Colorful Grid Accent 2"&gt; 
&lt;/w:LatentStyles&gt; &lt;/xml&gt;&lt;![endif]--&gt;&lt;b style="color: rgb(0, 0, 0);"&gt;&lt;span style="font-size: 16pt; font-family: &amp;quot;Times New Roman&amp;quot;,&amp;quot;serif&amp;quot;;"&gt;Measuring the Local Economic Benefits of Regional Assets: Opportunity Costs and the Best Use of Land for Regional Development&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;That also means there probably won't be any sab-R-metrics articles up until after I get back (I'll return on June 16th).  
Hopefully I can get on a roll after that, as I only have one more conference to go to in the summer (Joint Statistical Meetings in August in Miami Beach--the Sport Sections are highly recommended for you sports guys).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-2060819899561538949?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/2060819899561538949/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/06/off-to-greece.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/2060819899561538949'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/2060819899561538949'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/06/off-to-greece.html' title='Off to Greece'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-370488664274530293</id><published>2011-06-06T10:31:00.003-04:00</published><updated>2011-06-06T10:41:36.300-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Link'/><category scheme='http://www.blogger.com/atom/ns#' term='Competitions'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><category scheme='http://www.blogger.com/atom/ns#' term='sab-R-metrics'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Predictions'/><title type='text'>A Sabermetrics Prediction Competition at Kaggle?</title><content type='html'>I ran across &lt;a href="http://bigcomputing.blogspot.com/2011/06/r-bloggers-posts-sabermetric-article.html"&gt;this post today at Big Computing&lt;/a&gt; (now on the sidebar).  I've toyed around with the &lt;a href="http://www.kaggle.com/"&gt;Kaggle&lt;/a&gt; competitions in the past, but haven't really been able to come up with serious competition beyond the basic data mining tools available in R.  They work great, but there are some serious programmers that develop their own classifiers and prediction tools that outclass anything I can do (especially in my free time).&lt;br /&gt;&lt;br /&gt;Anyway, there is a mention about a sabermetric prediction competition.  I know there are plenty of people around here that would have a lot of fun with something like this.  If you haven't been to Kaggle before, I highly suggest checking it out.  They give out money for the top predictive techniques.  They provide the training data, and a hold out test sample for the leader board.  Most recently, there is a &lt;a href="http://www.heritagehealthprize.com/c/hhp"&gt;Heritage Health Prize&lt;/a&gt;, with the winner getting a multi-million dollar prize!&lt;br /&gt;&lt;br /&gt;They're asking for suggestions, and here is mine:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(153, 0, 0); font-style: italic;"&gt;"I really think using FX data would be a good road for this.  
For the most fun, it may be interesting to predict whether a single pitch is made contact with or not, given the game state, type of pitch, count, the opposing batter abilities, pitcher ability, velocity, location, etc."&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Any other thoughts?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-370488664274530293?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/370488664274530293/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/06/sabermetrics-prediction-competition-at.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/370488664274530293'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/370488664274530293'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/06/sabermetrics-prediction-competition-at.html' title='A Sabermetrics Prediction Competition at Kaggle?'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-7100657897706485762</id><published>2011-06-01T09:05:00.005-04:00</published><updated>2011-06-01T09:28:16.453-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Presentations'/><category scheme='http://www.blogger.com/atom/ns#' term='Academic'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Sport Management'/><category scheme='http://www.blogger.com/atom/ns#' term='Conference'/><title type='text'>Off to NASSM in Canada</title><content type='html'>Today I am off to the North American Society for Sport Management Conference in London, Ontario.  While the group seems to have really missed the ball on the location (sorry guys and gals, but come on) I am excited to attend.  This will be my first time at this conference and I am looking forward to the experience.  If you are going to be there, I'd love to meet you.  If you're in town, feel free to come on by my presentations.  I have two,  for which I have provided titles below (click the link for the abstract).  The first presentation is Thursday at 8:30 am, while the second one is Friday at 8:45 am.  Looks like I get to be the early bird all week...&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.nassm.com/files/conf_abstracts/2011-005.pdf"&gt;&lt;span style="font-weight: bold;"&gt;Major League Baseball Franchise Attendance and the Uncertainty of Outcome Hypothesis&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;Brian Mills and Rodney Fort&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.nassm.com/files/conf_abstracts/2011-081.pdf"&gt;&lt;span style="font-weight: bold;"&gt;Public Investment in Sports Facilities: Who Really Pays and the Implications for Progressive&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; Taxation&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Mark Rosentraub, Brian Mills, Michael Cantor and Jason Winfree&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I will be presenting the first one on my own, while fellow graduate student Michael Cantor and I will be presenting the second one.  
Also look for these from our department (and one from a graduate of our department currently working at Illinois, Scott Tainsky):&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.nassm.com/files/conf_abstracts/2011-003.pdf"&gt;&lt;span style="font-weight: bold;"&gt;Effects of Personal Involvement and Expert Information on Fantasy Sports Consumers’ Winning&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; Expectancy and Anticipated Emotion&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt;Dae Hee Kwak and Joon Sung Lee&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.nassm.com/files/conf_abstracts/2011-011.pdf"&gt;&lt;span style="font-weight: bold;"&gt;Demand for Individual Sports: Estimating Pay-Per-View Buyrates for the Ultimate Fighting&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; Championship&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt;Scott Tainsky, Steve Salaga&lt;/span&gt; and Carla Santos&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.nassm.com/files/conf_abstracts/2011-050.pdf"&gt;&lt;span style="font-weight: bold;"&gt;Gratitude toward Sponsors: Conceptual Framework and Empirical Examination&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Yu Kyoum Kim, Robert Smith and &lt;span style="font-weight: bold;"&gt;Dae Hee Kwak&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.nassm.com/files/conf_abstracts/2011-036.pdf"&gt;&lt;span style="font-weight: bold;"&gt;NCAA Football and the Invariance Proposition&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Steve Salaga and Rodney Fort&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.nassm.com/files/conf_abstracts/2011-195.pdf"&gt;&lt;span style="font-weight: bold;"&gt;Athlete Philanthropy: Motives, Drivers and 
Intentions&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Seung Pil Lee, Kathryn Heinze, Kathy Babiak and Matt Juravich&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.nassm.com/files/conf_abstracts/2011-187.pdf"&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;/a&gt;&lt;a href="http://www.nassm.com/files/conf_abstracts/2011-187.pdf"&gt;&lt;span style="font-weight: bold;"&gt;Measuring the Contribution of Sport to Society: Social Capital, Collective Identities, Health&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; Literacy, Well-being and Human Capital&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Seung Pil Lee, &lt;/span&gt;&lt;span style="font-style: italic;"&gt;T. Bettina Cornwell &lt;/span&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;and Kathy Babiak&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;br /&gt;Quite a showing from our growing department!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-7100657897706485762?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/7100657897706485762/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/06/off-to-nassm-in-canada.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/7100657897706485762'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/7100657897706485762'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/06/off-to-nassm-in-canada.html' title='Off to NASSM in 
Canada'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-2021103621747414416</id><published>2011-05-31T11:30:00.006-04:00</published><updated>2011-05-31T16:46:12.728-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='R-project'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Animation'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><category scheme='http://www.blogger.com/atom/ns#' term='sab-R-metrics'/><category scheme='http://www.blogger.com/atom/ns#' term='Pitch F/X'/><title type='text'>sab-R-metrics: GIF Movies and Pitch Flights (Guest Post)</title><content type='html'>A couple weeks ago, I received an email from a fellow Pitch F/Xer and R-User, Josh Weinstock, asking if I was interested in a guest post here at Prince of Slides.  I didn't think I was important enough to have talented guests posting at my blog; however, Josh pointed out that this site tends to be the place for those who are part of a niche within a niche (i.e. Sabermetrics with R), and that it would be a great place to showcase some of his own work.&lt;br /&gt;&lt;br /&gt;Josh currently contributes to It's About the Money, a Yankee themed ESPN Sweetspot blog. He hails from North Carolina as a die-hard baseball fan. He welcomes discussion of baseball, punk music, and funny tv shows. You can reach him at josh82093 at gmail dot com or on twitter &lt;a href="http://twitter.com/j__stock"&gt;@J__Stock&lt;/a&gt; (two underscores). 
He has posted some pretty cool stuff like &lt;a href="http://itsaboutthemoney.net/archives/2011/03/21/fear-is-the-answer/"&gt;this Robinson Cano GIF image made in R&lt;/a&gt;.  Given his interesting posts and talent in R, I figured his work would make a fantastic first-ever Guest Post here at the site.  Here is what Josh has to say:&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="background-color: rgb(255, 204, 153);"&gt;Pitching  is complicated. In order to be successful, pitchers must have velocity,  movement, location, and deception. Thanks to pitch f/x data, the first  three are pretty easy to study. In fact, these three variables are more  or less directly recorded for every pitch thrown in major league  baseball. However, deception remains somewhat of a mystery. This is  mainly because we don't really understand how deception works. However,  one subset of deception is pretty easy to quantify: pitch flights. Pitch  flights allow us a glimpse into how the batter actually &lt;i&gt;sees&lt;/i&gt; the ball. This is just one more piece of data that we can use to help understand the mystery of pitching.&lt;br /&gt;&lt;br /&gt;The following tool is intended to help add pitch flight  visualizations to your analysis. Of particular importance is the  recognizability of breaking balls (size of the "hump") in relation to  the fastball. The time of .075 seconds after the ball is released is  also important, as this is the time that Robert Adair (author of &lt;i&gt;The Physics of Baseball&lt;/i&gt;) hypothesized that batters need to decide whether or not to swing. And the graphs are kind of cool.&lt;br /&gt;&lt;br /&gt;Before you start, you need to install the XML and animation  packages. You also need a basic knowledge of R, though if you read this  website I'm sure you're prepared. 
If you have trouble with the tool,  feel free to ask for help through email ( josh82093 at gmail dot com )  or twitter ( &lt;a target="_blank" href="http://twitter.com/j__stock"&gt;J__Stock&lt;/a&gt; ). &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So from this, we have some R-code and a really cool (not just kind of cool) function that will plot pitch flights in an animated fashion.  While you'll want to have R experience before using this, the function is extremely user friendly.  All you need to know is how to set a working directory and the URL of the pitch data for your desired pitcher.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://sitemaker.umich.edu/millsbrian/files/ball_flight_program"&gt;You can download Josh's code by clicking here&lt;/a&gt;.  This is the guts of the function, and you'll need to use it in multiple steps.  The advantage of using this is that it will give a bit more flexibility for experienced R users.  As Josh said, you'll need to install the packages "XML" and "animation" and load them up using the "&lt;span style="color: rgb(153, 0, 0);"&gt;library()&lt;/span&gt;" function before you use Josh's code.  From there, open it up in R as a script and highlight the first two parts (&lt;span style="color: rgb(153, 0, 0);"&gt;"flightgrab()&lt;/span&gt;" and "&lt;span style="color: rgb(153, 0, 0);"&gt;plot.flight()&lt;/span&gt;").   Press "&lt;span style="color: rgb(153, 0, 0);"&gt;CTRL + R&lt;/span&gt;" for those first two functions.  Then, you can use these as standard functions as you would anything else in R.  With this code, you'll have to do this each time you open up R.  
Remember to load the packages first:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##load packages&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;library(XML)&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;library(animation)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Do this after you have installed the packages from the CRAN repository and before you start using the functions.&lt;br /&gt;&lt;br /&gt;(&lt;span style="font-style: italic;"&gt;I have also broken this full script down into smaller files so that you can use the "&lt;/span&gt;&lt;span style="color: rgb(153, 0, 0); font-style: italic;"&gt;source()&lt;/span&gt;&lt;span style="font-style: italic;"&gt;" function in R on the individual portions of the code.  This keeps you from having to highlight the code every time you open up R.  I'll go over how to use "&lt;/span&gt;&lt;span style="color: rgb(153, 0, 0); font-style: italic;"&gt;source()&lt;/span&gt;&lt;span style="font-style: italic;"&gt;" in a later post, and I'll provide these files as well.&lt;/span&gt; &lt;span style="font-style: italic;"&gt; If you're dying to have them, shoot me an email.&lt;/span&gt;)&lt;br /&gt;&lt;br /&gt;For the "&lt;span style="color: rgb(153, 0, 0);"&gt;flightgrab()&lt;/span&gt;" function, you'll need the Brooks Baseball URL for the game and pitcher you want.  It should be the page with the table format of the data.  You can find these pages by using the drop down menus at the site.  &lt;a href="http://www.brooksbaseball.net/pfxVB/tabdel_expanded.php?pitchSel=121250&amp;amp;game=gid_2011_05_25_tormlb_nyamlb_1/&amp;amp;s_type=&amp;amp;h_size=700&amp;amp;v_size=500"&gt;Here is an example of the page you need to use&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;That's it.  Just type the URL within the parentheses (and be sure to put the full URL in quotes) and R will grab the data directly from the website, transform it, and turn it into a data frame for plotting the pitch flights.  
Here is some example code using Mariano Rivera's pitch data on May 25, 2011 against Toronto (remember to create the function in your R workspace first, so that R will know what "&lt;span style="color: rgb(153, 0, 0);"&gt;flightgrab()&lt;/span&gt;" is):&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##grab data&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;mariano &amp;lt;- flightgrab("http://www.brooksbaseball.net/pfxVB/tabdel_expanded.php?pitchSel=121250&amp;amp;game=gid_2011_05_25_tormlb_nyamlb_1/&amp;amp;s_type=&amp;amp;h_size=700&amp;amp;v_size=500")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To check whether the data was downloaded and transformed correctly by the function, you can just type "&lt;span style="color: rgb(51, 102, 255);"&gt;mariano&lt;/span&gt;" (or whatever name you gave it) to look at what is under the hood.  From here, the "&lt;span style="color: rgb(153, 0, 0);"&gt;plot.flight()&lt;/span&gt;" function uses this data frame format to plot the flight of the ball.  We can do this in two ways.  If you take a look at the data set, it includes two pitch types: Four-Seamers and Cutters.  The data grabbing function automatically puts the information into a flight sequence with 18 data points.  So, both pitches have their own flight track.  When we use "&lt;span style="color: rgb(153, 0, 0);"&gt;plot.flight()&lt;/span&gt;" to plot these in a single color (say, dark red), we can't tell which is which.  
Below, I have a still version of the plot using a single color:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##plot all pitches in a single color&lt;/span&gt;  &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;plot.flight(mariano, color="darkred", strikezone=T)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-vWCGdIpI8Tc/TeUEeICTf4I/AAAAAAAAAWc/N_bslCyWWL0/s1600/RiverasolidStatic.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 400px;" src="http://3.bp.blogspot.com/-vWCGdIpI8Tc/TeUEeICTf4I/AAAAAAAAAWc/N_bslCyWWL0/s400/RiverasolidStatic.png" alt="" id="BLOGGER_PHOTO_ID_5612897426186928002" border="0" /&gt;&lt;/a&gt;However, we can condition color on the pitch type variable in the data set.  We've gone through this in previous tutorials, and it can be done with the following simple code:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##make pitches different colors&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;plot.flight(mariano, color=mariano$type, strikezone=T)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/-PWE8wHhnDTM/TeUEd42jpZI/AAAAAAAAAWM/D5pUX3Hpjns/s1600/RiveraColorsStatic.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 400px;" src="http://2.bp.blogspot.com/-PWE8wHhnDTM/TeUEd42jpZI/AAAAAAAAAWM/D5pUX3Hpjns/s400/RiveraColorsStatic.png" alt="" id="BLOGGER_PHOTO_ID_5612897422111122834" border="0" /&gt;&lt;/a&gt;And you can follow with the standard keys/legends for each pitch type.  Remember that R numbers the pitch types in alphabetical order and uses those numbers as colors.  So, FC is the Cutter, and is represented by a "1" for colors, which is Black.  That means FF is the red pitch flight shown.  
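&lt;br /&gt;&lt;br /&gt;If that numbering seems opaque, here is a minimal sketch on made-up data.  The "&lt;span style="color: rgb(51, 102, 255);"&gt;toy&lt;/span&gt;" vector is hypothetical and simply stands in for the type column, which I am assuming is stored as a factor; it is not part of Josh's script.  It shows the alphabetical level order and the integer codes that R ends up using as color numbers:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##toy pitch-type vector standing in for mariano$type (made-up values)&lt;br /&gt;toy &amp;lt;- factor(c("FF", "FC", "FC", "FF"))&lt;br /&gt;&lt;br /&gt;##levels are sorted alphabetically: "FC" "FF"&lt;br /&gt;levels(toy)&lt;br /&gt;&lt;br /&gt;##integer codes that get used as color numbers: 2 1 1 2&lt;br /&gt;as.integer(toy)&lt;br /&gt;&lt;br /&gt;##color behind each of those numbers in the current palette&lt;br /&gt;palette()[as.integer(toy)]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In the default palette, color 1 is black and color 2 is a shade of red, which matches the plot above.&lt;br /&gt;&lt;br /&gt;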
If you're only interested in plotting a single pitch type, simply use the code:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##plot only Mariano Rivera's Cutter&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;marianoFC &amp;lt;- subset(mariano, mariano$type=="FC")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;plot.flight(marianoFC, color="darkred", strikezone=T)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/-xKzc0U4B3ps/TeUEd7aRJwI/AAAAAAAAAWU/svHDphZxWZ0/s1600/RiveraFCstatic.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 400px;" src="http://4.bp.blogspot.com/-xKzc0U4B3ps/TeUEd7aRJwI/AAAAAAAAAWU/svHDphZxWZ0/s400/RiveraFCstatic.png" alt="" id="BLOGGER_PHOTO_ID_5612897422797776642" border="0" /&gt;&lt;/a&gt;&lt;a href="http://4.bp.blogspot.com/-bMedp1PUWo8/TeUC979ROZI/AAAAAAAAAWE/jl3kMad8C6o/s1600/RiveraFCstatic.png"&gt;&lt;br /&gt;&lt;/a&gt;And you can leave out the strike zone box by not using the "&lt;span style="color: rgb(153, 0, 0);"&gt;strikezone=&lt;/span&gt;" option in the function (i.e. the default is no strike zone).  But none of this gets to the point yet.  The point of all this is showing an animated version of the pitch flight.  For this, Josh created a nice little "&lt;span style="color: rgb(153, 0, 0);"&gt;for loop&lt;/span&gt;".  For loops are something I'll get to in some advanced plotting and simulation in the sab-R-metrics series, but essentially what this one does is create a plot for each of the 18 frames of the pitch.  When we put these together in a GIF, it comes out as animation (just like a flip-book cartoon).  In Josh's original script, this is handled by the "&lt;span style="color: rgb(153, 0, 0);"&gt;saveMovie()&lt;/span&gt;" function from the animation package.  
For this, you'll need to download and install a program called &lt;a href="http://www.imagemagick.org/script/index.php"&gt;ImageMagick&lt;/a&gt; (available at the link), which allows us to write GIF files from R using this function.  Go ahead and do that now.&lt;br /&gt;&lt;br /&gt;Okay, so now we're ready to create a Mariano Rivera movie.  For this, we'll wrap a for loop in the "&lt;span style="color: rgb(153, 0, 0);"&gt;saveMovie()&lt;/span&gt;" function, so that the loop draws one frame for each time interval in our data frame.  Here is an example following directly from the code above:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;####now use the saveMovie stuff&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;# Create gif&lt;/span&gt;  &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;saveMovie( {&lt;br /&gt;  for(i in unique(mariano$time)) {&lt;br /&gt;  plot.flight(mariano[mariano$time==i,], col=c(1,2), strikezone=T)&lt;br /&gt;  text(3.7, 8, "Cutter", col=1, cex=1.3)&lt;br /&gt;  text(3.7, 7.6, "Four Seam", col=2, cex=1.3)&lt;br /&gt;  }&lt;br /&gt;},&lt;br /&gt;movie.name='mariano.gif', interval=.5)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/-OBjk_Tnb-08/TeUHnBwIT9I/AAAAAAAAAWk/ROQvCIdXnII/s1600/mariano.gif"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 400px;" src="http://4.bp.blogspot.com/-OBjk_Tnb-08/TeUHnBwIT9I/AAAAAAAAAWk/ROQvCIdXnII/s400/mariano.gif" alt="" id="BLOGGER_PHOTO_ID_5612900877653790674" border="0" /&gt;&lt;/a&gt;The GIF should open up in ImageMagick automatically.  Once there, you can right click on the GIF and save it to the place you want. (&lt;span style="font-weight: bold; font-style: italic;"&gt;Note that you need to click on the GIF image above for it to be animated&lt;/span&gt;)&lt;br /&gt;&lt;br /&gt;For fun, Josh also provided some code for A.J. Burnett.  
I have the code and the animation below comparing his knuckle-curve with his four-seam fastball (remember, these are Gameday pitch types).&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-DlRCcG7ABVw/TeUHnRfqrUI/AAAAAAAAAWs/aF4Im-IlEFQ/s1600/burnett.gif"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 400px;" src="http://3.bp.blogspot.com/-DlRCcG7ABVw/TeUHnRfqrUI/AAAAAAAAAWs/aF4Im-IlEFQ/s400/burnett.gif" alt="" id="BLOGGER_PHOTO_ID_5612900881879706946" border="0" /&gt;&lt;/a&gt;&lt;span style="font-style: italic;"&gt;Note that for some reason the ball size is reversed in the animations.  I'm not sure why this happens, but I'm trying to work it out.  I've been toying around with Josh's original scaling of the perception of the ball size, and I seem to have messed something up when it goes into the create GIF mode.&lt;/span&gt;  &lt;span style="font-style: italic;"&gt;Optimally, I think the ball should get larger as it nears the plate, rather than smaller.  &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Thanks to Josh&lt;/span&gt; for providing this and posting it up here.  
This is some great work and hopefully others out there can put this function to good use!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-2021103621747414416?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/2021103621747414416/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/05/sab-r-metrics-gif-movies-and-pitch.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/2021103621747414416'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/2021103621747414416'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/05/sab-r-metrics-gif-movies-and-pitch.html' title='sab-R-metrics: GIF Movies and Pitch Flights (Guest Post)'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-vWCGdIpI8Tc/TeUEeICTf4I/AAAAAAAAAWc/N_bslCyWWL0/s72-c/RiverasolidStatic.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-7655058546970147527</id><published>2011-05-30T11:54:00.009-04:00</published><updated>2011-05-30T12:25:04.699-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='NCAA'/><category scheme='http://www.blogger.com/atom/ns#' term='College'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><title type='text'>Mythical Legends of a College Athlete</title><content type='html'>Chubby Johnson is a star outfielder for the Northwest Michigan State Tech Rhinoceroses, a not-so-well-known Division III baseball team that I made up.  Johnson has a chance to get drafted in the upcoming MLB draft, but has decided baseball just isn't for him.  So back in March, Chubby went through his career center to find a job at a large financial firm.&lt;br /&gt;&lt;br /&gt;Chubby majored in psychology with a 3.3 GPA.  He's a solid but not great student.  Chubby has always wanted to make lots of money on Wall Street and drive a BMW, but the partying and women just didn't allow him to do math homework every night.  That's why he majored in psychology.&lt;br /&gt;&lt;br /&gt;His career center encouraged him to use the alumni network to find NMST grads who may work in the field on Wall Street.  He found a previous graduate named John Scomb.  Scomb, it turns out, is one of the biggest NMST baseball fans who ever lived.  Despite Chubby's mediocre GPA and psychology major, Scomb invites Chubby out to New York to be wined, dined and interviewed alongside a number of Harvard Business School grads and MIT finance majors.  Chubby graciously accepts.&lt;br /&gt;&lt;br /&gt;Chubby's plane ticket arrives in April and the interview comes in the midst of the conference tournament.  Luckily, Chubby is able to work around this and flies out to New York after NMST wins the conference tourney and a berth in regionals.  As would be expected, the flight is on the financial company's dollar.&lt;br /&gt;&lt;br /&gt;When he gets there, Chubby is greeted by a driver who takes him to a nice hotel in Manhattan.  Around noon, Scomb shows up with his boss--Chuck McCourt--and they take Chubby out to lunch at a swanky restaurant.  Scomb pays for the meal.  
During the lunch, they mostly talk baseball and ask about NMST's chances of winning it all this year.&lt;br /&gt;&lt;br /&gt;After lunch, Chubby is taken to see the Yankees--his favorite team--from the company luxury suite.  He gets a back massage and has a couple beers, then heads back to the hotel to take a shower before going out to the bar that night.&lt;br /&gt;&lt;br /&gt;The next day, Chubby is hungover.  He can't think straight, but he has to make it to the interview on time.  The driver is waiting for him and takes him to the 9 am interview.  He walks in ready to be grilled by Scomb and his boss.  But only Scomb is there.  He asks Chubby why he wants to work in the financial industry after majoring in psychology.  Chubby tells him he's always loved the idea of 'being on Wall Street'.  They talk a little more baseball, Chubby does some math problems, and Scomb tells him he's hired.  Scomb has always wanted a fellow NMST-er there with him at the company.&lt;br /&gt;&lt;br /&gt;Chubby goes back to NMST to play out the season, and tells his coach all about how he won't be entering the draft, but was offered a job on Wall Street by Scomb.  The coach knew Scomb, who had tried out for the team but was cut his freshman year at NMST.&lt;br /&gt;&lt;br /&gt;The Rhinoceroses win the NCAA Division III National Championship in Appleton, Wisconsin.  Chubby graduates and moves to New York to begin his new life as a broker--thanks in large part to being a standout baseball player at NMST.  Ever since that year, NMST has proudly displayed the trophy and a picture of Chubby Johnson as the tournament MVP in their brand new gym--mostly paid for by the huge influx of applications after students heard about the great baseball atmosphere there.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Should the baseball coach at NMST resign? 
&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;After all, a job on Wall Street is worth a lot more than a free tattoo at some random tattoo parlor in Columbus, Ohio.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-7655058546970147527?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/7655058546970147527/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/05/mythical-legends-of-college-athlete.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/7655058546970147527'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/7655058546970147527'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/05/mythical-legends-of-college-athlete.html' title='Mythical Legends of a College Athlete'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-8596514209210751406</id><published>2011-05-25T10:25:00.023-04:00</published><updated>2011-05-25T12:36:38.757-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='R-project'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><category scheme='http://www.blogger.com/atom/ns#' term='Kernel Smoothing'/><category 
scheme='http://www.blogger.com/atom/ns#' term='sab-R-metrics'/><category scheme='http://www.blogger.com/atom/ns#' term='Pitch F/X'/><title type='text'>sab-R-metrics: Kernel Density Smoothing</title><content type='html'>Last time I left you, I had gone over some basics of doing loess regression in R.  If you remember, loess is a sort of regression that allows wiggliness in your regression of some dependent variable Y on some independent variable X (I will generalize this to more than one dimension later on).  However, we're not always interested in how X affects Y.  Sometimes we may simply be interested in the distribution of X, that is, how frequently each value of X shows up.  An example that comes to mind is Pitch F/X data, where we want to see the frequency of pitch locations in the strike zone.&lt;br /&gt;&lt;br /&gt;For showing pitch location, we're going to need two dimensions.  Today, I'm going to begin with kernel density smoothing in a single dimension. It's important to understand what this method is doing before jumping into displaying it in multiple dimensions, like the Pitch F/X heat maps you see everywhere nowadays.&lt;br /&gt;&lt;br /&gt;For this tutorial, I want you to first go ahead and grab the Jeremy Guthrie pitch data from last time.  &lt;a href="http://sitemaker.umich.edu/millsbrian/files/guthrie.csv"&gt;You can download it directly here&lt;/a&gt;.  
We'll look to smooth the starting velocity of pitches for Guthrie from the Pitch F/X data.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##set working directory and load data&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;and take a look at it&lt;br /&gt;&lt;br /&gt;setwd("c:/Users/Millsy/Dropbox/Blog Stuff/sab-R-metrics")&lt;/span&gt;  &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;guth &amp;lt;- read.csv(file="guthrie.csv", h=T)&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;head(guth)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Okay, now that the data is loaded in, let's start with a histogram.  Why?  Well, density smoothing and histograms are very much related.  Kernel density smoothing is really just a smooth, pretty version of a histogram.  We'll start simple with looking at the distribution of speeds for all pitches, then get a little more advanced and overlay the velocity of each pitch type on top of one another.  For this exercise, we'll have to look back to a &lt;a href="http://princeofslides.blogspot.com/2011/01/sab-r-metrics-intermediate-boxplots-and.html"&gt;previous post&lt;/a&gt; and use the "&lt;span style="color: rgb(153, 0, 0);"&gt;hist()&lt;/span&gt;" function in R.  This will produce a nice histogram with the defaults.  Go ahead and try it yourself.  Practice using the RGB scheme and making transparent colors if you want.  The histogram function has a bunch of options, but I'll keep things basic for now.  Don't forget to give your plot a title and label the axes so we know what we're plotting.  Finally, be sure to tell R that you want to plot the density with the command "&lt;span style="color: rgb(153, 0, 0);"&gt;freq=FALSE&lt;/span&gt;"&lt;span style="color: rgb(153, 0, 0);"&gt;&lt;/span&gt; within the histogram function.  We'll see why a little later.  
If you get stuck, you can reference the code below:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##create histogram of Guthrie pitch speeds&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;hist(guth$start_speed, xlab="Speed Out of Hand (MPH)", main="Jeremy Guthrie Pitch Speed Distribution", freq=FALSE, col="#99550070")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/-SLBDIzXarXc/Td0seFHB4XI/AAAAAAAAAU0/gLOwt8qpdKA/s1600/GuthHist1.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 400px;" src="http://1.bp.blogspot.com/-SLBDIzXarXc/Td0seFHB4XI/AAAAAAAAAU0/gLOwt8qpdKA/s400/GuthHist1.png" alt="" id="BLOGGER_PHOTO_ID_5610689606052864370" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Okay, this looks pretty much like what we had in the previous tutorial.  The range of pitch speeds included is strange on the left of the plot (we'll fix this later; it's likely just Pitch F/X data issues on one or two data points).  But what does this have to do with kernel density smoothing?  Well, density smoothing is another way of representing this distribution.  If we don't really like putting things in "bins" in a histogram, we can use a smoother to look at the distribution.  There are some real advantages to this, as we will see in a few paragraphs.  I'll provide some very brief background on what the kernel smoother actually does.&lt;br /&gt;&lt;br /&gt;In a histogram, we use bins of a given width to group together observations and get a rough estimate of the probability density function (PDF...&lt;span style="font-style: italic;"&gt;not the Adobe kind&lt;/span&gt;) of our data.  We can affect the shape by changing the width and number of bins if we'd like.  But the discrete display isn't always optimal.  The kernel smoother allows us to take a weighted average at each observation.  
The points that are closer to each observation will be weighted more heavily in this average, while those far away are weighted less.  This allows us to get a smooth representation of the density (much like the loess does when we have a dependent variable of interest--though loess is a &lt;span style="font-style: italic;"&gt;variable bandwidth&lt;/span&gt; smoother).  There are a number of ways to weight your density, but there are some standard "kernels" that are used in practice.  The most popular are the Gaussian and the Epanechnikov kernels.  Others include the Triweight, Biweight, Uniform, and so on.  Really, you can weight it however you want, but the Gaussian and Epanechnikov should do the trick for our purposes here.&lt;br /&gt;&lt;br /&gt;So, for the Gaussian kernel, picture the normal distribution.  In kernel smoothing, we use the height of the standard normal to weight all the observations within the vicinity of each observation in our data set.  So, for a 90 mph pitch, those that are 91 mph and 89 mph are likely weighted heavily in deciding the height of our smoothed representation.  As you get further from the middle of the normal, you notice that the tails get shorter and shorter.  Since the height of the curve is the weight, those observations further and further from our point of interest will be weighted less and less based on these tails.  Moving this along each value in our distribution of pitch speeds (that is, repeating the weighting at each value in the range of the data), we get a weighted, smoothed height of our density curve.&lt;br /&gt;&lt;br /&gt;The default in R is the Gaussian kernel, but you can specify what you want by using the "&lt;span style="color: rgb(153, 0, 0);"&gt;kernel=&lt;/span&gt;" option and just typing the name of your desired kernel (e.g. "gaussian" or "epanechnikov").  
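&lt;br /&gt;&lt;br /&gt;If you want to see the Gaussian weighting by hand, here is a rough sketch (an illustration only, not part of the original code for this post).  It assumes the "&lt;span style="color: rgb(153, 0, 0);"&gt;guth&lt;/span&gt;" data frame loaded above.  The names "&lt;span style="color: rgb(153, 0, 0);"&gt;speeds&lt;/span&gt;", "&lt;span style="color: rgb(153, 0, 0);"&gt;h&lt;/span&gt;", "&lt;span style="color: rgb(153, 0, 0);"&gt;x0&lt;/span&gt;" and "&lt;span style="color: rgb(153, 0, 0);"&gt;d&lt;/span&gt;" are placeholders made up for the example, and 90 mph is an arbitrary point that I'm assuming sits inside the range of the data.&lt;br /&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 102, 255);"&gt;##sketch: Gaussian kernel density at one speed, computed by hand (speeds, h, x0, d are illustrative names)&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 102, 255);"&gt;speeds &amp;lt;- guth$start_speed[!is.na(guth$start_speed)]&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 102, 255);"&gt;h &amp;lt;- bw.nrd0(speeds)   ##rule-of-thumb bandwidth, the default in density()&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 102, 255);"&gt;x0 &amp;lt;- 90   ##the speed where we want the height of the curve&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 102, 255);"&gt;mean(dnorm(x0, mean=speeds, sd=h))   ##average the normal curve heights centered on every pitch&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 102, 255);"&gt;##compare with the built-in estimate, interpolated at the same speed&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 102, 255);"&gt;d &amp;lt;- density(speeds)&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 102, 255);"&gt;approx(d$x, d$y, xout=x0)$y&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;
The two numbers should come out close, though probably not identical, since "&lt;span style="color: rgb(153, 0, 0);"&gt;density()&lt;/span&gt;" evaluates the curve on a grid and we are interpolating between grid points.  Pitches near 90 mph contribute tall pieces of the normal curve to the average, while pitches far away contribute almost nothing.&lt;br /&gt;&lt;br /&gt;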
Let's apply this using the "&lt;span style="color: rgb(153, 0, 0);"&gt;density()&lt;/span&gt;" function in R and just using the defaults for the kernel.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##estimate pitch speed density and plot (be sure to tell R to ignore missing values!)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;pdens &amp;lt;- density(guth$start_speed, na.rm=T)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;pdens1 &amp;lt;- density(guth$start_speed, bw=.1, na.rm=T)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;pdens5 &amp;lt;- density(guth$start_speed, bw=.5, na.rm=T)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;pdens40 &amp;lt;- density(guth$start_speed, bw=4, na.rm=T)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;par(mfrow=c(2,1))&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;plot(pdens, col="black", lwd=3, xlab="Speed Out of Hand (MPH)", main="Default KDE")&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;plot(pdens1, col="red", lwd=2, xlab="Speed Out of Hand (MPH)", main="Kernel Density Estimation")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;lines(pdens5, col="green", lwd=2)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;lines(pdens40, col="gold", lwd=2)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;lines(pdens, col="black", lwd=2)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;text(50, 0.15, "bw = 0.1", col="red", cex=1.3)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;text(50, 0.14, "bw = 0.5", col="green", cex=1.3)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;text(50, 0.13, "bw = 4", col="gold", cex=1.3)&lt;/span&gt;&lt;br /&gt;&lt;span 
style="color: rgb(51, 102, 255);"&gt;text(50, 0.12, "bw = nrd0", col="black", cex=1.3)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/-V0gv4n2-2Dw/Td0seUnz1kI/AAAAAAAAAU8/IQ12602S73Q/s1600/GuthDens1.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 260px; height: 400px;" src="http://4.bp.blogspot.com/-V0gv4n2-2Dw/Td0seUnz1kI/AAAAAAAAAU8/IQ12602S73Q/s400/GuthDens1.png" alt="" id="BLOGGER_PHOTO_ID_5610689610216887874" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;In the first part of the code above, I simply let R calculate the bandwidth using a rule of thumb (called "nrd0" in R; this is the top plot.  I won't get too much into optimizing bandwidth, but &lt;a href="http://princeofslides.blogspot.com/2011/05/sab-r-metrics-basics-of-loess.html"&gt;use the same logic that we did with loess smoothing&lt;/a&gt;).  Like with the loess regression, you may want to play with different bandwidths and see how this affects the smoothing and look of the distribution.  You can adjust this using the option "&lt;span style="color: rgb(153, 0, 0);"&gt;bw=&lt;/span&gt;" within the "&lt;span style="color: rgb(153, 0, 0);"&gt;density()&lt;/span&gt;" function.  To illustrate, I've plotted multiple lines with different bandwidths in the bottom plot above.&lt;br /&gt;&lt;br /&gt;You can see that the rule-of-thumb bandwidth (the default) seems to smooth nicely.  On the other hand, the red line (a small bandwidth) is much too wiggly, while the gold line (a very large bandwidth) oversmooths so badly that it would have us believe Guthrie gets the ball up over 100 mph fairly often.  Also, we see there seem to be some outliers (likely Pitch F/X errors).  We can fix this up a bit by using some additional options in the "&lt;span style="color: rgb(153, 0, 0);"&gt;density()&lt;/span&gt;" function in R.  
By telling R to smooth "&lt;span style="color: rgb(153, 0, 0);"&gt;from=&lt;/span&gt;" and "&lt;span style="color: rgb(153, 0, 0);"&gt;to=&lt;/span&gt;", we ensure that the curve is only evaluated over the real range of the data, so we can ignore those weird few points on the left (the stray points still feed into the estimate; we just stop drawing the curve out there).  In addition to this, we can ensure that there are fewer issues at the edges of the data (i.e. smoothing up past 100 mph--we had similar issues with loess smoothing if you remember the widening of the confidence intervals at the edges of the data range) using the option "&lt;span style="color: rgb(153, 0, 0);"&gt;cut=&lt;/span&gt;", but I won't cover this option today.  I encourage you to fiddle with it on your own.  Let's try limiting the range of the smoothing for now:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;###limit smoothing range to get rid of crappy FX velocity data&lt;/span&gt;  &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;pdensB &amp;lt;- density(guth$start_speed, na.rm=T, from=65, to=100)&lt;/span&gt;  &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;plot(pdensB, col="black", lwd=3, xlab="Speed Out of Hand (MPH)", main="Kernel Density Estimation")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/-hAlDZgwDZxI/Td0sevBGP5I/AAAAAAAAAVE/2Pg8DiJWuIY/s1600/GuthDens2.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 333px;" src="http://4.bp.blogspot.com/-hAlDZgwDZxI/Td0sevBGP5I/AAAAAAAAAVE/2Pg8DiJWuIY/s400/GuthDens2.png" alt="" id="BLOGGER_PHOTO_ID_5610689617302273938" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Now this is a little bit easier to look at.  To see how this relates to our histogram, let's overlay the density curve on the histogram.  Call the histogram, and then we'll add the above density curve using "&lt;span style="color: rgb(153, 0, 0);"&gt;lines()&lt;/span&gt;".  
Note that we'll have to set the y-axis limits to ensure that the entire smooth shows up on our plot (the range of density is larger for the smooth than the default histogram) and reset the x-limits to keep the histogram from including all the useless left-tail pitch speeds.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;###add density to original histogram&lt;/span&gt;  &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;hist(guth$start_speed, xlab="Speed Out of Hand (MPH)", main="Jeremy Guthrie Pitch Speed Distribution", freq=FALSE, col="#99550070", ylim=c(0,0.13), xlim=c(65, 100))&lt;/span&gt;  &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;lines(pdensB, col="black", lwd=3)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/-yf0nmjOQ0YQ/Td0se2pJd6I/AAAAAAAAAVM/OJ664oO9Hjw/s1600/GuthDensHist.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 333px;" src="http://4.bp.blogspot.com/-yf0nmjOQ0YQ/Td0se2pJd6I/AAAAAAAAAVM/OJ664oO9Hjw/s400/GuthDensHist.png" alt="" id="BLOGGER_PHOTO_ID_5610689619349305250" border="0" /&gt;&lt;/a&gt;Notice how the smoother allowed us to see the tri-modal distribution of pitch speeds, while the histogram may not have provided us with enough bins.  This suggests that Guthrie is throwing 3 (or 4) different pitch types that have different speeds.  Let's see what these are by estimating the density by pitch type.  I'll group fastball types together, and curveballs, change-ups, and sliders each separately.  
We can plot them on the same plot and see how they compare.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;###now separate by pitch type (na.rm=T drops any missing speeds)&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;pdens.F &amp;lt;- density(guth$start_speed[guth$pitch_type %in% c("FA", "FF", "FC", "FT", "SI")],&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;    na.rm=T, from=65, to=100)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;pdens.CU &amp;lt;- density(guth$start_speed[guth$pitch_type=="CU"], na.rm=T, from=65, to=100)&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;pdens.SL &amp;lt;- density(guth$start_speed[guth$pitch_type=="SL"], na.rm=T, from=65, to=100)&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;pdens.CH &amp;lt;- density(guth$start_speed[guth$pitch_type=="CH"], na.rm=T, from=65, to=100)&lt;/span&gt;  &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;plot(pdens.F, col="black", lwd=2, xlab="Speed Out of Hand (MPH)", main="Pitch Speed Distribution by Pitch Type", ylim=c(0,.25))&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;lines(pdens.CU, col="darkred", lwd=2)&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;lines(pdens.SL, col="darkgreen", lwd=2)&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;lines(pdens.CH, col="darkblue", lwd=2)&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;text(67, 0.25, "Fastballs", col="black")&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;text(67, 0.24, "Curveballs", col="darkred")&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;text(67, 0.23, "Sliders", col="darkgreen")&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;text(67, 0.22, "Change-Ups", 
col="darkblue")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/-w55y5rxpTJg/Td0se1UBO4I/AAAAAAAAAVU/3xSbfipCIvw/s1600/GuthDensType.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 333px;" src="http://4.bp.blogspot.com/-w55y5rxpTJg/Td0se1UBO4I/AAAAAAAAAVU/3xSbfipCIvw/s400/GuthDensType.png" alt="" id="BLOGGER_PHOTO_ID_5610689618992249730" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;So, given what we see in the plot above, it looks like Gameday may have mis-classified some Sliders as Curveballs.  Guthrie's Change and Slider seem to be near the same velocity, while--of course--the fastest pitches are the fastball variants classified here.  It also looks like there may be one or two Curves and Sliders given the tiny bump on the right for these pitches.  Of course, this isn't cluster analysis (though, this type of comparison is sort of what clustering does, and I'll go over formal cluster analysis later in the series...promise!).  However, we may be able to discern some things by running this same sort of comparison for movement variables included in the Pitch F/X data.  I'll leave that up to the reader to fiddle around with as a learning exercise.&lt;br /&gt;&lt;br /&gt;Now that we've covered kernel density estimation in a single dimension, we can move on to covering this in two dimensions.  Estimating it is quite easy in R, but displaying the data is the difficult part.  We'll need a third dimension to display data.  This can be done using color or using a 3-dimensional looking plot.  
I prefer color, as it creates the really neat heat maps that we've seen around the net.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;And...Pretty R Code this time!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="overflow: auto;"&gt;&lt;div class="geshifilter"&gt;&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;&lt;span style="color: rgb(102, 102, 102); font-style: italic;"&gt;#############################&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(102, 102, 102); font-style: italic;"&gt;################Kernel Density Estimation and Plotting (single dimension)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(102, 102, 102); font-style: italic;"&gt;#############################&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(102, 102, 102); font-style: italic;"&gt;##set working directory and load data&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/base/setwd"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;setwd&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;"c:/Users/Millsy/Dropbox/Blog Stuff/sab-R-metrics"&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;guth &lt;span style=""&gt;&amp;lt;-&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/utils/read.csv"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;read.csv&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/base/file"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;file&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"guthrie.csv"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; h=T&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/utils/head"&gt;&lt;span style="color: rgb(0, 51, 153); 
font-weight: bold;"&gt;head&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;guth&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(102, 102, 102); font-style: italic;"&gt;###make histogram of Guthrie pitch speed&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/grDevices/png"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;png&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/base/file"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;file&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"GuthHist1.png"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; height=&lt;span style="color: rgb(204, 102, 204);"&gt;500&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; width=&lt;span style="color: rgb(204, 102, 204);"&gt;500&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/hist"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;hist&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;guth&lt;span style=""&gt;$&lt;/span&gt;start_speed&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; xlab=&lt;span style="color: rgb(0, 0, 255);"&gt;"Speed Out of Hand (MPH)"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; main=&lt;span style="color: rgb(0, 0, 255);"&gt;"Jeremy Guthrie Pitch Speed Distribution"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt;&lt;br /&gt; freq=&lt;span style="color: rgb(0, 0, 0); font-weight: bold;"&gt;FALSE&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span 
style="color: rgb(0, 0, 255);"&gt;"#99550070"&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/grDevices/dev.off"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;dev.off&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(102, 102, 102); font-style: italic;"&gt;##estimate pitch speed density (be sure to tell R to ignore missing values!)&lt;/span&gt;&lt;br /&gt;pdens &lt;span style=""&gt;&amp;lt;-&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/stats/density"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;density&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;guth&lt;span style=""&gt;$&lt;/span&gt;start_speed&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; na.rm=T&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;pdens1 &lt;span style=""&gt;&amp;lt;-&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/stats/density"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;density&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;guth&lt;span style=""&gt;$&lt;/span&gt;start_speed&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; bw=&lt;span style="color: rgb(204, 102, 204);"&gt;.1&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; na.rm=T&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;pdens5 &lt;span style=""&gt;&amp;lt;-&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/stats/density"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;density&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;guth&lt;span style=""&gt;$&lt;/span&gt;start_speed&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; bw=&lt;span style="color: rgb(204, 
102, 204);"&gt;.5&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; na.rm=T&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;pdens40 &lt;span style=""&gt;&amp;lt;-&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/stats/density"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;density&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;guth&lt;span style=""&gt;$&lt;/span&gt;start_speed&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; bw=&lt;span style="color: rgb(204, 102, 204);"&gt;4&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; na.rm=T&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/grDevices/png"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;png&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/base/file"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;file&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"GuthDens1.png"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; height=&lt;span style="color: rgb(204, 102, 204);"&gt;1000&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; width=&lt;span style="color: rgb(204, 102, 204);"&gt;650&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/par"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;par&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;mfrow=&lt;a href="http://inside-r.org/r-doc/base/c"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;c&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;2&lt;/span&gt;&lt;span style="color: rgb(51, 153, 
51);"&gt;,&lt;/span&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;1&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/plot"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;plot&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;pdens&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"black"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; lwd=&lt;span style="color: rgb(204, 102, 204);"&gt;3&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; xlab=&lt;span style="color: rgb(0, 0, 255);"&gt;"Speed Out of Hand (MPH)"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; main=&lt;span style="color: rgb(0, 0, 255);"&gt;"Default KDE"&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/plot"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;plot&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;pdens1&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"red"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; lwd=&lt;span style="color: rgb(204, 102, 204);"&gt;2&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; xlab=&lt;span style="color: rgb(0, 0, 255);"&gt;"Speed Out of Hand (MPH)"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; main=&lt;span style="color: rgb(0, 0, 
255);"&gt;"Kernel Density Estimation"&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/lines"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;lines&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;pdens5&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"green"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; lwd=&lt;span style="color: rgb(204, 102, 204);"&gt;2&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/lines"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;lines&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;pdens40&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"gold"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; lwd=&lt;span style="color: rgb(204, 102, 204);"&gt;2&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/lines"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;lines&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;pdens&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"black"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; 
lwd=&lt;span style="color: rgb(204, 102, 204);"&gt;2&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/text"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;text&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;50&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;span style="color: rgb(204, 102, 204);"&gt;0.15&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;span style="color: rgb(0, 0, 255);"&gt;"bw = 0.1"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"red"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; cex=&lt;span style="color: rgb(204, 102, 204);"&gt;1.3&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/text"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;text&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;50&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;span style="color: rgb(204, 102, 204);"&gt;0.14&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;span style="color: rgb(0, 0, 255);"&gt;"bw = 0.5"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"green"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; cex=&lt;span style="color: rgb(204, 102, 
204);"&gt;1.3&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/text"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;text&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;50&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;span style="color: rgb(204, 102, 204);"&gt;0.13&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;span style="color: rgb(0, 0, 255);"&gt;"bw = 4"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"gold"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; cex=&lt;span style="color: rgb(204, 102, 204);"&gt;1.3&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/text"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;text&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;50&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;span style="color: rgb(204, 102, 204);"&gt;0.12&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;span style="color: rgb(0, 0, 255);"&gt;"bw = nrd0"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"black"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; cex=&lt;span style="color: rgb(204, 102, 204);"&gt;1.3&lt;/span&gt;&lt;span style="color: 
rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/grDevices/dev.off"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;dev.off&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(102, 102, 102); font-style: italic;"&gt;###limit range of smoothing&lt;/span&gt;&lt;br /&gt;pdensB &lt;span style=""&gt;&amp;lt;-&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/stats/density"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;density&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;guth&lt;span style=""&gt;$&lt;/span&gt;start_speed&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; na.rm=T&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; from=&lt;span style="color: rgb(204, 102, 204);"&gt;65&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; to=&lt;span style="color: rgb(204, 102, 204);"&gt;100&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/grDevices/png"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;png&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/base/file"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;file&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"GuthDens2.png"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; height=&lt;span style="color: rgb(204, 102, 204);"&gt;500&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; width=&lt;span style="color: rgb(204, 102, 204);"&gt;600&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/plot"&gt;&lt;span style="color: rgb(0, 51, 153); 
font-weight: bold;"&gt;plot&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;pdensB&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"black"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; lwd=&lt;span style="color: rgb(204, 102, 204);"&gt;3&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; xlab=&lt;span style="color: rgb(0, 0, 255);"&gt;"Speed Out of Hand (MPH)"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; main=&lt;span style="color: rgb(0, 0, 255);"&gt;"Kernel Density Estimation"&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/grDevices/dev.off"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;dev.off&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(102, 102, 102); font-style: italic;"&gt;###add density to original histogram&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/grDevices/png"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;png&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/base/file"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;file&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"GuthDensHist.png"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; height=&lt;span style="color: rgb(204, 102, 204);"&gt;500&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; width=&lt;span style="color: rgb(204, 102, 204);"&gt;600&lt;/span&gt;&lt;span style="color: rgb(0, 153, 
0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/hist"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;hist&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;guth&lt;span style=""&gt;$&lt;/span&gt;start_speed&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; xlab=&lt;span style="color: rgb(0, 0, 255);"&gt;"Speed Out of Hand (MPH)"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; main=&lt;span style="color: rgb(0, 0, 255);"&gt;"Jeremy Guthrie Pitch Speed Distribution"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt;&lt;br /&gt; freq=&lt;span style="color: rgb(0, 0, 0); font-weight: bold;"&gt;FALSE&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"#99550070"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; ylim=&lt;a href="http://inside-r.org/r-doc/base/c"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;c&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;0&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;0.13&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; xlim=&lt;a href="http://inside-r.org/r-doc/base/c"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;c&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;65&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;span style="color: rgb(204, 102, 204);"&gt;100&lt;/span&gt;&lt;span style="color: rgb(0, 153, 
0);"&gt;)&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/lines"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;lines&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;pdensB&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"black"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; lwd=&lt;span style="color: rgb(204, 102, 204);"&gt;3&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/grDevices/dev.off"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;dev.off&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(102, 102, 102); font-style: italic;"&gt;###now separate by pitch type&lt;/span&gt;&lt;br /&gt;pdens.F &lt;span style=""&gt;&amp;lt;-&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/stats/density"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;density&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;guth&lt;span style=""&gt;$&lt;/span&gt;start_speed&lt;span style="color: rgb(0, 153, 0);"&gt;[&lt;/span&gt;guth&lt;span style=""&gt;$&lt;/span&gt;pitch_type&lt;span style=""&gt;==&lt;/span&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;"FA"&lt;/span&gt; &lt;span style=""&gt;|&lt;/span&gt; guth&lt;span style=""&gt;$&lt;/span&gt;pitch_type&lt;span style=""&gt;==&lt;/span&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;"FF"&lt;/span&gt; &lt;span style=""&gt;|&lt;/span&gt; guth&lt;span style=""&gt;$&lt;/span&gt;pitch_type&lt;span 
style=""&gt;==&lt;/span&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;"FC"&lt;/span&gt; &lt;span style=""&gt;|&lt;/span&gt;&lt;br /&gt; guth&lt;span style=""&gt;$&lt;/span&gt;pitch_type&lt;span style=""&gt;==&lt;/span&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;"FT"&lt;/span&gt; &lt;span style=""&gt;|&lt;/span&gt; guth&lt;span style=""&gt;$&lt;/span&gt;pitch_type&lt;span style=""&gt;==&lt;/span&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;"SI"&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;]&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; from=&lt;span style="color: rgb(204, 102, 204);"&gt;65&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; to=&lt;span style="color: rgb(204, 102, 204);"&gt;100&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;pdens.CU &lt;span style=""&gt;&amp;lt;-&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/stats/density"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;density&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;guth&lt;span style=""&gt;$&lt;/span&gt;start_speed&lt;span style="color: rgb(0, 153, 0);"&gt;[&lt;/span&gt;guth&lt;span style=""&gt;$&lt;/span&gt;pitch_type&lt;span style=""&gt;==&lt;/span&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;"CU"&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;]&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; from=&lt;span style="color: rgb(204, 102, 204);"&gt;65&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; to=&lt;span style="color: rgb(204, 102, 204);"&gt;100&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;pdens.SL &lt;span style=""&gt;&amp;lt;-&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/stats/density"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;density&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 
0);"&gt;(&lt;/span&gt;guth&lt;span style=""&gt;$&lt;/span&gt;start_speed&lt;span style="color: rgb(0, 153, 0);"&gt;[&lt;/span&gt;guth&lt;span style=""&gt;$&lt;/span&gt;pitch_type&lt;span style=""&gt;==&lt;/span&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;"SL"&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;]&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; from=&lt;span style="color: rgb(204, 102, 204);"&gt;65&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; to=&lt;span style="color: rgb(204, 102, 204);"&gt;100&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;pdens.CH &lt;span style=""&gt;&amp;lt;-&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/stats/density"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;density&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;guth&lt;span style=""&gt;$&lt;/span&gt;start_speed&lt;span style="color: rgb(0, 153, 0);"&gt;[&lt;/span&gt;guth&lt;span style=""&gt;$&lt;/span&gt;pitch_type&lt;span style=""&gt;==&lt;/span&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;"CH"&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;]&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; from=&lt;span style="color: rgb(204, 102, 204);"&gt;65&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; to=&lt;span style="color: rgb(204, 102, 204);"&gt;100&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/grDevices/png"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;png&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/base/file"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;file&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"GuthDensType.png"&lt;/span&gt;&lt;span style="color: 
rgb(51, 153, 51);"&gt;,&lt;/span&gt; height=&lt;span style="color: rgb(204, 102, 204);"&gt;500&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; width=&lt;span style="color: rgb(204, 102, 204);"&gt;600&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/plot"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;plot&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;pdens.F&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"black"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; lwd=&lt;span style="color: rgb(204, 102, 204);"&gt;2&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; xlab=&lt;span style="color: rgb(0, 0, 255);"&gt;"Speed Out of Hand (MPH)"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt;&lt;br /&gt; main=&lt;span style="color: rgb(0, 0, 255);"&gt;"Pitch Speed Distribution by Pitch Type"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; ylim=&lt;a href="http://inside-r.org/r-doc/base/c"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;c&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;0&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;.25&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/lines"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;lines&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 
0);"&gt;(&lt;/span&gt;pdens.CU&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"darkred"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; lwd=&lt;span style="color: rgb(204, 102, 204);"&gt;2&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/lines"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;lines&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;pdens.SL&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"darkgreen"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; lwd=&lt;span style="color: rgb(204, 102, 204);"&gt;2&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/lines"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;lines&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;pdens.CH&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"darkblue"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; lwd=&lt;span style="color: rgb(204, 102, 204);"&gt;2&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/text"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;text&lt;/span&gt;&lt;/a&gt;&lt;span 
style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;67&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;span style="color: rgb(204, 102, 204);"&gt;0.25&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;span style="color: rgb(0, 0, 255);"&gt;"Fastballs"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"black"&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/text"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;text&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;67&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;span style="color: rgb(204, 102, 204);"&gt;0.24&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;span style="color: rgb(0, 0, 255);"&gt;"Curveballs"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"darkred"&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/text"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;text&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;67&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;span style="color: rgb(204, 102, 204);"&gt;0.23&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; 
&lt;span style="color: rgb(0, 0, 255);"&gt;"Sliders"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"darkgreen"&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/graphics/text"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;text&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;67&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;span style="color: rgb(204, 102, 204);"&gt;0.22&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;span style="color: rgb(0, 0, 255);"&gt;"Change-Ups"&lt;/span&gt;&lt;span style="color: rgb(51, 153, 51);"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/col"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;col&lt;/span&gt;&lt;/a&gt;=&lt;span style="color: rgb(0, 0, 255);"&gt;"darkblue"&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://inside-r.org/r-doc/grDevices/dev.off"&gt;&lt;span style="color: rgb(0, 51, 153); font-weight: bold;"&gt;dev.off&lt;/span&gt;&lt;/a&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;)&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-8596514209210751406?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/8596514209210751406/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://princeofslides.blogspot.com/2011/05/sab-r-metrics-kernel-density-smoothing.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/8596514209210751406'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/8596514209210751406'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/05/sab-r-metrics-kernel-density-smoothing.html' title='sab-R-metrics: Kernel Density Smoothing'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-SLBDIzXarXc/Td0seFHB4XI/AAAAAAAAAU0/gLOwt8qpdKA/s72-c/GuthHist1.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-3869237948501766989</id><published>2011-05-11T14:46:00.001-04:00</published><updated>2011-05-11T14:46:20.461-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Loess'/><category scheme='http://www.blogger.com/atom/ns#' term='R-project'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><category scheme='http://www.blogger.com/atom/ns#' term='sab-R-metrics'/><title type='text'>sab-R-metrics: Basics of LOESS Regression</title><content type='html'>Last week, I left you off at logistic regression.  This week, I'll be pushing the limits of regression analysis a bit more with a smoothing technique called LOESS regression.  
There are a number of smoothing methods that can be used, such as Smoothing Splines or simple Local Linear Regression; however, I'm going to cover LOESS (loess) here because it is very flexible and easy to implement in R.  Remember that here, I'm not going to cover too much of the quantitative portion of the methods.  That means that if you plan on using loess in your own work, you should probably read up on what it is actually doing.  I'll begin with a brief, non-mathematical description.&lt;br /&gt;&lt;br /&gt;When we ran regressions using OLS procedures, we assumed that the relationship between the X and Y variables is linear (i.e. that as X increases, Y moves in the same direction and at the same rate across the whole range of X).  Of course, in the real world this is not always the case.  You can use polynomials in linear regression to address the issue, but sometimes other methods may be necessary.  This is where smoothing comes in.&lt;br /&gt;&lt;br /&gt;Using smoothers, there is no restriction on the functional form between X and Y with respect to the intensity of the relationship or its direction (positive or negative).  Of course, this means our fits are a bit more computationally intensive.  And if you are not careful, it is very easy to overfit the data by trying to include every wiggle you see.  But if done properly, you may be able to glean some extra information from the data by using a smoother instead of a restrictive linear model.&lt;br /&gt;&lt;br /&gt;So what is loess?  Well, as I said, there are a number of smoothers out there.  The advantage of loess (along with its predecessor 'LOWESS') is that it allows a bit more flexibility than some other smoothers.  The name 'loess' is short for local regression (you will also see it spelled out as 'locally estimated scatterplot smoothing'), and in practice it is a locally weighted least squares regression.  In other words, to estimate Y at a given X, it leans mostly on the data points closest to that X.  It is also known as a variable bandwidth smoother, in that it uses a 'nearest neighbors' method to decide how wide that local window should be.  
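&lt;br /&gt;&lt;br /&gt;To make the 'nearest neighbors' idea a little more concrete, here is a rough sketch of what a smoother like this does at a single point.  Everything in it is made up for illustration (the simulated x and y, the point x0, and the choice of 30 neighbors), and it fits a local straight line with tricube weights rather than reproducing exactly what R's loess function does:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##simulated data, for illustration only (not the Guthrie data)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;set.seed(1)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;x &amp;lt;- 1:100&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;y &amp;lt;- sin(x/15) + rnorm(100, sd=0.3)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##pick a point to smooth at, and measure how far every x is from it&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;x0 &amp;lt;- 50&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;d &amp;lt;- abs(x - x0)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##h is the distance to the 30th-closest point, which sets the width of the local window&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;h &amp;lt;- sort(d)[30]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##tricube weights: close to 1 near x0, falling to 0 at distance h and beyond&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;w &amp;lt;- pmax(0, 1 - (d/h)^3)^3&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##weighted least squares line, evaluated at x0&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;fit &amp;lt;- lm(y ~ x, weights=w)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;predict(fit, newdata=data.frame(x=x0))&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Repeat that at every x0 and connect the predictions, and you have a smoothed curve.  Because the window is defined by a count of neighbors rather than a fixed width, it stretches where the data are sparse and narrows where they are dense, which is the 'variable bandwidth' part.&lt;br /&gt;&lt;br /&gt;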
If you are interested in the guts of LOESS, a Google search should do you just fine.&lt;br /&gt;&lt;br /&gt;As usual, there is a nice easy function for loess in R.  The first thing you'll need to do is download a new data set from my site called "&lt;span style="color: rgb(153, 0, 0);"&gt;guthrie.csv&lt;/span&gt;".  This is a Pitch F/X data set from Joe Lefkowitz's site including all pitches by Jeremy Guthrie from 2008 through 2011 (as of May 11, 2011).  If you are a Baseball Prospectus reader and ran across &lt;a href="http://www.baseballprospectus.com/article.php?articleid=13877"&gt;Mike Fast's most recent article&lt;/a&gt;, you'll understand why I think this is a nice data set for implementing loess...or at least you will by the end of this tutorial.&lt;br /&gt;&lt;br /&gt;Once you have the data, go ahead and set your working directory and load it in.  I'm naming my initial version of the data "&lt;span style="color: rgb(102, 0, 0);"&gt;guth&lt;/span&gt;":&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;####set working directory and load data&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;setwd("c:/Users/Millsy/Dropbox/Blog Stuff/sab-R-metrics")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;guth &amp;lt;- read.csv(file="guthrie.csv", header=TRUE)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;head(guth)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now, because I plan on working with pitch velocity data today, I want to make sure we're only including pitches of a certain type.  For this reason, I want to go ahead and subset the data down to only the fastball variants thrown by Guthrie over this time period.  That way, the data aren't contaminated with change-ups and curveballs of lower velocity.  We want to look specifically at arm strength.  The following code should subset the data correctly.  
Remember that the "&lt;span style="color: rgb(153, 0, 0);"&gt;|&lt;/span&gt;" means "OR" in R.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##subset to just fastballs and fastball variants&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;guthfast &amp;lt;- subset(guth, guth$pitch_type=="FA" | guth$pitch_type=="FF" | guth$pitch_type=="FC" | guth$pitch_type=="FT" | guth$pitch_type=="SI")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now, because we want this to be an ordered series of pitches across time, we'll have to create a new variable to represent this sequence.  For this, we'll make use of a new function that comes very much in handy when working with and visualizing smoothing analyses.  It is called "&lt;span style="color: rgb(153, 0, 0);"&gt;seq()&lt;/span&gt;", and it creates a sequence of numbers.  The first command below "&lt;span style="color: rgb(153, 0, 0);"&gt;from=&lt;/span&gt;" indicates the starting point of your sequence, and "&lt;span style="color: rgb(153, 0, 0);"&gt;to=&lt;/span&gt;" represents the endpoint.  Finally, "&lt;span style="color: rgb(153, 0, 0);"&gt;by=&lt;/span&gt;" tells R the space between your points.  You can put any number into these that you want.  The smaller the "&lt;span style="color: rgb(153, 0, 0);"&gt;by=&lt;/span&gt;", the more points you will have.  
I'm going to keep it simple and use "&lt;span style="color: rgb(153, 0, 0);"&gt;by=1&lt;/span&gt;", so that we'll have a count of the pitches.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##create a time sequence for the pitches&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;guthfast$pitch_num &amp;lt;- seq(from=1, to=length(guthfast[,1]), by=1)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;head(guthfast)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;You'll notice in the "&lt;span style="color: rgb(153, 0, 0);"&gt;seq()&lt;/span&gt;" function above, I tell R to count up to the length of the data set.  By typing "&lt;span style="color: rgb(153, 0, 0);"&gt;length(guthfast[,1])&lt;/span&gt;" I am indicating that I want the number of rows (i.e. the 'length' of the first column in our data set).  This way, if we count by 1, we know that every pitch will have a sequential integer-valued number in the Pitch Number variable we appended onto the data set.&lt;br /&gt;&lt;br /&gt;Now that we have this set up, let's take a quick look at Guthrie's pitch velocity over time.  
The code below should plot each pitch's speed as a function of the pitch number:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##plot all fastball pitch velocity by the pitch number variable&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;plot(guthfast$start_speed ~ guthfast$pitch_num, ylab="Speed out of Hand (Fastballs, MPH)", xlab="Fastball Count (2008-2011)", main="Pitch Speed by Sequential Fastball")&lt;/span&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-33fktJAnRb8/TcrRK2wrHwI/AAAAAAAAATk/Py67kxL2s-A/s1600/guthriespdRAW.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://3.bp.blogspot.com/-33fktJAnRb8/TcrRK2wrHwI/AAAAAAAAATk/Py67kxL2s-A/s400/guthriespdRAW.png" alt="" id="BLOGGER_PHOTO_ID_5605522670644567810" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Looking at the plot above we can see some semblance of a pattern, but the data seem to be too noisy to really see what is going on.  Perhaps if we use the average for each game we'll be able to see something more useful.  For this, we'll make use of the "tapply()" function again.  If you don't remember this function, head back to the earlier sab-R-metrics posts and check it out.  
This function allows us to quickly take the average fastball velocity by game using the following code:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##get mean fastball velocity by game&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;aggdat &amp;lt;- tapply(guthfast$start_speed, guthfast$gid, mean)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;aggdat &amp;lt;- as.data.frame(aggdat)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;head(aggdat)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The "tapply()" call spits out a vector with the game IDs as its names.  We then convert it to a data frame using the function "as.data.frame()" so that we can use our standard object calls and variable names.  Unfortunately, you'll see that we don't have the right variable name.  The column of average velocities for each game is just called "aggdat".  We can use an easy function in R to fix this up.  But first, let's append a count of the game numbers just like we did with the pitch numbers so that we have them sequentially over time:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##create game numbers&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;aggdat$game_num &amp;lt;- seq(from=1, to=length(aggdat[,1]), by=1)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##change column names&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;colnames(aggdat) &amp;lt;- c("start_speed", "game_num")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;head(aggdat)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;As you can see, our column names are what they should be now.  Using this new data set, let's again plot the fastball velocity over time.  
Note that we reduced the data to only 101 data points (games):&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;###plot average velocity by game across time&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;plot(aggdat$start_speed ~ aggdat$game_num, ylab="Speed out of Hand (Fastballs, MPH)", xlab="Game Number (2008-2011)", main="Pitch Speed by Sequential Game")&lt;/span&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/-qTKNdSxZAWw/TcrRVKCfkjI/AAAAAAAAATs/BYB21jssT7I/s1600/guthriespdMeans.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://4.bp.blogspot.com/-qTKNdSxZAWw/TcrRVKCfkjI/AAAAAAAAATs/BYB21jssT7I/s400/guthriespdMeans.png" alt="" id="BLOGGER_PHOTO_ID_5605522847618273842" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Here, we see that there are many fewer data points than before.  But it's still a bit tough to understand any pattern going on.  We could start with an OLS model to fit the data linearly.  Using the code below, you should get output that tells you Guthrie's fastball velocity is decreasing over time and that this is significant at the 1% level.  But is that really the case?  Once the model is fitted, go ahead and plot the regression line using the second bit of code.  
We'll save the standard errors from the model for later on.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##fit a linear model to the data&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;fit.ols &amp;lt;- lm(aggdat$start_speed ~ aggdat$game_num)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;summary(fit.ols)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;pred.ols &amp;lt;- predict(fit.ols, aggdat, se=TRUE)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##plot the regression line on the data&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;plot(aggdat$start_speed ~ aggdat$game_num, ylab="Speed out of Hand (Fastballs, MPH)", xlab="Game Number (2008-2011)", main="Pitch Speed by Sequential Game")&lt;br /&gt;&lt;br /&gt;lines(pred.ols$fit, lty="solid", col="darkgreen", lwd=3)&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;a href="http://1.bp.blogspot.com/-n618RlxGdig/TcrR5bQ_raI/AAAAAAAAAT0/2g5v2jtW1AM/s1600/guthrieOLSlineonly.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://1.bp.blogspot.com/-n618RlxGdig/TcrR5bQ_raI/AAAAAAAAAT0/2g5v2jtW1AM/s400/guthrieOLSlineonly.png" alt="" id="BLOGGER_PHOTO_ID_5605523470717791650" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;We see the negative slope in the plot above, but do you think this is the best representation of the data?  Using a loess regression, we may be able to improve on this.  Unfortunately, the drawback of loess is that there isn't really a clean functional form like we get from OLS.  That means no real 'coefficients' in a nice Y = mX + b form that we learned in algebra class.  
For the most part, the best way to use loess is to look at it.&lt;br /&gt;&lt;br /&gt;So, to fit a loess regression, we'll go ahead and stick with the game average data for now.  Using the "&lt;span style="color: rgb(153, 0, 0);"&gt;loess()&lt;/span&gt;" function (doesn't get much easier than that!), we can apply the new analytical tool to our data.  Let's try some basic code first.  Below, I estimate the loess using the default smoothing parameter and then predict values and plot it just like with the OLS model.  Only this time, it looks a bit different.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##loess default estimation&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;fitd &amp;lt;- loess(aggdat$start_speed ~ aggdat$game_num)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;my.count &amp;lt;- seq(from=1, to=101, by=1)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;predd &amp;lt;- predict(fitd, my.count, se=TRUE)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;plot(aggdat$start_speed ~ aggdat$game_num, pch=16, ylab="Speed out of Hand (Fastballs, MPH)", xlab="Game Number (2008-2011)", main="Pitch Speed by Sequential Game")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;lines(predd$fit, lty="solid", col="darkred", lwd=3)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/-09_r0o0F8tA/TcrSiaNQkwI/AAAAAAAAAUE/lyMYiXzn_RE/s1600/defaultloess.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://2.bp.blogspot.com/-09_r0o0F8tA/TcrSiaNQkwI/AAAAAAAAAUE/lyMYiXzn_RE/s400/defaultloess.png" alt="" id="BLOGGER_PHOTO_ID_5605524174808322818" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;In the code 
above, I kept things pretty basic.  Usually, we want to use the argument "span=" in order to tell R how much smoothing we want.  The larger the span, the more points that are included in the weighted estimation, and the smoother the plot will look.  Something to be careful with here, though, is making the span too small.  We don't want to overfit the data, but want it to give us some idea of the pattern that we see.  If we fit every point on its own, we may as well just look at a scatterplot.  On the other hand, if we smooth too much, we may as well just estimate an OLS regression.  The idea is to find a balance between the two using the smoothing parameter.&lt;br /&gt;&lt;br /&gt;You can also specify the degree of the local polynomial, which allows for more 'wigglyness' in your loess.  For our purposes, I'm not going to bother with this.  However, if you are fitting something that you believe needs some serious wiggles, go ahead and fiddle around with the "&lt;span style="color: rgb(153, 0, 0);"&gt;degree=&lt;/span&gt;" argument in the "&lt;span style="color: rgb(153, 0, 0);"&gt;loess()&lt;/span&gt;" function.  Note that "&lt;span style="color: rgb(153, 0, 0);"&gt;loess()&lt;/span&gt;" only accepts a degree of 0, 1, or 2, so there is a built-in limit on how far you can push the polynomial; going any higher would likely border on over-fitting anyway.  The default in R for this function is "&lt;span style="color: rgb(153, 0, 0);"&gt;degree=2&lt;/span&gt;", and you can change it to "&lt;span style="color: rgb(153, 0, 0);"&gt;degree=1&lt;/span&gt;" if you like and you'll see your wigglyness--for the same span--decrease a bit.  It all depends on your data.  For the purposes of this post, we'll just 'eyeball it'.  However, there are other ways to optimize smoothing parameters in loess and other smoothing (and density estimation--see next week) methods.  These include "rules of thumb", cross-validation methods, and so on.  
The default span in this function in R is 0.75, but it should really depend on your data.&lt;br /&gt;&lt;br /&gt;Now go ahead and add a smoothing parameter to your loess code.  For this, just include the additional argument "&lt;span style="color: rgb(153, 0, 0);"&gt;span=&lt;/span&gt;" within the "&lt;span style="color: rgb(153, 0, 0);"&gt;loess()&lt;/span&gt;" function.  Play around with it and see what happens.  I'd say just work with values between 0.1 and 1.0.  The code might look something like this:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##fiddling with the span&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;fit3 &amp;lt;- loess(aggdat$start_speed ~ aggdat$game_num, span=0.3)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;pred3 &amp;lt;- predict(fit3, my.count, se=TRUE)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;plot(aggdat$start_speed ~ aggdat$game_num, ylab="Speed out of Hand (Fastballs, MPH)", xlab="Sequential Game (2008-2011)", main="Pitch Speed by Sequential Game (Span=0.3)")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;lines(pred3$fit, lty="solid", col="darkred", lwd=3)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/-egIVrFmpULU/TcrS4cIg6uI/AAAAAAAAAUU/zBTlNCDVWIM/s1600/guthriespd3BASIC.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://2.bp.blogspot.com/-egIVrFmpULU/TcrS4cIg6uI/AAAAAAAAAUU/zBTlNCDVWIM/s400/guthriespd3BASIC.png" alt="" id="BLOGGER_PHOTO_ID_5605524553282415330" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Below I've embedded a quick video that shows how the loess line changes when we increase the span by 0.1 each time. I won't provide the code for the movie (it's just a repeat of the same code over and over, and I made the movie in Windows Movie Maker with still images).  
In the future, I'll be sure to get into 'for loops' to generate multiple versions of the same plot while incrementally changing a given parameter, but that's advanced for this post.  Watch through the video and try to pick out what you think is the best representation of the data that does not over fit (over-wiggle).&lt;br /&gt;&lt;br /&gt;&lt;object height="300" width="400"&gt;&lt;param name="allowfullscreen" value="true"&gt;&lt;param name="movie" value="http://www.facebook.com/v/556265977513"&gt;&lt;embed src="http://www.facebook.com/v/556265977513" type="application/x-shockwave-flash" allowfullscreen="true" height="300" width="400"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;You might be saying to yourself, "Well, it looks like the span of 0.6 smooths the best."  If you said that, then I'd agree.  Of course, if you disagree, that doesn't make you wrong.  The default looks pretty good as well (with a span of 0.75).  Remember we want to get to a balance of fit and smooth and we're just eyeballing it.  While it's somewhat subjective in this case, I imagine that we would all come somewhere near a consensus on the range of acceptable smoothing.&lt;br /&gt;&lt;br /&gt;You may also be saying to yourself, "How do I get those cool intervals?"  If that's the case, then you're in luck.  When we used the '&lt;span style="color: rgb(153, 0, 0);"&gt;predict()&lt;/span&gt;' function earlier, I made sure to tell R to keep the standard errors from the loess.  This allows us to plot a standard 95% Interval on our plot.  We'll have to make use of some new functions here.  
First, let's create two sequenced vectors for our predictions and interval construction:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##interval construction stuff&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;my.count &amp;lt;- seq(from=1, to=101, by=1)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;my.count.rev &amp;lt;- rev(my.count)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The second line simply reverses the order of the first.  So these are vectors of the x-variable (game number) in increasing and decreasing order, respectively.  From here, we can go ahead and re-plot our span=0.6 version of the loess and add dashed lines for the confidence intervals using some basic math.  After this, we'll add some fill color and apply what we learned about the RGB color scheme in the &lt;a href="http://princeofslides.blogspot.com/2011/03/sab-r-metrics-sidetrack-bubble-plots.html"&gt;Bubble Plots tutorial.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##fit the span=0.6 model&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;fit6 &amp;lt;- loess(aggdat$start_speed ~ aggdat$game_num, span=0.6)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;pred6 &amp;lt;- predict(fit6, my.count, se=TRUE)&lt;/span&gt;  &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;##now plot it&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;plot(aggdat$start_speed ~ aggdat$game_num, ylab="Speed out of Hand (Fastballs, MPH)", xlab="Sequential Game (2008-2011)", main="Pitch Speed by Sequential Game (Span=0.6)")&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;lines(pred6$fit, lty="solid", col="darkred", lwd=3)&lt;/span&gt;  &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;##now 
add the confidence interval lines&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;lines(pred6$fit-1.96*pred6$se.fit, lty="dashed", col="blue", lwd=1)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;lines(pred6$fit+1.96*pred6$se.fit, lty="dashed", col="blue", lwd=1)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/-5eCqdDBNREY/TcrTFFas4PI/AAAAAAAAAUc/UfYFrVam48g/s1600/guth6spdintlinesonly.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://1.bp.blogspot.com/-5eCqdDBNREY/TcrTFFas4PI/AAAAAAAAAUc/UfYFrVam48g/s400/guth6spdintlinesonly.png" alt="" id="BLOGGER_PHOTO_ID_5605524770522980594" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;You can see here that we use the 1.96 as an approximation of the 95% interval.  In the plot above, we see the interval represented by the blue dashed lines.  However, I really like the filled interval look.  For this, we'll need to use the "&lt;span style="color: rgb(153, 0, 0);"&gt;polygon()&lt;/span&gt;" function and the code below.&lt;br /&gt;&lt;br /&gt;The first portion of the code below tells R that we want to create an outline of a polygon on the y-axis with the confidence bounds at each point along the two vectors we created above.  The second line just recreates the x-axis, but in increasing then decreasing form to get full coverage of the shape.  Finally, we create a shape using the confidence bounds and fill it with a transparent color (otherwise it will cover up everything).  The &lt;span style="color: rgb(153, 0, 0);"&gt;#00009933&lt;/span&gt; indicates that we want it completely blue (99 in digits 5 and 6), with some transparency (33 in digits 7 and 8).  
As long as your plot is still up, this code will simply add the shaded band on top of it:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;###create polygon bounds&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;y.polygon.6 &amp;lt;- c((pred6$fit+1.96*pred6$se.fit)[my.count], (pred6$fit-1.96*pred6$se.fit)[my.count.rev])&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;x.polygon &amp;lt;- c(my.count, my.count.rev)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##add this to the plot&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;polygon(x.polygon, y.polygon.6, col="#00009933", border=NA)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-BxJS_kN26-I/TcrTOdRdxUI/AAAAAAAAAUk/5pV5zy53qUA/s1600/guthriespd6.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://3.bp.blogspot.com/-BxJS_kN26-I/TcrTOdRdxUI/AAAAAAAAAUk/5pV5zy53qUA/s400/guthriespd6.png" alt="" id="BLOGGER_PHOTO_ID_5605524931545515330" border="0" /&gt;&lt;/a&gt;Notice how the interval bands flare out at the ends.  This is because there are fewer data points near the endpoints (and none beyond them).  Therefore, there is less certainty about the prediction here.  This is a common problem in any regression (including linear regression), but it is exacerbated in smoothing because a single point at the edge of the distribution can have too much influence on the direction of the endpoints of the smoothed line.  Just something to be aware of.&lt;br /&gt;&lt;br /&gt;Lastly, if you're feeling lazy, you can always just use the "&lt;span style="color: rgb(153, 0, 0);"&gt;scatter.smooth()&lt;/span&gt;" function.  This will automatically plot your loess, and takes the same key smoothing arguments (span and degree) as "&lt;span style="color: rgb(153, 0, 0);"&gt;loess()&lt;/span&gt;".  
However, here you simultaneously provide the plotting parameters.   See the code and plot below:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;###show how scatter smooth just does the plot automatically&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;scatter.smooth(aggdat$start_speed ~ aggdat$game_num, degree=2, span=0.6, col="red", main="Scatter Smooth Version (Span=0.6)", xlab="Sequential Game Number", ylab="Starting Speed (Fastballs, MPH)")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/-alc25IT6cuE/TcrUVLed7DI/AAAAAAAAAUs/U9cpY87vEds/s1600/smoothscatversbbbbbbbbbbbbbbb.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://2.bp.blogspot.com/-alc25IT6cuE/TcrUVLed7DI/AAAAAAAAAUs/U9cpY87vEds/s400/smoothscatversbbbbbbbbbbbbbbb.png" alt="" id="BLOGGER_PHOTO_ID_5605526146538925106" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The above plot isn't as pleasing to me as the ones I made manually.  In general, by doing things manually you will have more control over the look of things, but if you're looking for something quick, "&lt;span style="color: rgb(153, 0, 0);"&gt;scatter.smooth()&lt;/span&gt;" does just fine.  One thing to remember, however, is that it has a different default for the polynomial than the "&lt;span style="color: rgb(153, 0, 0);"&gt;loess()&lt;/span&gt;" function, so if you want the same fit, you'll have to tell R that "&lt;span style="color: rgb(153, 0, 0);"&gt;degree=2&lt;/span&gt;".&lt;br /&gt;&lt;br /&gt;There are other options in loess that I haven't covered today, including the ability to parametrically estimate some variables, while applying the loess function to others.  This can come in handy if you think only some variables are non-linear, while others are linear in nature.  If this interests you, definitely check into it.  
Other packages, like mgcv, allow for similar model types using slightly different smoothers and an extension of the smoothing function to Generalized Linear Models (binomial response, etc.).  Hopefully this will give you a nice base to work with loess regression in your own work, but keep in mind that these tutorials are not a replacement for understanding the underlying mathematics that create the pretty pictures.  Loess can be terribly misused in the wrong hands (especially with pitch-location smoothing), so it is important to understand WHY you are doing certain things, not just HOW to do it in R.&lt;br /&gt;&lt;br /&gt;I don't currently have Pretty-R code up and running, as the Blogger HTML really effs with embedding it in here.  All of the necessary code is included above (and remember if you want to save plots and pictures, use a graphics device like "&lt;span style="color: rgb(153, 0, 0);"&gt;png()&lt;/span&gt;").&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-3869237948501766989?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/3869237948501766989/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/05/sab-r-metrics-basics-of-loess.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/3869237948501766989'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/3869237948501766989'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/05/sab-r-metrics-basics-of-loess.html' title='sab-R-metrics: Basics of LOESS 
Regression'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-33fktJAnRb8/TcrRK2wrHwI/AAAAAAAAATk/Py67kxL2s-A/s72-c/guthriespdRAW.png' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-5063322357112102590</id><published>2011-05-05T13:14:00.002-04:00</published><updated>2011-05-13T13:19:29.930-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='R-project'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='sab-R-metrics'/><title type='text'>sab-R-metrics: Logistic Regression</title><content type='html'>It's been a while since my last sab-R-metrics post, and I have not gotten to the real fun stuff yet.  I apologize for the long layoff, and it's likely that these will be sparse for the next couple weeks.  I have had some consulting opportunities come up, I've got 6 (possibly 7) presentations or co-authored presentations coming up this summer, I had to finish up a dissertation proposal and final exam, and apparently I have to start thinking about getting on the job market.  
Yeesh!&lt;br /&gt;&lt;br /&gt;Today, I'll continue with logistic regression, but first be sure to check out the last few posts on &lt;a href="http://princeofslides.blogspot.com/2011/02/sab-r-metrics-basic-applied-regression.html"&gt;Simple OLS&lt;/a&gt;, &lt;a href="http://princeofslides.blogspot.com/2011/03/sab-r-metrics-multiple-regression-and.html"&gt;Multiple Regression&lt;/a&gt;, and &lt;a href="http://princeofslides.blogspot.com/2011/03/sab-r-metrics-brief-sidetrack-for.html"&gt;Scatterplot Matrices&lt;/a&gt;.  We'll be using some of the lessons from these posts for today's tutorial.&lt;br /&gt;&lt;br /&gt;Logistic regression is a generalization of linear regression that allows us to model probabilities of binomial events.  Normally in OLS we want to have a dependent variable that is continuous and--optimally--normally distributed.  Without getting too in-depth with continuous vs. discrete measures (&lt;span style="font-style: italic;"&gt;technically &lt;/span&gt;anything we &lt;span style="font-style: italic;"&gt;measure &lt;/span&gt;is discrete once we measure it and round it even to the umptillionth decimal), it's pretty easy to tell that a dependent variable with 0 representing a "No" and a 1 representing a "Yes" response is not continuous or normally distributed.  In this sort of application, we are often interested in probability and how certain variables affect that probability of the Yes or No answer.  Since probability must be between 0 and 1, we need to bound our data at these limits.  Otherwise you'll get probability predictions above 1 and below 0.&lt;br /&gt;&lt;br /&gt;For the most part, using OLS as a quick and dirty way to estimate a binomial variable is reasonable.  And for observations that are not near 0 or 1, it can work just fine.  But with the available computer power that comes free in R, there is little reason not to fit a Generalized Linear Model to the binomial response (unless of course you have a very large data set).  
So today, we'll again use the data "hallhitters2.csv" which you can &lt;a href="http://sitemaker.umich.edu/millsbrian/files/hallhitters2.csv"&gt;download here&lt;/a&gt;.  What is more fun than trying to predict Hall of Fame induction?  We'll keep things simple today, but it is important to note that this isn't any sort of new idea.  &lt;a href="http://cybermetric.blogspot.com/"&gt;Cyril Morong&lt;/a&gt;, among others, has used logistic regression for Hall of Fame prediction in the past.&lt;br /&gt;&lt;br /&gt;For this exercise, I am going to assume that BBWAA writers only pay attention to traditional statistics.  That means we'll only include Home Runs, Runs, Runs Batted In, Hits, Batting Average and Stolen Bases in our regression.  Obviously this will be limited, and doesn't account for fielding, 'integrity' or whatever else the BBWAA claims to take into account.  But it should be plenty good enough to get the point across.&lt;br /&gt;&lt;br /&gt;Let's jump right in.  Go ahead and load up the data.  I'm going to name mine '&lt;span style="color: rgb(153, 0, 0);"&gt;hall&lt;/span&gt;':&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##set working directory and load data&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;setwd("c:/Users/Millsy/Dropbox/Blog Stuff/sab-R-metrics")&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;hall &amp;lt;- read.csv(file="hallhitters2.csv", header=TRUE)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;head(hall)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Alright, let's first run a standard OLS model on the data, followed by the logistic regression to compare.  I have included a variable called "BBWAA" in the data so that we can use this as our response variable.  
If a player is voted into the Hall by the BBWAA, they are given a '1', otherwise the variable will be '0'.  For logistic regression, we'll use a new R function called "&lt;span style="color: rgb(153, 0, 0);"&gt;glm&lt;/span&gt;", which of course stands for Generalized Linear Models.  This function can be used for a number of linear model generalizations and link functions (such as Poisson regression or using the Probit link function for binomial data).  We'll stick with the binomial version and logistic regression here.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##start with an OLS version and predict hall of fame probability&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;fit.ols &amp;lt;- lm(BBWAA ~ H + HR + RBI + SB + R + BA, data=hall)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;summary(fit.ols)&lt;/span&gt;   &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;##fit logistic regression&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;fit.logit &amp;lt;- glm(BBWAA ~ H + HR + RBI + SB + R + BA, data=hall, family=binomial(link="logit"))&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;summary(fit.logit)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Notice the slight difference in coding, as we have to tell R that the GLM we want to use is for binomial data with a 'logit link function'.  We'll revisit the OLS model later on, but for now let's focus on the logistic regression coefficients.  These are a bit different from the coefficients of our linear models from before.  Just reading the coefficients off the table as the increase in probability of induction won't work.  The coefficients must be transformed.  The logistic regression uses a transformation of the predictors, 'x', in order to model the probability on a 0 to 1 scale.  
The transformation used here is:&lt;br /&gt;&lt;br /&gt;InverseLogit(x) = e^x/(1+e^x)&lt;br /&gt;&lt;br /&gt;or&lt;br /&gt;&lt;br /&gt;Logit(y) = BXi&lt;br /&gt;&lt;br /&gt;Here, B represents the coefficients, while Xi represents our predictor variables.  So when we look at the coefficients, we have to do something with them in order to make sense out of what we're seeing in the regression output.  To get a feel for the effect, we can exponentiate a coefficient, which gives the multiplicative change in the odds for a one-unit increase in that predictor.  Let's check out the association of Home Runs and the odds of induction into the Hall of Fame in as simple a way as possible:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##exponentiate HR coefficient to get the odds ratio for induction per additional HR&lt;/span&gt;  &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;exp(0.002153)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This comes out to about 1.0022, or roughly a 0.215% increase in induction odds per home run.  So, for a small coefficient like this, we pretty much can read the percentage change in odds directly off the table.  However, it is important to understand this isn't a coefficient like in OLS.  If we simply multiply this change by Hank Aaron's 755 Home Runs, we get a "probability" of induction for him of about 162%, which is impossible.  Remember that in a model like this, the influence of the HR total decreases as the probability of the player reaching the Hall gets closer and closer to 1.  Otherwise, we'd end up with probabilities above 1 and below 0.  For a good primer on interpreting coefficients from a logistic regression model, &lt;a href="http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm"&gt;check out this site through the UCLA stats department&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Now, it's interesting to look at coefficients, but because of the collinearity issues, they may not be all that useful for interpretation.  
Multicollinearity is the likely culprit for the coefficients on Hits being negative (&lt;a href="http://3.bp.blogspot.com/-IRtmcquZcV0/TX5CtSpZW4I/AAAAAAAAAOw/bbzpnQ1kosk/s1600/spm2.png"&gt;and recall our scatter plot matrix&lt;/a&gt;).  It would probably be optimal to create some sort of orthogonal set of predictors with Principal Components Analysis (PCA) or something of the sort (which I plan to get to in a later sab-R-metrics post).  But for now, we'll just use the logistic regression for prediction.&lt;br /&gt;&lt;br /&gt;However, another valuable part of logistic regression is its prediction of a probability of a 'success' (here, induction into the Hall of Fame by the BBWAA).  Because the coefficients are in log-odds form, we need to use "response" when predicting to get actual probabilities (0 to 1).  Go ahead and predict the probability of induction using both the OLS and the Logistic model and attach this to the end of your data set with the following code:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##predict values and append to data&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;hall$ols.pred &amp;lt;- predict(fit.ols, hall, type="response")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##predict the probability of induction using the logit model&lt;/span&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;hall$log.pred &amp;lt;- predict(fit.logit, hall, type="response")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;head(hall)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;From here, it can be easier to look at the output if we create a new table in Excel using this data.  Of course, you could copy and paste into Excel if you want, but there's a nice easy option in R to create a table.  
Keeping with the CSV format I like to work with, we can create a new table in our working directory from the newly updated data set with Hall of Fame probabilities with the following code:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##create table for looking at in Excel&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;write.table(hall, file="hallhittersPRED.csv", row.names=F, sep=",")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This rather convenient function comes in handy for me very often.  In the function above, you tell R that you want to write a new file using the object "&lt;span style="color: rgb(153, 0, 0);"&gt;hall&lt;/span&gt;".  Then you name the file.  Here, be sure to put the .csv extension on the end.  You could also do .txt, and there are of course other options.  I use the '&lt;span style="color: rgb(153, 0, 0);"&gt;row.names=F&lt;/span&gt;' to tell R that I don't need the row names in the new file, but there are cases when this makes sense to allow (i.e. when you are writing a table from the output of the '&lt;span style="color: rgb(153, 0, 0);"&gt;tapply()&lt;/span&gt;' function).  Finally, we tell R what to use as a separator with "&lt;span style="color: rgb(153, 0, 0);"&gt;sep=&lt;/span&gt;".  Be sure to put this in quotes.  You can use anything you want, but the safest bet is to use commas (assuming your vectors don't have commas in them, but they should have double quotes protecting the fields anyway).  
I usually use write.table so I have full control over the file; however, you can also use the simple code below and get the same result:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;###using write.csv now&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;write.csv(hall, file="hallhittersPRED.csv", row.names=F)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Go to your working directory and open up the CSV file in Excel.  Now, inspect the predicted probabilities of the logit model first.  They'll be nice and easy to sort in Excel.&lt;br /&gt;&lt;br /&gt;The logit probabilities seem perfectly reasonable, with the likes of Rickey Henderson, Cal Ripken and Tony Gwynn about as close to sure things as you can get.  On the other hand, we see that Kirby Puckett doesn't get as high a probability.  We know why though: Puckett's career was cut short.  With our knowledge of baseball at hand, these probabilities mostly seem to make sense.&lt;br /&gt;&lt;br /&gt;However, looking at the OLS model, there are some strange things going on.  The model doesn't particularly like Ripken or Gwynn.  If you go ahead and sort your data by the predicted OLS probabilities, you'll see that there are a number of negative induction probabilities for guys like Greg Pryor and Paul Casanova.  We could just round these to zero, but the logit model gives us a better look at what is going on in the data.&lt;br /&gt;&lt;br /&gt;As before, it's sometimes useful to look at a plot of our data to see how our predictions fit the true outcomes.  However, since we will be comparing 0-1 binomial data to a continuous probability prediction, we'll have to employ some new things in R for this visualization.  Luckily, I recently found a very handy function in the "&lt;span style="color: rgb(153, 0, 0);"&gt;popbio&lt;/span&gt;" library for this.  
The first thing you'll need to do is install this package (I went over how to do this before).  Next, use the following code to create a plot with the "&lt;span style="color: rgb(153, 0, 0);"&gt;logi.hist.plot&lt;/span&gt;" function:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##create plot of logit predictions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;logi.hist.plot(hall$log.pred, hall$BBWAA, boxp=F, type="hist", col="gray", rug=T,&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;    xlab="Hall of Fame Status", mainlabel="Plot of Logistic Regression for BBWAA Inductions")&lt;/span&gt;  &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;##add the names of mis-classified HOFers&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;low.prob &amp;lt;- subset(hall, hall$log.pred &amp;lt; .20 &amp;amp; hall$BBWAA==1)&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;text(low.prob$BBWAA, low.prob$log.pred, low.prob$last, col="darkred", cex=.8)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;As you can see, the second portion of the code adds the names of the Hall of Famers who were predicted to have a very low probability of induction.  You can see that the probability of making the hall of fame increases as we get closer to the "1" classification.  
In the middle, the probability of induction changes at the highest rate, while it tails off at each end in order to bound it between 0 and 1.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/-OVqS4z5OXNQ/TcLTJ72oKVI/AAAAAAAAATA/Fhs0XG6KMOQ/s1600/logitplot.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 267px;" src="http://4.bp.blogspot.com/-OVqS4z5OXNQ/TcLTJ72oKVI/AAAAAAAAATA/Fhs0XG6KMOQ/s400/logitplot.png" alt="" id="BLOGGER_PHOTO_ID_5603273054041418066" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;We can inspect these and for the most part understand why our model misses these guys.  Jackie Robinson and Roy Campanella had shorter careers due to segregation and catastrophic injury, respectively.  We didn't include fielding data in our model, so it's no surprise that Ozzie Smith was predicted so low.  Guys like Kiner, Boudreau, Carter and Aparicio are less clear.  But perhaps these were just inconsistencies in the voting by BBWAA members.  Many would argue that these players don't really belong in the HOF anyway.&lt;br /&gt;&lt;br /&gt;Now that we have this nice looking picture, how do we really evaluate the usefulness of our model?  The most common way to evaluate something predicting binomial data is an ROC curve (and its AUC, or "Area Under the Curve").  This benchmarks the performance of your model in predicting the correct class (i.e. 0 or 1).  If you perfectly predict, the area under the curve will be 1.  If you randomly chose 0 or 1, then you're looking at 0.5 (i.e. a 45 degree line in the plot seen below).  Usually, we'll see some sort of curve between 0.5 and 1, but the goal is to get as close to 1 as possible (without over-fitting, of course).&lt;br /&gt;&lt;br /&gt;For this, go ahead and install the package 'ROCR' onto your computer.  We'll make use of this.  
Using the code below, you can create the ROC curve and also the AUC&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##############ROC Curve&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;library(ROCR)&lt;/span&gt;   &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;preds &amp;lt;- prediction(as.numeric(hall$log.pred), as.numeric(hall$BBWAA))&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;perf &amp;lt;- performance(preds, "tpr", "fpr")&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;perf2 &amp;lt;- performance(preds, "auc")&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;perf2&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;plot(perf, main="ROC Curve for Hall of Fame Predictions", lwd=2, col="darkgreen")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;text(0.5, 0.5, "AUC = 0.9855472", cex=2.2, col="darkred")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The first portion of this code sets up the predictions and the true classes.  The second line creates performance vectors using the "&lt;span style="color: rgb(153, 0, 0);"&gt;tpr&lt;/span&gt;" and "&lt;span style="color: rgb(153, 0, 0);"&gt;fpr&lt;/span&gt;" commands, which refer to "True Positive Rate" and "False Positive Rate", respectively.  
Then we plot them against one another and get the following:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/-eYl_6ymqi-k/TcLXU0CayEI/AAAAAAAAATI/0TAdLsYYarg/s1600/aucplot.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 400px;" src="http://2.bp.blogspot.com/-eYl_6ymqi-k/TcLXU0CayEI/AAAAAAAAATI/0TAdLsYYarg/s400/aucplot.png" alt="" id="BLOGGER_PHOTO_ID_5603277638968461378" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Finally, the "&lt;span style="color: rgb(153, 0, 0);"&gt;perf2&lt;/span&gt;" object calculates the AUC for our model, which we find is a solid 0.986 or so.  Of course, this is enhanced by the fact that there are so many easy decisions on non-Hall of Famers.  If we included only "borderline candidates" for our AUC calculation, we would get a much lower number.  Whether or not you believe it is appropriate to report such a high AUC produced by this fact is another question, but keep in mind that the 'accuracy' may be a bit misleading in this case.  We don't need a predictive model to tell us that Joe Schmoe isn't going to be inducted into the Hall of Fame.&lt;br /&gt;&lt;br /&gt;Finally, if we've decided that this model is accurate enough, we can apply this model to current players.  For this, we'll use the &lt;a href="http://sitemaker.umich.edu/millsbrian/files/currenthitters.csv"&gt;file "currenthitters.csv" from my website here&lt;/a&gt;&lt;a href="http://sitemaker.umich.edu/millsbrian/files/currenthitters.csv"&gt;.&lt;/a&gt;  For this, we want to be sure we have the same variable names for the same variables so that the new code knows what to call from the original logistic regression model.  
Here's the code below to load in the data, create the predictions, attach them to the data, and write a new table with the added predicted induction probabilities.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;##get current hitter data and check variable names&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;current &amp;lt;- read.csv(file="currenthitters.csv", h=T)&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;head(current)&lt;/span&gt;   &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;##predict probabilities&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;current$hall_prob &amp;lt;- predict(fit.logit, current, type="response")&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;head(current)&lt;/span&gt;   &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;##write a new file with the induction probabilities&lt;/span&gt; &lt;span style="color: rgb(51, 102, 255);"&gt;&lt;br /&gt;&lt;br /&gt;write.table(current, file="currentPRED.csv", row.names=F, sep=",")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Then go ahead and open up the file for easier inspection (or just snoop around in R if you want).&lt;br /&gt;If you want to order them by induction probability in R directly, then use the following code:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;###order by most likely to get into hall of fame&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;current &amp;lt;- current[order(-current$hall_prob),]&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;current[1:35,]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;You can use the ordering code for any variable you want, and I use the "-" sign within the parentheses in order to tell R that I want things in descending order.  
The default is ascending order.  The second line of code just calls the first 35 rows of the data.&lt;br /&gt;&lt;br /&gt;You can see that Alex Rodriguez is the most likely in the sample of active players, with Ken Griffey, Jr. just behind him.  Most of these make sense.  Some are likely not right, like Luis Gonzalez and Bobby Abreu.  Others, we know will have problems getting into the Hall based on steroid accusations, which we did not include in our model.&lt;br /&gt;&lt;br /&gt;Some players are surprises here, but this is likely because they just don't have enough at bats yet to hit the milestones for the Hall of Fame.  Keep in mind that the data I've provided only run through 2009, so Ichiro has fewer than 2,000 hits.  Perhaps if we modeled each HOF player's career at each given point in time, we could use this to predict guys that haven't been around very long.  But that's quite a long and complicated process, and the current model is as far as I'll take today's lesson.  Finally, keep in mind that we are predicting induction probabilities based on past BBWAA inductions, NOT whether or not the player deserves to be in the Hall of Fame.  This is a discussion left for another day.&lt;br /&gt;&lt;br /&gt;So there you have it: predicting Hall of Fame induction and logistic regression in R.  Remember that the '&lt;span style="color: rgb(153, 0, 0);"&gt;glm()&lt;/span&gt;' function can be extended to other models like Probit and Poisson.  
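&lt;br /&gt;&lt;br /&gt;As a rough sketch of what that extension looks like, the snippet below swaps the family and link inside glm() on a small simulated data set.  Everything in it (the object "toy", the variables "hr", "inducted" and "games", and the coefficient values used to simulate them) is made up for illustration and is not taken from the Hall of Fame file:&lt;br /&gt;&lt;br /&gt;

```r
## minimal sketch on SIMULATED data (hypothetical names, not hallhitters2.csv)
set.seed(42)
n = 500
hr = rpois(n, 150)                        # made-up career home run totals
p = plogis(-6 + 0.03 * hr)                # assumed "true" induction probability
inducted = rbinom(n, 1, p)                # simulated 0/1 outcome
games = rpois(n, exp(0.5 + 0.005 * hr))   # simulated count outcome
toy = data.frame(inducted, games, hr)

## same kind of formula, different family/link arguments
fit.logit  = glm(inducted ~ hr, data=toy, family=binomial(link="logit"))
fit.probit = glm(inducted ~ hr, data=toy, family=binomial(link="probit"))
fit.pois   = glm(games ~ hr, data=toy, family=poisson(link="log"))

## type="response" returns probabilities for the binomial fits
pred.logit  = predict(fit.logit,  toy, type="response")
pred.probit = predict(fit.probit, toy, type="response")

## the two links usually give very similar predicted probabilities
cor(pred.logit, pred.probit)
```

&lt;br /&gt;&lt;br /&gt;The logit and probit fits differ only in the link, so their predicted probabilities tend to track each other closely, while the Poisson fit is meant for a count response rather than a 0/1 response.&lt;br /&gt;&lt;br /&gt;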
Below is the code from Pretty R as usual.&lt;br /&gt;&lt;br /&gt;CODE: (HTML IS BEING FUNKY, I WILL TRY TO FIX THIS ISSUE SOON!)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-5063322357112102590?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/5063322357112102590/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/04/sab-r-metrics-logistic-regression.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/5063322357112102590'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/5063322357112102590'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/04/sab-r-metrics-logistic-regression.html' title='sab-R-metrics: Logistic Regression'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-OVqS4z5OXNQ/TcLTJ72oKVI/AAAAAAAAATA/Fhs0XG6KMOQ/s72-c/logitplot.png' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-4097425364963501125</id><published>2011-05-05T09:39:00.003-04:00</published><updated>2011-05-05T09:43:21.379-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Bill James'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Links'/><title type='text'>Andrew Gelman for Baseball ProGUESTus</title><content type='html'>Today, the &lt;a href="http://www.baseballprospectus.com/article.php?articleid=13810"&gt;Baseball ProGUESTus post up is by Andrew Gelman &lt;/a&gt;thinking back on the lessons in statistics from Bill James.  An interesting read.  B-Pro is getting people that I hadn't expected to participate.  Each week, it looks more and more like they got me for the bottom of the barrel when they didn't have someone famous to write.&lt;br /&gt;&lt;br /&gt;If you don't know this already, Gelman's blog is on my sidebar.  I enjoy reading his blog and he occasionally comments on sports, but it's mostly Bayesian statistics and political stuff.  Every time I am there I learn something.  I've cited his book (with Jennifer Hill) here a few times during sab-R-metrics posts and will do so in the next few as well.  Definitely check out his BP piece.&lt;br /&gt;&lt;br /&gt;Also, I'll have a new sab-R-metrics post up later today or tomorrow.  Finally!  
Sorry for the delay.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-4097425364963501125?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/4097425364963501125/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/05/andrew-gelman-for-baseball-proguestus.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/4097425364963501125'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/4097425364963501125'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/05/andrew-gelman-for-baseball-proguestus.html' title='Andrew Gelman for Baseball ProGUESTus'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-1453470859437814848</id><published>2011-04-26T17:48:00.008-04:00</published><updated>2011-04-26T18:25:44.369-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Sports'/><category scheme='http://www.blogger.com/atom/ns#' term='Wine Tasting'/><category scheme='http://www.blogger.com/atom/ns#' term='Vacation'/><title type='text'>A Must Visit for Baseball Fans in Napa</title><content type='html'>Back in the first week of March, my fiancee and I headed off to Napa Valley for a week of wine drinking and nice weather.  
I'm more of a beer guy myself, but find it interesting to taste new things.  Give me something different and I'll be happy.  In the end, most wines taste pretty much the same to an unsophisticated palate like mine.  But I can sure tell the difference in an IPA.&lt;br /&gt;&lt;br /&gt;Anyway, we took a really fun "Wine Tour" through &lt;a href="http://www.beauwinetours.com/"&gt;Beau Wine Tours&lt;/a&gt;.  I would highly recommend this, and at only $100 a person for a full day of boozing, it's definitely worth the money (note: there are wine tasting fees at some places...they are reasonable).  If you are in the area and choose to do this, ask for Damon.  He was great.  If you don't have a big group, that's okay, as they'll just pair you with other random people in the limo.  This usually makes for good fun, and we ended up riding around with other people from Michigan!&lt;br /&gt;&lt;br /&gt;The wine tour takes you to four different wineries chosen by your host, the limo driver.  You get a very scrumptious picnic lunch from a local sandwich shop and they'll even pick you up and drop you off at your hotel.   It was really the highlight of heading off to Napa, but going around on your own is fun, too.  So why am I talking about wine tasting on a sports blog?&lt;br /&gt;&lt;br /&gt;Well, the reason I say ask for Damon at the Wine Tours is this: he is friends with the family that owns &lt;a href="http://www.hillfamilyestate.com/"&gt;Hill Family Estates&lt;/a&gt;.  It is a small place in Yountville and--if you're the non-wine snob male party like me--you'll love it.  BUT, if you're a wine snob, you'll also still really enjoy it.  It's got a bit of everything, and they cater to this very well.  
&lt;a href="http://www.hillfamilyestate.com/index.cfm?method=pages.showPage&amp;amp;pageid=8193e52b-d578-ebaf-a0c4-b8bd2bea6986"&gt;The owner is a younger guy (well the family owns it, but he seems to run a lot of the marketing)&lt;/a&gt; and is into sports, music, surfing, and the like.  But here's the kicker: they sit you down for a free tasting (if you come with Damon), fresh cheese and Italian meats in a room covered from wall to wall with sports memorabilia.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-_Xe1EWG1lko/TbdDRw3SUuI/AAAAAAAAASY/vFWSfn_S8xw/s1600/205548_550508230409_47800491_31764079_7856295_n.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 300px;" src="http://3.bp.blogspot.com/-_Xe1EWG1lko/TbdDRw3SUuI/AAAAAAAAASY/vFWSfn_S8xw/s400/205548_550508230409_47800491_31764079_7856295_n.jpg" alt="" id="BLOGGER_PHOTO_ID_5600018634112127714" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.hillfamilyestate.com/stainedprojects"&gt;Every year, Ryan and the Estate do a 'wine staining project'.  &lt;/a&gt;Here, they take all sorts of memorabilia like surf boards, guitars, baseball bats, etc. and pair it with a wine that is 'inspired' by an athlete or someone of the sort.  They do a lot of this with baseball players, including Rick Ankiel, Luis Gonzalez, Bronson Arroyo, Tom Glavine and Greg Maddux.  
&lt;a href="http://1.bp.blogspot.com/-yj0-RKFpE00/TbdDS_B055I/AAAAAAAAASw/wvy2HSUUvoc/s1600/215743_550508280309_47800491_31764082_2394057_n.jpg"&gt;&lt;br /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/-g4kRKQsN150/TbdDSBbrS6I/AAAAAAAAASg/5LP2yBOk1zI/s1600/206338_550508205459_47800491_31764077_7457755_n.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 300px;" src="http://1.bp.blogspot.com/-g4kRKQsN150/TbdDSBbrS6I/AAAAAAAAASg/5LP2yBOk1zI/s400/206338_550508205459_47800491_31764077_7457755_n.jpg" alt="" id="BLOGGER_PHOTO_ID_5600018638559726498" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;As a kid, I was never the overpowering pitcher, so I absolutely loved Maddux.  This was the first place we went on the tour, and I couldn't help myself.  I splurged, and bought only one bottle of wine on the trip (for $75 I might add): Greg Maddux Wine.   It's not autographed, but I had to have it after being there.  This is it:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/-TvqXLV8c-jA/TbdAUnCFZdI/AAAAAAAAASI/K75wS4UjDXs/s1600/0303111145a.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 320px;" src="http://1.bp.blogspot.com/-TvqXLV8c-jA/TbdAUnCFZdI/AAAAAAAAASI/K75wS4UjDXs/s400/0303111145a.jpg" alt="" id="BLOGGER_PHOTO_ID_5600015384477787602" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;There are some Bronson Arroyo and Luis Gonzalez bottles left with signatures (unfortunately, $275 for a magnum Arroyo autographed bottle was a bit steep for a grad student).  But the standard bottles with autographs are the same price as the non-signed ones.  There were no Maddux signatures (DAMN!), but I still found this to be pretty cool.  
The autographed wine-stained bats are neat, too.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-Yd7qLjE-dEw/TbdDSuYbIeI/AAAAAAAAASo/hrTA-6gealo/s1600/215719_550508305259_47800491_31764084_4554620_n.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 300px;" src="http://3.bp.blogspot.com/-Yd7qLjE-dEw/TbdDSuYbIeI/AAAAAAAAASo/hrTA-6gealo/s400/215719_550508305259_47800491_31764084_4554620_n.jpg" alt="" id="BLOGGER_PHOTO_ID_5600018650625679842" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.uncorkforacause.com/articles/show/49-Johnny_Damon_Charity_Wine_Event_to_Benefit_Families_of_Fallen_St_Petersburg_Officers"&gt;It looks like the next wine (a 2007 red wine) will be coming courtesy of Johnny Damon.&lt;/a&gt;  I believe they try and do something with a World Series winner each year.  &lt;a href="http://www.uncorkforacause.com/products/43"&gt;You can get an autographed bottle here.&lt;/a&gt;  And they do all sorts of cool packaged things like this Arroyo boxed package if you just can't get enough Arroyo:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/-yj0-RKFpE00/TbdDS_B055I/AAAAAAAAASw/wvy2HSUUvoc/s1600/215743_550508280309_47800491_31764082_2394057_n.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 300px;" src="http://1.bp.blogspot.com/-yj0-RKFpE00/TbdDS_B055I/AAAAAAAAASw/wvy2HSUUvoc/s400/215743_550508280309_47800491_31764082_2394057_n.jpg" alt="" id="BLOGGER_PHOTO_ID_5600018655094302610" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;All in all, these are fun things to have on the shelf.  Ultimately, they're just unique, they scream 'baseball nut', and yet have a sophisticated side.  But the real fun comes from going to Hill Estates in Yountville and sitting there doing the tasting, talking to the people there, and checking out all the cool wine-stained stuff.  
You can get the bottles there, and have a really fun experience.  Thanks to them and Beau Wine Tours for showing us a great time.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-Byp8eFwTT1Y/TbdCwMh9XnI/AAAAAAAAASQ/OeQvOGWQxVU/s1600/216056_550508330209_47800491_31764085_4496652_n.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 300px;" src="http://3.bp.blogspot.com/-Byp8eFwTT1Y/TbdCwMh9XnI/AAAAAAAAASQ/OeQvOGWQxVU/s400/216056_550508330209_47800491_31764085_4496652_n.jpg" alt="" id="BLOGGER_PHOTO_ID_5600018057423314546" border="0" /&gt;&lt;/a&gt;&lt;span style="font-weight: bold;"&gt;(GUESS WHO'S HOLDING THE BASEBALL BAT!)&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;a href="http://1.bp.blogspot.com/-g4kRKQsN150/TbdDSBbrS6I/AAAAAAAAASg/5LP2yBOk1zI/s1600/206338_550508205459_47800491_31764077_7457755_n.jpg"&gt;&lt;br /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-1453470859437814848?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/1453470859437814848/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/04/must-visit-for-baseball-fans-in-napa.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/1453470859437814848'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/1453470859437814848'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/04/must-visit-for-baseball-fans-in-napa.html' title='A Must Visit for Baseball Fans in 
Napa'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-_Xe1EWG1lko/TbdDRw3SUuI/AAAAAAAAASY/vFWSfn_S8xw/s72-c/205548_550508230409_47800491_31764079_7856295_n.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-6924292601718917808</id><published>2011-04-16T11:17:00.005-04:00</published><updated>2011-04-16T11:23:20.896-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Programming'/><category scheme='http://www.blogger.com/atom/ns#' term='R-project'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><title type='text'>Trackman Position Needs R Knowledge</title><content type='html'>Thought some of the R-Blogger readers would be interested in the position linked below.  If you're a baseball fan and like working in R, this is a fun company that seems to be getting more and more press.  Recently, it was featured in Sports Illustrated and has been covered on ESPN as well.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.workinsports.com/wisquickregapply.asp?referrer=793&amp;amp;idx=64599"&gt;http://www.workinsports.com/wisquickregapply.asp?referrer=793&amp;amp;idx=64599&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I've interfaced a bit with the people at Trackman in the past and they really are excited about the stuff they do.  I can't say much more about the data/position as I have signed an NDA with the company.  
However, I can say the position is recommended for those with R knowledge specifically (which is also indicated in the job description).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-6924292601718917808?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/6924292601718917808/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/04/trackman-position-needs-r-knowledge.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/6924292601718917808'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/6924292601718917808'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/04/trackman-position-needs-r-knowledge.html' title='Trackman Position Needs R Knowledge'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-4957937929367482309</id><published>2011-04-08T14:43:00.014-04:00</published><updated>2011-04-08T15:35:10.469-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Strike Zones'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Umpires'/><category scheme='http://www.blogger.com/atom/ns#' term='Pitch F/X'/><title type='text'>Umpire Bias Favoring Catchers At 
Bat?</title><content type='html'>Recently, I've been working a lot with umpire data.  A lot of this has to do with the large sample sizes available for most umpires, which make it easier to infer interesting things &lt;span style="font-family:georgia;font-size:100%;"&gt;from the data.  Today I thought I would check out something I came across when looking for B-Pro topics over the past couple of weeks.&lt;br /&gt;&lt;/span&gt;
&lt;p class="MsoNormal"  style="margin-bottom: 0.0001pt; line-height: normal;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;It is relatively well-known that umpires and catchers do their best to keep a good rapport with one another behind the plate.&lt;span style=""&gt;  &lt;/span&gt;Catchers need to be diplomatic when asking about a call on a given pitch, as umpires may not take well to being called out for an incorrect call.&lt;span style=""&gt;  &lt;/span&gt;The implications of these interactions may well be important for the catcher’s battery mate standing 60 feet 6 inches away, and the hope is that, if the catcher has any effect on borderline calls, it will be a positive one for his pitcher.&lt;span style=""&gt;  &lt;/span&gt;This is a difficult thing to measure, so I’ll have 
to leave this for someone else who has access to players and can survey them.&lt;/span&gt;&lt;/p&gt;        &lt;p class="MsoNormal"  style="margin-bottom: 0.0001pt; text-indent: 0.5in; line-height: normal;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=""&gt;            &lt;/span&gt;No, today I’ll be asking a related question, though from a different angle: Do umpires give catchers the benefit of the doubt when they’re at the plate?&lt;span style=""&gt;  &lt;/span&gt;If catchers are being cordial with the umpire behind the plate, then this could be a result of both team-level and individual-level incentives.&lt;span style=""&gt;  &lt;/span&gt;If the catcher knows that being nice will improve his experience at bat, then he may have a strong incentive to be nice.&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"  style="margin-bottom: 0.0001pt; text-indent: 0.5in; line-height: normal;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;But there’s a cognitive aspect to this from the umpire perspective as well.&lt;span style=""&gt;  &lt;/span&gt;If an umpire screws up a call for a catcher, then he has to face the guy in a couple of outs right there behind the plate.&lt;span style=""&gt;  &lt;/span&gt;The rest of the team heads off at least 60 feet away, where the umpire won’t be close enough to hold their hand.&lt;span style=""&gt;  &lt;/span&gt;Think about how you might talk to someone you can’t stand on the internet.&lt;span style=""&gt;  &lt;/span&gt;Then think about whether you would say those things to their face.&lt;span style=""&gt;  &lt;/span&gt;Are the two interactions different?&lt;span style=""&gt;  &lt;/span&gt;Could this cognitive aspect come into play when making ball-strike calls when catchers are up to bat?&lt;/span&gt;&lt;/p&gt;    &lt;p class="MsoNormal"  style="margin-bottom: 0.0001pt; line-height: normal;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=""&gt;                     
 &lt;/span&gt;To answer this question, I’ll use Pitch F/X data from 2008 through 2010.&lt;span style=""&gt;  &lt;/span&gt;I include a dummy variable for whether or not the player is a regular catcher, as well as variables for the pitch’s distance from the center of the strike zone, whether or not the umpire made the ‘correct call’ (based on a strike zone covering the width of the plate and running from 1.75 ft. to 3.45 ft. in height), and a few other controls.&lt;span style=""&gt;  &lt;/span&gt;Using a few different types of analysis, I’ll do my best to tackle the data.&lt;/span&gt;&lt;/p&gt;    &lt;p class="MsoNormal"  style="margin-bottom: 0.0001pt; line-height: normal;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=""&gt;                      &lt;/span&gt;My first step was to run a simple logistic regression on the calls made by the umpire against all batters during this time period.&lt;span style=""&gt;  &lt;/span&gt;In other words, I’ll be estimating the effect of each variable on the probability that a given pitch is called a strike, holding constant the location, count, batter/pitcher handedness, inning, and so on.&lt;span style=""&gt;  &lt;/span&gt;The dependent variable here is whether or not the pitch was called a strike (1=called strike, 0=not called strike), and the data include only calls made by the umpire.&lt;span style=""&gt;  &lt;/span&gt;&lt;/span&gt;&lt;/p&gt;    &lt;p class="MsoNormal"  style="margin-bottom: 0.0001pt; text-indent: 0.5in; line-height: normal;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;One thing to consider, however, is that catchers may be a bit shorter than the rest of the batters in the sample.&lt;span style=""&gt;  &lt;/span&gt;Because I used a fixed top and bottom of the strike zone, pitches near the top and bottom edges could bias the catcher effect.&lt;span style=""&gt;  &lt;/span&gt;The average MLB 
player is about 6’ 1” or 6’ 2” (&lt;a href="http://espn.go.com/mlb/stats/rosters/_/sort/null/order/false"&gt;http://espn.go.com/mlb/stats/rosters/_/sort/null/order/false&lt;/a&gt;), while the average catcher height came in at just under 6’ 1” last season (&lt;a href="http://www.answerbag.com/q_view/2021290"&gt;http://www.answerbag.com/q_view/2021290&lt;/a&gt;).&lt;span style=""&gt;  &lt;/span&gt;If we assume the difference is about 1 inch between catchers and the rest of the league, we should probably account for this.&lt;span style=""&gt;  &lt;/span&gt;Within the data set, for a given pitch, the strike zones of catchers and non-catchers differ by less than about half a vertical inch.&lt;span style=""&gt;  &lt;/span&gt;For the purposes of this preliminary look, I proxy batter height using the listed top and bottom of the zone within the Pitch F/X data.&lt;span style=""&gt;  &lt;/span&gt;While the provided numbers are extremely noisy and problematic for deciding whether or not a pitch was “within the zone”, they should work well as a &lt;b style=""&gt;&lt;i style=""&gt;rough&lt;/i&gt;&lt;/b&gt; proxy for the height of the batter on average.&lt;span style=""&gt;  &lt;/span&gt;This likely won't fully control for height, but running the regression with and without the top-of-zone and bottom-of-zone variables leaves the catcher coefficient essentially unchanged.  
This could mean one of two things: 1) height is not an issue, or 2) the sz_top and sz_bot variables aren't just noisy but completely worthless (a very real possibility).&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"  style="margin-bottom: 0.0001pt; text-indent: 0.5in; line-height: normal;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;One other thing to check is whether or not catchers simply see more pitches within the strike zone defined earlier in this article.&lt;span style=""&gt;  &lt;/span&gt;It turns out that there is no statistically significant difference between catchers and other players in the number of pitches seen within the fixed zone.&lt;span style=""&gt;  &lt;/span&gt;This also gives us some slight evidence that the height of catchers isn’t too much of a problem in the model.&lt;span style=""&gt;  &lt;/span&gt;If catchers are significantly shorter than the rest of the population, we would expect pitchers to adjust and throw within this smaller zone.&lt;span style=""&gt;  &lt;/span&gt;However, the spray of pitch locations is pretty much the same for catchers and non-catchers.&lt;span style=""&gt;  &lt;/span&gt;For brevity, I do not include the results of &lt;span style="font-family: georgia;"&gt;this regression (though they are available upon request).&lt;/span&gt; &lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"  style="margin-bottom: 0.0001pt; text-indent: 0.5in; line-height: normal;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;Below is the output from the logistic regression on the probability of a strike call:&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal"  style="margin-bottom: 0.0001pt; line-height: normal;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt; &lt;/span&gt;&lt;/p&gt;  &lt;div  align="center" style="font-family:georgia;"&gt;  &lt;table class="MsoNormalTable" style="width: 474.15pt; margin-left: 4.65pt; border-collapse: collapse; border: medium none;" 
border="1" cellpadding="0" cellspacing="0" width="632"&gt;  &lt;tbody&gt;&lt;tr style="height: 15pt;"&gt;   &lt;td style="width: 245.5pt; border: 1pt solid windowtext; padding: 0in 5.4pt; height: 15pt;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;b style=""&gt;&lt;u&gt;&lt;span style=";color:black;" &gt;Variable:&lt;/span&gt;&lt;/u&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: 1pt 1pt 1pt medium; border-style: solid solid solid none; height: 15pt;color:windowtext windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;b style=""&gt;&lt;u&gt;&lt;span style=";color:black;" &gt;Estimate&lt;/span&gt;&lt;/u&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: 1pt 1pt 1pt medium; border-style: solid solid solid none; height: 15pt;color:windowtext windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;b style=""&gt;&lt;u&gt;&lt;span style=";color:black;" &gt;Std. 
Error&lt;/span&gt;&lt;/u&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: 1pt 1pt 1pt medium; border-style: solid solid solid none; height: 15pt;color:windowtext windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;b style=""&gt;&lt;u&gt;&lt;span style=";color:black;" &gt;z-value&lt;/span&gt;&lt;/u&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: 1pt 1pt 1pt medium; border-style: solid solid solid none; height: 15pt;color:windowtext windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;b style=""&gt;&lt;u&gt;&lt;span style=";color:black;" &gt;Sig.&lt;/span&gt;&lt;/u&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;(Intercept)&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" 
&gt;6.115965&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.061387&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;99.629&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;count.0.0&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 
1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.501851&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.025818&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;19.438&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 
15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;count.0.1&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;-0.085477&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.027324&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;-3.128&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" 
nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;**&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;count.0.2&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;-0.466645&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.034037&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" 
align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;-13.71&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;count.1.0&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.685788&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" 
&gt;0.026874&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;25.519&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;count.1.1&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.146811&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 
1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.027805&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;5.28&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;count.1.2&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext 
-moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;-0.192775&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.030828&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;-6.253&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p 
class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;count.2.0&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.877375&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.02952&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;29.722&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span 
style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;count.2.1&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.351544&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.029845&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" 
&gt;11.779&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;count.2.2&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;-0.010799&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.031409&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 
1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;-0.344&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;p = 0.731&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;count.3.0&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;1.068622&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext 
-moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.033826&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;31.592&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;count.3.1&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 
0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.557583&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.03356&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;16.614&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span 
style=";color:black;" &gt;count.3.2&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;Base-level&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;factor(end_outs)=1&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 
63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.516444&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.011563&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;44.664&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none 
solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;factor(end_outs)=2&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.475019&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.011598&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;40.956&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" 
valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;factor(end_outs)=3&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.815832&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.012411&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; 
text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;65.735&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;factor(pitcher_throws)=R&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.150937&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span 
style=";color:black;" &gt;0.01312&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;11.505&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;factor(batter_stand)=R&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;-0.259115&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 
64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.01416&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;-18.299&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;linear_distance_from_centerpoint&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 
15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;-7.177881&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.01522&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;-471.622&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; padding: 0in 5.4pt; height: 15pt; color: rgb(0, 0, 0);" valign="bottom" 
width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;b style=""&gt;&lt;span style=""&gt;catcher&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; padding: 0in 5.4pt; height: 15pt; color: rgb(0, 0, 0);" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;b style=""&gt;&lt;span style=""&gt;-0.122086&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; padding: 0in 5.4pt; height: 15pt; color: rgb(0, 0, 0);" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;b style=""&gt;&lt;span style=""&gt;0.011084&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; padding: 0in 5.4pt; height: 15pt; color: rgb(0, 0, 0);" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;b style=""&gt;&lt;span style=""&gt;-11.015&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; padding: 0in 5.4pt; height: 15pt; color: rgb(0, 0, 0);" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span 
style="font-size:100%;"&gt;&lt;b style=""&gt;&lt;span style=""&gt;***&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;"&gt;   &lt;td  style="width: 245.5pt; border-width: medium 1pt 1pt; border-style: none solid solid; height: 15pt;color:-moz-use-text-color windowtext windowtext;" valign="bottom" width="327" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;pitcher_throws=R &amp;amp;   batter_stand=R&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td  style="width: 63.9pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; height: 15pt;color:-moz-use-text-color windowtext windowtext -moz-use-text-color;" valign="bottom" width="85" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;-0.121216&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td style="width: 64.5pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; padding: 0in 5.4pt; height: 15pt;" valign="bottom" width="86" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;0.016309&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td style="width: 0.8in; border-width: medium 1pt 1pt medium; border-style: none solid solid none; border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; padding: 0in 5.4pt; height: 15pt;" valign="bottom" width="77" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" 
align="center"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;-7.432&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;   &lt;td style="width: 42.65pt; border-width: medium 1pt 1pt medium; border-style: none solid solid none; border-color: -moz-use-text-color windowtext windowtext -moz-use-text-color; padding: 0in 5.4pt; height: 15pt;" valign="bottom" width="57" nowrap="nowrap"&gt;   &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; line-height: normal;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=";color:black;" &gt;***&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;   &lt;/td&gt;  &lt;/tr&gt; &lt;/tbody&gt;&lt;/table&gt;  &lt;/div&gt;      &lt;p class="MsoNormal"  style="margin-bottom: 0.0001pt; line-height: normal;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style=""&gt;                 &lt;/span&gt;The effect in the regression for the ‘catcher’ dummy variable is statistically significant and larger than I would have expected (some of this could be coming from differences in height, despite my attempts at controlling for this variable).&lt;span style=""&gt;  &lt;/span&gt;On average, a pitch at the edge of the zone (normally a 50-50 chance of being called a strike) is &lt;i style="font-weight: bold;"&gt;about&lt;/i&gt;&lt;span style="font-weight: bold;"&gt; 12.5% &lt;/span&gt;&lt;b style=""&gt;&lt;i style=""&gt;less&lt;/i&gt;&lt;/b&gt; likely to be called a strike if the batter is a catcher.  For those unfamiliar with logistic regression, I won’t go into explaining how this changes as the probability of a strike call otherwise increases or decreases.  The estimated effects of coefficients in a logistic regression can't just be read off the regression table, because the effect on the probability shrinks as a pitch gets closer to a 1 or 0 probability of being called a strike.  
So with catchers, it's likely that a pitch down the middle is still called a strike at very nearly the same rate, while a pitch 10 feet high is still called a ball at very nearly the same rate as for other batters.&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;    &lt;p class="MsoNormal"  style="margin-bottom: 0.0001pt; text-indent: 0.5in; line-height: normal;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;This is a pretty interesting contrast, but there could be one other thing confounding the result: I have not controlled for the talent of the batter.&lt;span style=""&gt;  &lt;/span&gt;We know that catchers are generally not as adept at hitting the ball as those at other positions, if for no other reason than that the top-hitting catchers are often moved to another position early on.&lt;span style=""&gt;  &lt;/span&gt;If umpires are ‘compassionate’ toward players who just aren’t very good hitters, then we could be picking up this effect here.&lt;span style=""&gt;  &lt;/span&gt;I don’t currently have an answer to this issue, as I do not have individual player performance in my Pitch F/X data at this point.&lt;span style=""&gt;  &lt;/span&gt;If this is in fact an effect of the umpire interacting with the batter’s skill, then it is an interesting issue in its own right, and one that likely needs to be controlled for in the data.  
I am in the process of greatly improving the information in my Pitch F/X database, so hopefully I can take a look at this stuff as well.&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;  &lt;!--[if !mso]&gt; &lt;style&gt; v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} &lt;/style&gt; &lt;![endif]--&gt;&lt;!--[if gte mso 9]&gt;&lt;xml&gt;  &lt;o:officedocumentsettings&gt;   &lt;o:allowpng/&gt;  &lt;/o:OfficeDocumentSettings&gt; &lt;/xml&gt;&lt;![endif]--&gt;&lt;!--[if gte mso 9]&gt;&lt;xml&gt;  &lt;w:worddocument&gt;   &lt;w:view&gt;Normal&lt;/w:View&gt;   &lt;w:zoom&gt;0&lt;/w:Zoom&gt;   &lt;w:trackmoves&gt;false&lt;/w:TrackMoves&gt;   &lt;w:trackformatting/&gt;   &lt;w:punctuationkerning/&gt;   &lt;w:validateagainstschemas/&gt;   &lt;w:saveifxmlinvalid&gt;false&lt;/w:SaveIfXMLInvalid&gt;   &lt;w:ignoremixedcontent&gt;false&lt;/w:IgnoreMixedContent&gt;   &lt;w:alwaysshowplaceholdertext&gt;false&lt;/w:AlwaysShowPlaceholderText&gt;   &lt;w:donotpromoteqf/&gt;   &lt;w:lidthemeother&gt;EN-US&lt;/w:LidThemeOther&gt;   &lt;w:lidthemeasian&gt;X-NONE&lt;/w:LidThemeAsian&gt;   &lt;w:lidthemecomplexscript&gt;X-NONE&lt;/w:LidThemeComplexScript&gt;   &lt;w:compatibility&gt;    &lt;w:breakwrappedtables/&gt;    &lt;w:snaptogridincell/&gt;    &lt;w:wraptextwithpunct/&gt;    &lt;w:useasianbreakrules/&gt;    &lt;w:dontgrowautofit/&gt;    &lt;w:splitpgbreakandparamark/&gt;    &lt;w:enableopentypekerning/&gt;    &lt;w:dontflipmirrorindents/&gt;    &lt;w:overridetablestylehps/&gt;   &lt;/w:Compatibility&gt;   &lt;m:mathpr&gt;    &lt;m:mathfont val="Cambria Math"&gt;    &lt;m:brkbin val="before"&gt;    &lt;m:brkbinsub val="&amp;#45;-"&gt;    &lt;m:smallfrac val="off"&gt;    &lt;m:dispdef/&gt;    &lt;m:lmargin val="0"&gt;    &lt;m:rmargin val="0"&gt;    &lt;m:defjc val="centerGroup"&gt;    &lt;m:wrapindent val="1440"&gt;    &lt;m:intlim val="subSup"&gt;    &lt;m:narylim val="undOvr"&gt;   
&lt;/m:mathPr&gt;&lt;/w:WordDocument&gt; &lt;/xml&gt;&lt;![endif]--&gt;&lt;!--[if gte mso 9]&gt;&lt;xml&gt;  &lt;w:latentstyles deflockedstate="false" defunhidewhenused="true" defsemihidden="true" defqformat="false" defpriority="99" latentstylecount="267"&gt;   &lt;w:lsdexception locked="false" priority="0" semihidden="false" unhidewhenused="false" qformat="true" name="Normal"&gt;   &lt;w:lsdexception locked="false" priority="9" semihidden="false" unhidewhenused="false" qformat="true" name="heading 1"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 2"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 3"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 4"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 5"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 6"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 7"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 8"&gt;   &lt;w:lsdexception locked="false" priority="9" qformat="true" name="heading 9"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 1"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 2"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 3"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 4"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 5"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 6"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 7"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 8"&gt;   &lt;w:lsdexception locked="false" priority="39" name="toc 9"&gt;   &lt;w:lsdexception locked="false" priority="35" qformat="true" name="caption"&gt;   &lt;w:lsdexception locked="false" priority="10" semihidden="false" unhidewhenused="false" qformat="true" 
name="Title"&gt;   &lt;w:lsdexception locked="false" priority="1" name="Default Paragraph Font"&gt;   &lt;w:lsdexception locked="false" priority="11" semihidden="false" unhidewhenused="false" qformat="true" name="Subtitle"&gt;   &lt;w:lsdexception locked="false" priority="22" semihidden="false" unhidewhenused="false" qformat="true" name="Strong"&gt;   &lt;w:lsdexception locked="false" priority="20" semihidden="false" unhidewhenused="false" qformat="true" name="Emphasis"&gt;   &lt;w:lsdexception locked="false" priority="59" semihidden="false" unhidewhenused="false" name="Table Grid"&gt;   &lt;w:lsdexception locked="false" unhidewhenused="false" name="Placeholder Text"&gt;   &lt;w:lsdexception locked="false" priority="1" semihidden="false" unhidewhenused="false" qformat="true" name="No Spacing"&gt;   &lt;w:lsdexception locked="false" priority="60" semihidden="false" unhidewhenused="false" name="Light Shading"&gt;   &lt;w:lsdexception locked="false" priority="61" semihidden="false" unhidewhenused="false" name="Light List"&gt;   &lt;w:lsdexception locked="false" priority="62" semihidden="false" unhidewhenused="false" name="Light Grid"&gt;   &lt;w:lsdexception locked="false" priority="63" semihidden="false" unhidewhenused="false" name="Medium Shading 1"&gt;   &lt;w:lsdexception locked="false" priority="64" semihidden="false" unhidewhenused="false" name="Medium Shading 2"&gt;   &lt;w:lsdexception locked="false" priority="65" semihidden="false" unhidewhenused="false" name="Medium List 1"&gt;   &lt;w:lsdexception locked="false" priority="66" semihidden="false" unhidewhenused="false" name="Medium List 2"&gt;   &lt;w:lsdexception locked="false" priority="67" semihidden="false" unhidewhenused="false" name="Medium Grid 1"&gt;   &lt;w:lsdexception locked="false" priority="68" semihidden="false" unhidewhenused="false" name="Medium Grid 2"&gt;   &lt;w:lsdexception locked="false" priority="69" semihidden="false" unhidewhenused="false" name="Medium Grid 3"&gt;   &lt;w:lsdexception 
locked="false" priority="70" semihidden="false" unhidewhenused="false" name="Dark List"&gt;   &lt;w:lsdexception locked="false" priority="71" semihidden="false" unhidewhenused="false" name="Colorful Shading"&gt;   &lt;w:lsdexception locked="false" priority="72" semihidden="false" unhidewhenused="false" name="Colorful List"&gt;   &lt;w:lsdexception locked="false" priority="73" semihidden="false" unhidewhenused="false" name="Colorful Grid"&gt;   &lt;w:lsdexception locked="false" priority="60" semihidden="false" unhidewhenused="false" name="Light Shading Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="61" semihidden="false" unhidewhenused="false" name="Light List Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="62" semihidden="false" unhidewhenused="false" name="Light Grid Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="63" semihidden="false" unhidewhenused="false" name="Medium Shading 1 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="64" semihidden="false" unhidewhenused="false" name="Medium Shading 2 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="65" semihidden="false" unhidewhenused="false" name="Medium List 1 Accent 1"&gt;   &lt;w:lsdexception locked="false" unhidewhenused="false" name="Revision"&gt;   &lt;w:lsdexception locked="false" priority="34" semihidden="false" unhidewhenused="false" qformat="true" name="List Paragraph"&gt;   &lt;w:lsdexception locked="false" priority="29" semihidden="false" unhidewhenused="false" qformat="true" name="Quote"&gt;   &lt;w:lsdexception locked="false" priority="30" semihidden="false" unhidewhenused="false" qformat="true" name="Intense Quote"&gt;   &lt;w:lsdexception locked="false" priority="66" semihidden="false" unhidewhenused="false" name="Medium List 2 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="67" semihidden="false" unhidewhenused="false" name="Medium Grid 1 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="68" semihidden="false" 
unhidewhenused="false" name="Medium Grid 2 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="69" semihidden="false" unhidewhenused="false" name="Medium Grid 3 Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="70" semihidden="false" unhidewhenused="false" name="Dark List Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="71" semihidden="false" unhidewhenused="false" name="Colorful Shading Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="72" semihidden="false" unhidewhenused="false" name="Colorful List Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="73" semihidden="false" unhidewhenused="false" name="Colorful Grid Accent 1"&gt;   &lt;w:lsdexception locked="false" priority="60" semihidden="false" unhidewhenused="false" name="Light Shading Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="61" semihidden="false" unhidewhenused="false" name="Light List Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="62" semihidden="false" unhidewhenused="false" name="Light Grid Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="63" semihidden="false" unhidewhenused="false" name="Medium Shading 1 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="64" semihidden="false" unhidewhenused="false" name="Medium Shading 2 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="65" semihidden="false" unhidewhenused="false" name="Medium List 1 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="66" semihidden="false" unhidewhenused="false" name="Medium List 2 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="67" semihidden="false" unhidewhenused="false" name="Medium Grid 1 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="68" semihidden="false" unhidewhenused="false" name="Medium Grid 2 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="69" semihidden="false" unhidewhenused="false" name="Medium Grid 3 Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="70" 
semihidden="false" unhidewhenused="false" name="Dark List Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="71" semihidden="false" unhidewhenused="false" name="Colorful Shading Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="72" semihidden="false" unhidewhenused="false" name="Colorful List Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="73" semihidden="false" unhidewhenused="false" name="Colorful Grid Accent 2"&gt;   &lt;w:lsdexception locked="false" priority="60" semihidden="false" unhidewhenused="false" name="Light Shading Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="61" semihidden="false" unhidewhenused="false" name="Light List Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="62" semihidden="false" unhidewhenused="false" name="Light Grid Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="63" semihidden="false" unhidewhenused="false" name="Medium Shading 1 Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="64" semihidden="false" unhidewhenused="false" name="Medium Shading 2 Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="65" semihidden="false" unhidewhenused="false" name="Medium List 1 Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="66" semihidden="false" unhidewhenused="false" name="Medium List 2 Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="67" semihidden="false" unhidewhenused="false" name="Medium Grid 1 Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="68" semihidden="false" unhidewhenused="false" name="Medium Grid 2 Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="69" semihidden="false" unhidewhenused="false" name="Medium Grid 3 Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="70" semihidden="false" unhidewhenused="false" name="Dark List Accent 3"&gt;   &lt;w:lsdexception locked="false" priority="71" semihidden="false" unhidewhenused="false" name="Colorful Shading Accent 3"&gt;   &lt;w:lsdexception locked="false" 
unhidewhenused="false" qformat="true" name="Intense Reference"&gt;   &lt;w:lsdexception locked="false" priority="33" semihidden="false" unhidewhenused="false" qformat="true" name="Book Title"&gt;   &lt;w:lsdexception locked="false" priority="37" name="Bibliography"&gt;   &lt;w:lsdexception locked="false" priority="39" qformat="true" name="TOC Heading"&gt;  &lt;/w:LatentStyles&gt; &lt;/xml&gt;&lt;![endif]--&gt;  &lt;div class="WordSection1"&gt;      &lt;p class="MsoNormal"  style="margin-bottom: 0.0001pt; text-indent: 0.5in; line-height: normal;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;I took the first model a bit further and followed a technique that J-Doug has used in his fantastic ‘Compassionate Umpire’ articles.&lt;span style=""&gt;  &lt;/span&gt;For this second model, I used an indicator variable of whether the call was in the batter’s favor (pitch within the zone, called a ball), the pitcher’s favor (pitch outside the zone, called a strike), or neutral (“correct call”).&lt;span style=""&gt;  &lt;/span&gt;This is the dependent variable in an ordered logistic model.&lt;span style=""&gt;  &lt;/span&gt;As the indicator increases (from -1 to 0 to 1), the calls are coming more “in favor” of the batter.&lt;span 
style=""&gt;  &lt;/span&gt;This sheds some further light on any increase in the probability that the call will be in the batter’s favor, given that he is a catcher.&lt;span style=""&gt;  &lt;/span&gt;I again don’t present the full regression output, as it simply confirms the earlier finding; however, this model also indicated a significant increase in favorable calls for catchers.&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"  style="margin-bottom: 0.0001pt; text-indent: 0.5in; line-height: normal;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;So, where is this difference in strike-calling coming from?&lt;span style=""&gt;  &lt;/span&gt;Well, the contours for the 50% call rate for left-handed and right-handed batters, shown below, give some indication.&lt;span style=""&gt;  &lt;/span&gt;In the panel on the left, I plotted the 50% contours for RHB that are catchers and non-catchers, while on the right panel, we have left-handed batters (plots are from the umpire’s view).&lt;span style=""&gt;  &lt;/span&gt;In both panels, you can see that umpires are a bit more lenient with inside pitches for both right- and left-handed batting catchers.&lt;span style=""&gt;  &lt;/span&gt;Right-handed catchers seem to get calls in their favor up in the zone, but this very well could be a result of these catchers being a little shorter than their non-catching counterparts.&lt;span style=""&gt;  &lt;/span&gt;You can see that for left-handed catchers, the zone is shifted upward a bit.&lt;span style=""&gt;  &lt;/span&gt;So the ‘height’ factor seems to be relatively ambiguous compared to the inside corner difference, especially considering that the lower limit of the zone for RHB is almost identical for both groups of batters.&lt;span style=""&gt;  &lt;/span&gt;&lt;span style="font-weight: bold;"&gt;A word of caution&lt;/span&gt;: comparing differences this small on plots like this is not a replacement for more rigorous analysis, but such plots are interesting to look at once we understand 
some of what is going on in the data.&lt;/span&gt;&lt;/p&gt;  &lt;/div&gt;  &lt;span style="font-family:georgia;font-size:100%;"&gt;  &lt;/span&gt;  &lt;div face="georgia" class="WordSection2"&gt;  &lt;p class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center; line-height: normal;" align="center"&gt;&lt;a href="http://2.bp.blogspot.com/-YjwvVAoOAZg/TZ9Zqt9yblI/AAAAAAAAASA/i35vdMhTpm4/s1600/zonecompare.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 248px;" src="http://2.bp.blogspot.com/-YjwvVAoOAZg/TZ9Zqt9yblI/AAAAAAAAASA/i35vdMhTpm4/s400/zonecompare.png" alt="" id="BLOGGER_PHOTO_ID_5593287852645576274" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;/div&gt;  &lt;span style="font-family: georgia;font-size:100%;" &gt;  &lt;/span&gt;  &lt;p class="MsoNormal"  style="margin-bottom: 0.0001pt; text-indent: 0.5in; line-height: normal;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;This is of course &lt;span style="font-style: italic;"&gt;not certain evidence &lt;/span&gt;that there is something going on with the catchers at bat, but it seems to point to something interesting.&lt;span style=""&gt;  &lt;/span&gt;&lt;span style="font-weight: bold;"&gt;I’d like to look into this phenomenon (if that is what it is) and be a bit more confident about the height of the batters and the possibility of umpires being ‘compassionate’ toward less skilled hitters, rather than catchers themselves.  &lt;/span&gt;If anyone has batter height available by player_id (the ones included in the Pitch F/X database format from Mike Fast's tutorial), I'd love to be able to include this in my data.  
That way, I could try to provide somewhat more accurate estimates of umpire performance as well.&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"  style="margin-bottom: 0.0001pt; text-indent: 0.5in; line-height: normal;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-size:12pt;"&gt;&lt;span style="font-size:100%;"&gt;In the end, it very well could be that the closeness of the catcher and the umpire has an effect on the umpire taking the bat out of a catcher’s hands.&lt;/span&gt;&lt;span style=";font-size:100%;" &gt;  &lt;/span&gt;&lt;span style="font-size:100%;"&gt;But replication and improvement are always key in this sort of analysis, and I think they are needed here.&lt;/span&gt;&lt;span style=";font-size:100%;" &gt;  &lt;/span&gt;&lt;span style="font-size:100%;"&gt;I’d love to hear some reactions to the analysis, and am always willing to hear about shortcomings of the approach here.  I have a hard time coming out and proclaiming a definite bias without better controlling for the height of the batters, but the effect seems to be large enough that at least some of it is real.&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-4957937929367482309?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/4957937929367482309/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/04/umpire-bias-favoring-catchers-at-bat.html#comment-form' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/4957937929367482309'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/6731958142156026312/posts/default/4957937929367482309'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/04/umpire-bias-favoring-catchers-at-bat.html' title='Umpire Bias Favoring Catchers At Bat?'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-YjwvVAoOAZg/TZ9Zqt9yblI/AAAAAAAAASA/i35vdMhTpm4/s72-c/zonecompare.png' height='72' width='72'/><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-8269086697084567209</id><published>2011-04-08T13:29:00.003-04:00</published><updated>2011-04-08T13:33:41.061-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Sabermetrics'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball Prospectus'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Links'/><title type='text'>Baseball ProGUESTus</title><content type='html'>&lt;a href="http://www.baseballprospectus.com/article.php?articleid=13501"&gt;I have an article up at Baseball Prospectus in their Baseball ProGUESTus column.&lt;/a&gt;  One of two projects I've been working on lately.  I was curious if there was really any difference in how prospects are treated from the rest of the population of players.&lt;br /&gt;&lt;br /&gt;Using 2010 MLB debuts vs. the rest of the field, my general finding is that pitchers have a pretty good scouting report early on for players.  
The approach doesn't change much throughout that first season, calling into question articles claiming that a young guy suddenly got "figured out" after a hot start to the season.  The analysis is pretty macro-level, so that's not to say this doesn't happen.  And I only used 2010 data for now.  But in the end, the group of hitter types that a younger player falls into is pretty much known when he reaches the Big Leagues.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-8269086697084567209?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/8269086697084567209/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/04/baseball-proguestus.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/8269086697084567209'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/8269086697084567209'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/04/baseball-proguestus.html' title='Baseball ProGUESTus'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-1400589602617676965</id><published>2011-04-01T15:09:00.008-04:00</published><updated>2011-04-01T15:28:01.757-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Umpires'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Data'/><category scheme='http://www.blogger.com/atom/ns#' term='Pitch F/X'/><title type='text'>Umpire Call Database</title><content type='html'>Okay, so after some talks with Mike Fast about the strike zone and some suggestions/requests for certain types of data from MGL, I've got an updated file with a complete description of umpire calls.  Keep in mind the following:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;1. &lt;/span&gt;At the suggestion of Mike, I used 1.75 feet as the bottom of the zone and 3.45 feet for the top of the zone.  This is likely less accurate than basing the zone off the height of each batter.  However, I don't currently have batter height integrated into my Pitch F/X data.  So in general, umpires are probably slightly more accurate than my numbers if they truly base their zone off the height of the batter.  Also note that I use the rulebook zone (width of the plate), rather than the 2-foot wide zone.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;2. &lt;/span&gt;All tabulations use only pitches that include FX location data.  There are pitches here and there that didn't register with the FX system, and they are of course not included.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;3. &lt;/span&gt;In the data file, the last 12 columns in each worksheet should be of most interest to people.  Some notes on this:&lt;br /&gt;&lt;br /&gt;a. The Green and Red indicate "Correct" and "Incorrect" calls, respectively, based on the rulebook zone.&lt;br /&gt;&lt;br /&gt;b. The Sensitivity is the umpire's percentage of Within Zone pitches that are correctly called Strikes.  The Specificity is the umpire's percentage of Outside Zone pitches that are correctly called Balls.&lt;br /&gt;&lt;br /&gt;c. There is a variable key included within the file.  
Read it and understand it before snooping too much into the data.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;4. I make no guarantees as to the accuracy of the calculations. &lt;/span&gt; In general, these are rough estimates of umpire performance on Ball and Strike calls.  Because of the fixed top and bottom of the zone, they should be taken with some caution.  I did my best to ensure that everything is correct.  There are other ways to do this, including using the 2-foot wide zone and varying the top and bottom of the zone.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:180%;"&gt;&lt;a href="http://sitemaker.umich.edu/millsbrian/files/umpire_crosstabs_and_summary.xlsx"&gt;Without further ado, here is the data.&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;With that said, enjoy!  If you use this data anywhere, I always appreciate a cite or a link back here.  If you are using the data for your own personal use, I'd love to hear what you will be using it for.  If you have any questions, leave them in the comments or feel free to shoot me an email.  But be sure to read the variable key and everything first.&lt;br /&gt;&lt;br /&gt;As always, definitely let me know if something in the data looks funky.  I tried to make the column names as logical as possible, but I'm sure others will disagree.  
The key has explicit descriptions of everything.&lt;span style="font-size:180%;"&gt;&lt;a href="http://sitemaker.umich.edu/millsbrian/files/umpire_crosstabs_and_summary.xlsx"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-1400589602617676965?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/1400589602617676965/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/04/umpire-call-database.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/1400589602617676965'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/1400589602617676965'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/04/umpire-call-database.html' title='Umpire Call Database'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-5546498339230759510</id><published>2011-03-31T17:15:00.004-04:00</published><updated>2011-03-31T17:31:55.741-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='League Rules'/><category scheme='http://www.blogger.com/atom/ns#' term='Fantays'/><title type='text'>Gauging Interest: 2012 Keeper League</title><content type='html'>So I know it's a bit early--and I'm probably nuts to start up a new keeper 
league as commissioner in the last semester of my dissertation writing next year--but this is something I've wanted to do for a while.&lt;br /&gt;&lt;br /&gt;BUT, I'd love to gauge some interest in a keeper league that will start up in 2012.  I'm looking for very serious players who are willing to invest about $350 into the league each year (don't worry, if you have the worst team in the league, it's likely you won't lose more than $120).  The reason the fee is so high is that you pay your own auction salary.&lt;br /&gt;&lt;br /&gt;As of now, I'm envisioning this as an 8x8 Head-to-Head Each Category league with a $300 cap and minor league players.  6 Keepers per year (plus minor league keepers), Rule 5 drafts, contracts, the works.  Sessions will be 2 weeks long, rather than one, and there will be playoffs.  20 teams.  You get paid a certain amount for each category win throughout the season (so you're paid marginally based on overall record). &lt;br /&gt;&lt;a href="http://sitemaker.umich.edu/millsbrian/files/the_dynasty_league_rules.docx"&gt;&lt;br /&gt;I have linked my current league constitution here.&lt;/a&gt; (Warning: It's a doozie.  8,500 words on 16 pages).&lt;br /&gt;&lt;br /&gt;I'd do things through League Safe, assuming they can handle the complex payout structure.  Otherwise, I'll figure something out.  The rules are currently not particularly negotiable (with a couple exceptions).  Why?  Because this constitution is based off a 6-year running league that has run into numerous problems.  Each rule in the constitution is there for a specific and important reason that I have had experience with.  
That doesn't mean that I won't entertain suggestions, but once the league starts, rules cannot be revisited until the next offseason.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Anyone interested please stick your email in the comments with your fantasy experience, expertise, or current field of employment (I'd like to get a mix of strategies) and/or shoot me an email (bmmillsy at umich dot edu).  &lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-5546498339230759510?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/5546498339230759510/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/03/gauging-interest-2012-keeper-league.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/5546498339230759510'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/5546498339230759510'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/03/gauging-interest-2012-keeper-league.html' title='Gauging Interest: 2012 Keeper League'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-7404216571064616374</id><published>2011-03-31T11:03:00.007-04:00</published><updated>2011-03-31T12:00:44.268-04:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='Measurement Error'/><category scheme='http://www.blogger.com/atom/ns#' term='Data Issues'/><category scheme='http://www.blogger.com/atom/ns#' term='Pitch F/X'/><title type='text'>Data Quality in Pitch F/X</title><content type='html'>This post stems from a discussion with Mike Fast about quality of the "sz_top" and "sz_bot" variables in Pitch F/X data.  I had been using these to designate the strike zone for my calculations in past posts.  I want to thank Mike for being generous with his time to answer some of my questions and keep me from publicly writing something stupid.&lt;br /&gt;&lt;br /&gt;I was aware that the lines drawn for Top and Bottom of the zone were somewhat inaccurate.  However, one thing I did not count on would be that this variation would systematically bias findings in the data across years.  As a whole, we would normally expect that these measurement errors are random (for a given player, not across players).  In theory, random measurement errors are totally fine.  While they make the data noisy, they should not bias our measurements and with really big data, they should be mostly ignorable when we do certain calculations.&lt;br /&gt;&lt;br /&gt;But over time, this just doesn't seem to be the case.  This is the main reason I took down the data from my last post (I'll update it as soon as I can and repost it).  The inaccuracy of the data tends to stem from the correlation between the zone designation at the top and bottom, and the percentage of pitches WITHIN the zone also called strikes.  
That's no surprise, and normally I wouldn't worry too much about this as we'd expect it's simply noise and we'd expect some uniform change inside and outside the zone if we change the size of the strike zone.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;However&lt;/span&gt;, the interesting part is that it seems to have a minimal effect--if any--on the pitches correctly called Balls that are actually outside the zone.  I'm still not sure why this is the case.  We'd expect that fixing the zone would similarly affect the percentage of correctly called pitches both within and outside the zone (after all, any that are no longer 'outside' the zone MUST be 'within' the zone--though less on the 'outside zone' data because there are more pitches outside the zone than within the zone).  The only thing I can think of is that it's a sample size issue: there are many more pitches outside the rulebook zone than inside the zone (just under 3 times as many).  But I can't imagine this accounts for such a huge change in one and almost no change in the other.&lt;br /&gt;&lt;br /&gt;With that said, I thought I would provide some data for those looking to mess with these variables in the Pitch F/X data.  &lt;a href="http://sitemaker.umich.edu/millsbrian/files/zone_changes_by_player.xlsx"&gt;In the file linked here, I have calculated the average Top and Bottom of zone for each player in each year, along with the standard deviation.&lt;/a&gt;  The data are in both feet and in inches.  Below, I also show the range of values for sz_top for Bobby Abreu in 2007, 2009 and 2010 (I skip 2008 for now).  Finally, I give a distribution of standard deviations for the measurement by player from 2007 through 2010.  From the looks of things, something was changed in mid-2007 about how they designated the top of the zone (notice the bimodal distribution).&lt;br /&gt;&lt;br /&gt;Anyway, just a heads up.  
Like I said, I'm still not clear on why this is systematically changing the Within Zone tabulations but NOT the Outside Zone tabulations.  I'll post the file once I figure out what is going on.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/-RdHbfNceeGc/TZSkZjQDcqI/AAAAAAAAAR4/8MTLreVkZBY/s1600/all.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 400px;" src="http://4.bp.blogspot.com/-RdHbfNceeGc/TZSkZjQDcqI/AAAAAAAAAR4/8MTLreVkZBY/s400/all.png" alt="" id="BLOGGER_PHOTO_ID_5590273796339888802" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/-BXdJkZF7rzA/TZSkZXkpfwI/AAAAAAAAARw/W2s6FX1ZZes/s1600/abreusztop.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 333px;" src="http://1.bp.blogspot.com/-BXdJkZF7rzA/TZSkZXkpfwI/AAAAAAAAARw/W2s6FX1ZZes/s400/abreusztop.png" alt="" id="BLOGGER_PHOTO_ID_5590273793205042946" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-7404216571064616374?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/7404216571064616374/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/03/data-quality.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/7404216571064616374'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/7404216571064616374'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/03/data-quality.html' title='Data Quality in Pitch 
F/X'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-RdHbfNceeGc/TZSkZjQDcqI/AAAAAAAAAR4/8MTLreVkZBY/s72-c/all.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-2137468656067427707</id><published>2011-03-26T11:41:00.007-04:00</published><updated>2011-04-01T15:27:25.922-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Strike Zones'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><category scheme='http://www.blogger.com/atom/ns#' term='Umpires'/><category scheme='http://www.blogger.com/atom/ns#' term='Pitch F/X'/><title type='text'>Umpire Strike Call Percent, In and Out of Rulebook Zone</title><content type='html'>I've finished up some preliminary tabulations for umpire calls within and outside the rulebook zone.  Because it's a fairly large table, I'm not going to present it directly on this page.  However, you can download the file here.&lt;br /&gt;&lt;br /&gt;Keep in mind that these are based on the rulebook zone.  The numbers say nothing but how well the umpires conform to the rulebook zone.  Umpires tend to have their own zones, which are likely well known to the players in the game.  Some zones are shifted outside, most stretch a bit beyond the edges of the plate, and so on.  Combining these numbers with the visuals in my previous post is your best bet for understanding where the "Incorrect" calls are coming from.  
Most likely, these are just outside the book zone but within the 2-foot wide zone.&lt;br /&gt;&lt;br /&gt;We really don't know WHY the zone tends to extend beyond the plate for umpires (well, maybe someone does, but I don't).  One suggestion is that Pitch F/X measures the center of the ball, so there is about 1.5 inches of ball on either side.  That extends the zone 1.5 inches beyond the plate on each side, assuming an umpire calls a strike if ANY portion of the ball touches the black.&lt;br /&gt;&lt;br /&gt;The rest of it could simply be a perception issue.  The ump looks from the center of the plate toward the outside of the plate.  Because anything on the corners is viewed at an angle, the umpire makes some sort of guess based on visual cues as to whether it went directly over the plate (they don't have a perfect bird's eye view of every pitch).  The question then becomes: how should we evaluate them?  The book zone, or a predictable zone for each umpire?  I'll leave that question for another day and for someone else to answer.&lt;br /&gt;&lt;br /&gt;There is a separate worksheet for each year (2007 to 2010).  I might add some other info to the file in the next couple days, like # of pitches within and outside the zone, and counts of each designation in the file.  There are some extra umpires in the file with no data, and many of these are guys that got some assignments in spring training.  The data should be only regular season games.  Lastly, keep in mind that the correctly called balls and incorrectly called strikes do not add up to 100%.  This is because I left out Pitch Outs and Intentional Balls from the cross-tabulations.&lt;br /&gt;&lt;br /&gt;If you use them for any write-ups anywhere, I appreciate a cite or link back.  
At the least just let me know, because I'd like to see what people do with the data just for my own curiosity.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;UPDATE&lt;/span&gt;: &lt;a href="http://tinyurl.com/3jpuvqm"&gt;DATA IS NOW AVAILABLE HERE.&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-2137468656067427707?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/2137468656067427707/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/03/umpire-strike-call-percent-in-and-out.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/2137468656067427707'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/2137468656067427707'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/03/umpire-strike-call-percent-in-and-out.html' title='Umpire Strike Call Percent, In and Out of Rulebook Zone'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-6764632586375702959</id><published>2011-03-24T17:46:00.010-04:00</published><updated>2011-04-04T16:09:29.339-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Sabermetrics'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Umpires'/><category scheme='http://www.blogger.com/atom/ns#' term='Pitch F/X'/><title type='text'>Umpire Strike Zones</title><content type='html'>Recently, I've been working on a new post for FBJ.  Hopefully, that will be ready to go tomorrow, but with the publication of Jeff Zimmerman's umpire projections today, I thought I'd post some stuff here.  Jeff makes some cool plots and has some cross-tabulations of umpire strike call percentage for a few years.  However, it seems like something went wrong.  &lt;a href="http://www.fangraphs.com/blogs/index.php/2011-umpire-projections/#comment-618466"&gt;If you're curious, go over there and also check out the comments.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Below is a quick table of umpires who were behind the plate for at least 5,000 plate appearances from 2007 through 2010 (the years for which Pitch F/X data is available).  From the looks of things, the umpire can have just over a two-run effect on the outcome of the game due to his strike zone  (&lt;span style="font-weight: bold; font-style: italic;"&gt;ADDENDUM: MGL correctly points out in the comments that my language is imprecise, and that the assumption that the noise evens out is too strong.  I agree he is correct.  I should have said that the difference in the data is a bit over 2 runs, NOT that the EFFECT is a little over 2 runs.  His suggestion is that the effect is about 0.6 runs.  I'll see what other info I can get out of the data.&lt;/span&gt;).   Of course, we're assuming that umpires are randomly assigned and that the quality of the pitching and hitting evens out over the 5,000 plate appearances, which is a pretty strong assumption.  But even if the range of the effect were only a single run, I think this would be pretty significant.  
The data below is for 2007 through 2010.&lt;br /&gt;&lt;br /&gt;&lt;table border="0" cellpadding="0" cellspacing="0" width="779"&gt;&lt;col style="width: 95pt;" width="126"&gt;  &lt;col style="width: 92pt;" width="123"&gt;  &lt;col style="width: 38pt;" width="50"&gt;  &lt;col style="width: 32pt;" width="42"&gt;  &lt;col style="width: 63pt;" width="84" span="4"&gt;  &lt;col style="width: 77pt;" width="102"&gt;  &lt;tbody&gt;&lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl66" style="height: 15pt; width: 95pt;" width="126" height="20"&gt;Umpire   First Name&lt;/td&gt;   &lt;td class="xl66" style="width: 92pt;" width="123"&gt;Umpire Last Name&lt;/td&gt;   &lt;td class="xl66" style="width: 38pt;" width="50"&gt;Games&lt;/td&gt;   &lt;td class="xl66" style="width: 32pt;" width="42"&gt;PA&lt;/td&gt;   &lt;td class="xl66" style="width: 63pt;" width="84"&gt;Strikeout %&lt;/td&gt;   &lt;td class="xl66" style="width: 63pt;" width="84"&gt;OBP&lt;/td&gt;   &lt;td class="xl66" style="width: 63pt;" width="84"&gt;SLG&lt;/td&gt;   &lt;td class="xl66" style="width: 63pt;" width="84"&gt;AVG&lt;/td&gt;   &lt;td class="xl66" style="width: 77pt;" width="102"&gt;Runs Per Game&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Jerry&lt;/td&gt;   &lt;td class="xl67"&gt;Crawford&lt;/td&gt;   &lt;td class="xl67"&gt;87&lt;/td&gt;   &lt;td class="xl67"&gt;6834&lt;/td&gt;   &lt;td class="xl65"&gt; 16.56%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3459&lt;/td&gt;   &lt;td class="xl68"&gt;0.4268&lt;/td&gt;   &lt;td class="xl68"&gt;0.2639&lt;/td&gt;   &lt;td class="xl69"&gt;10.17&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Angel&lt;/td&gt;   &lt;td class="xl67"&gt;Campos&lt;/td&gt;   &lt;td class="xl67"&gt;84&lt;/td&gt;   &lt;td class="xl67"&gt;6466&lt;/td&gt;   &lt;td class="xl65"&gt; 18.33%&lt;/td&gt;   &lt;td 
class="xl68"&gt;0.3361&lt;/td&gt;   &lt;td class="xl68"&gt;0.4191&lt;/td&gt;   &lt;td class="xl68"&gt;0.2658&lt;/td&gt;   &lt;td class="xl69"&gt;9.92&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Gerry&lt;/td&gt;   &lt;td class="xl67"&gt;Davis&lt;/td&gt;   &lt;td class="xl67"&gt;140&lt;/td&gt;   &lt;td class="xl67"&gt;10822&lt;/td&gt;   &lt;td class="xl65"&gt; 16.68%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3354&lt;/td&gt;   &lt;td class="xl68"&gt;0.4250&lt;/td&gt;   &lt;td class="xl68"&gt;0.2635&lt;/td&gt;   &lt;td class="xl69"&gt;9.89&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Tim&lt;/td&gt;   &lt;td class="xl67"&gt;Welke&lt;/td&gt;   &lt;td class="xl67"&gt;127&lt;/td&gt;   &lt;td class="xl67"&gt;9755&lt;/td&gt;   &lt;td class="xl65"&gt; 18.60%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3324&lt;/td&gt;   &lt;td class="xl68"&gt;0.4215&lt;/td&gt;   &lt;td class="xl68"&gt;0.2636&lt;/td&gt;   &lt;td class="xl69"&gt;9.83&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Chad&lt;/td&gt;   &lt;td class="xl67"&gt;Fairchild&lt;/td&gt;   &lt;td class="xl67"&gt;131&lt;/td&gt;   &lt;td class="xl67"&gt;10262&lt;/td&gt;   &lt;td class="xl65"&gt; 18.01%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3343&lt;/td&gt;   &lt;td class="xl68"&gt;0.4181&lt;/td&gt;   &lt;td class="xl68"&gt;0.2617&lt;/td&gt;   &lt;td class="xl69"&gt;9.82&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Jim&lt;/td&gt;   &lt;td class="xl67"&gt;Reynolds&lt;/td&gt;   &lt;td class="xl67"&gt;130&lt;/td&gt;   &lt;td class="xl67"&gt;10079&lt;/td&gt;   &lt;td class="xl65"&gt; 18.25%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3372&lt;/td&gt;   &lt;td class="xl68"&gt;0.4234&lt;/td&gt;   &lt;td 
class="xl68"&gt;0.2690&lt;/td&gt;   &lt;td class="xl69"&gt;9.74&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Tim&lt;/td&gt;   &lt;td class="xl67"&gt;McClelland&lt;/td&gt;   &lt;td class="xl67"&gt;144&lt;/td&gt;   &lt;td class="xl67"&gt;11090&lt;/td&gt;   &lt;td class="xl65"&gt; 16.47%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3418&lt;/td&gt;   &lt;td class="xl68"&gt;0.4168&lt;/td&gt;   &lt;td class="xl68"&gt;0.2660&lt;/td&gt;   &lt;td class="xl69"&gt;9.72&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Tim&lt;/td&gt;   &lt;td class="xl67"&gt;Tschida&lt;/td&gt;   &lt;td class="xl67"&gt;135&lt;/td&gt;   &lt;td class="xl67"&gt;10528&lt;/td&gt;   &lt;td class="xl65"&gt; 17.31%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3413&lt;/td&gt;   &lt;td class="xl68"&gt;0.4167&lt;/td&gt;   &lt;td class="xl68"&gt;0.2678&lt;/td&gt;   &lt;td class="xl69"&gt;9.69&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Larry&lt;/td&gt;   &lt;td class="xl67"&gt;Vanover&lt;/td&gt;   &lt;td class="xl67"&gt;133&lt;/td&gt;   &lt;td class="xl67"&gt;10202&lt;/td&gt;   &lt;td class="xl65"&gt; 17.70%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3312&lt;/td&gt;   &lt;td class="xl68"&gt;0.4153&lt;/td&gt;   &lt;td class="xl68"&gt;0.2617&lt;/td&gt;   &lt;td class="xl69"&gt;9.68&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Sam&lt;/td&gt;   &lt;td class="xl67"&gt;Holbrook&lt;/td&gt;   &lt;td class="xl67"&gt;139&lt;/td&gt;   &lt;td class="xl67"&gt;10618&lt;/td&gt;   &lt;td class="xl65"&gt; 17.48%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3345&lt;/td&gt;   &lt;td class="xl68"&gt;0.4280&lt;/td&gt;   &lt;td class="xl68"&gt;0.2628&lt;/td&gt;   &lt;td class="xl69"&gt;9.68&lt;/td&gt;  &lt;/tr&gt;  &lt;tr 
style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Bill&lt;/td&gt;   &lt;td class="xl67"&gt;Welke&lt;/td&gt;   &lt;td class="xl67"&gt;132&lt;/td&gt;   &lt;td class="xl67"&gt;10248&lt;/td&gt;   &lt;td class="xl65"&gt; 17.94%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3357&lt;/td&gt;   &lt;td class="xl68"&gt;0.4153&lt;/td&gt;   &lt;td class="xl68"&gt;0.2690&lt;/td&gt;   &lt;td class="xl69"&gt;9.64&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Mike&lt;/td&gt;   &lt;td class="xl67"&gt;Reilly&lt;/td&gt;   &lt;td class="xl67"&gt;138&lt;/td&gt;   &lt;td class="xl67"&gt;10705&lt;/td&gt;   &lt;td class="xl65"&gt; 17.91%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3410&lt;/td&gt;   &lt;td class="xl68"&gt;0.4241&lt;/td&gt;   &lt;td class="xl68"&gt;0.2666&lt;/td&gt;   &lt;td class="xl69"&gt;9.62&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Randy&lt;/td&gt;   &lt;td class="xl67"&gt;Marsh&lt;/td&gt;   &lt;td class="xl67"&gt;93&lt;/td&gt;   &lt;td class="xl67"&gt;7050&lt;/td&gt;   &lt;td class="xl65"&gt; 15.26%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3435&lt;/td&gt;   &lt;td class="xl68"&gt;0.4173&lt;/td&gt;   &lt;td class="xl68"&gt;0.2671&lt;/td&gt;   &lt;td class="xl69"&gt;9.52&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Alfonso&lt;/td&gt;   &lt;td class="xl67"&gt;Marquez&lt;/td&gt;   &lt;td class="xl67"&gt;103&lt;/td&gt;   &lt;td class="xl67"&gt;8103&lt;/td&gt;   &lt;td class="xl65"&gt; 16.46%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3380&lt;/td&gt;   &lt;td class="xl68"&gt;0.4093&lt;/td&gt;   &lt;td class="xl68"&gt;0.2609&lt;/td&gt;   &lt;td class="xl69"&gt;9.50&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" 
height="20"&gt;Scott&lt;/td&gt;   &lt;td class="xl67"&gt;Barry&lt;/td&gt;   &lt;td class="xl67"&gt;110&lt;/td&gt;   &lt;td class="xl67"&gt;8366&lt;/td&gt;   &lt;td class="xl65"&gt; 16.91%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3365&lt;/td&gt;   &lt;td class="xl68"&gt;0.4206&lt;/td&gt;   &lt;td class="xl68"&gt;0.2608&lt;/td&gt;   &lt;td class="xl69"&gt;9.48&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Tim&lt;/td&gt;   &lt;td class="xl67"&gt;Timmons&lt;/td&gt;   &lt;td class="xl67"&gt;134&lt;/td&gt;   &lt;td class="xl67"&gt;10349&lt;/td&gt;   &lt;td class="xl65"&gt; 17.61%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3314&lt;/td&gt;   &lt;td class="xl68"&gt;0.4173&lt;/td&gt;   &lt;td class="xl68"&gt;0.2650&lt;/td&gt;   &lt;td class="xl69"&gt;9.48&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Paul&lt;/td&gt;   &lt;td class="xl67"&gt;Schrieber&lt;/td&gt;   &lt;td class="xl67"&gt;110&lt;/td&gt;   &lt;td class="xl67"&gt;8678&lt;/td&gt;   &lt;td class="xl65"&gt; 16.73%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3450&lt;/td&gt;   &lt;td class="xl68"&gt;0.4100&lt;/td&gt;   &lt;td class="xl68"&gt;0.2610&lt;/td&gt;   &lt;td class="xl69"&gt;9.46&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Brian&lt;/td&gt;   &lt;td class="xl67"&gt;Knight&lt;/td&gt;   &lt;td class="xl67"&gt;128&lt;/td&gt;   &lt;td class="xl67"&gt;9760&lt;/td&gt;   &lt;td class="xl65"&gt; 17.01%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3368&lt;/td&gt;   &lt;td class="xl68"&gt;0.4200&lt;/td&gt;   &lt;td class="xl68"&gt;0.2646&lt;/td&gt;   &lt;td class="xl69"&gt;9.46&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Jerry&lt;/td&gt;   &lt;td class="xl67"&gt;Meals&lt;/td&gt;   &lt;td 
class="xl67"&gt;138&lt;/td&gt;   &lt;td class="xl67"&gt;10596&lt;/td&gt;   &lt;td class="xl65"&gt; 17.57%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3322&lt;/td&gt;   &lt;td class="xl68"&gt;0.4190&lt;/td&gt;   &lt;td class="xl68"&gt;0.2617&lt;/td&gt;   &lt;td class="xl69"&gt;9.44&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Adrian&lt;/td&gt;   &lt;td class="xl67"&gt;Johnson&lt;/td&gt;   &lt;td class="xl67"&gt;120&lt;/td&gt;   &lt;td class="xl67"&gt;9338&lt;/td&gt;   &lt;td class="xl65"&gt; 17.56%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3376&lt;/td&gt;   &lt;td class="xl68"&gt;0.4147&lt;/td&gt;   &lt;td class="xl68"&gt;0.2601&lt;/td&gt;   &lt;td class="xl69"&gt;9.39&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Dana&lt;/td&gt;   &lt;td class="xl67"&gt;DeMuth&lt;/td&gt;   &lt;td class="xl67"&gt;141&lt;/td&gt;   &lt;td class="xl67"&gt;10871&lt;/td&gt;   &lt;td class="xl65"&gt; 17.66%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3330&lt;/td&gt;   &lt;td class="xl68"&gt;0.4060&lt;/td&gt;   &lt;td class="xl68"&gt;0.2599&lt;/td&gt;   &lt;td class="xl69"&gt;9.38&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Brian&lt;/td&gt;   &lt;td class="xl67"&gt;Gorman&lt;/td&gt;   &lt;td class="xl67"&gt;139&lt;/td&gt;   &lt;td class="xl67"&gt;10599&lt;/td&gt;   &lt;td class="xl65"&gt; 17.81%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3312&lt;/td&gt;   &lt;td class="xl68"&gt;0.4233&lt;/td&gt;   &lt;td class="xl68"&gt;0.2657&lt;/td&gt;   &lt;td class="xl69"&gt;9.37&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;CB&lt;/td&gt;   &lt;td class="xl67"&gt;Bucknor&lt;/td&gt;   &lt;td class="xl67"&gt;138&lt;/td&gt;   &lt;td class="xl67"&gt;10771&lt;/td&gt;   &lt;td class="xl65"&gt; 
17.44%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3361&lt;/td&gt;   &lt;td class="xl68"&gt;0.4121&lt;/td&gt;   &lt;td class="xl68"&gt;0.2669&lt;/td&gt;   &lt;td class="xl69"&gt;9.34&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Chuck&lt;/td&gt;   &lt;td class="xl67"&gt;Meriwether&lt;/td&gt;   &lt;td class="xl67"&gt;105&lt;/td&gt;   &lt;td class="xl67"&gt;8079&lt;/td&gt;   &lt;td class="xl65"&gt; 17.45%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3296&lt;/td&gt;   &lt;td class="xl68"&gt;0.4058&lt;/td&gt;   &lt;td class="xl68"&gt;0.2608&lt;/td&gt;   &lt;td class="xl69"&gt;9.31&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Ed&lt;/td&gt;   &lt;td class="xl67"&gt;Hickox&lt;/td&gt;   &lt;td class="xl67"&gt;105&lt;/td&gt;   &lt;td class="xl67"&gt;7955&lt;/td&gt;   &lt;td class="xl65"&gt; 17.88%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3243&lt;/td&gt;   &lt;td class="xl68"&gt;0.3943&lt;/td&gt;   &lt;td class="xl68"&gt;0.2513&lt;/td&gt;   &lt;td class="xl69"&gt;9.31&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Eric&lt;/td&gt;   &lt;td class="xl67"&gt;Cooper&lt;/td&gt;   &lt;td class="xl67"&gt;133&lt;/td&gt;   &lt;td class="xl67"&gt;10174&lt;/td&gt;   &lt;td class="xl65"&gt; 17.75%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3293&lt;/td&gt;   &lt;td class="xl68"&gt;0.4119&lt;/td&gt;   &lt;td class="xl68"&gt;0.2643&lt;/td&gt;   &lt;td class="xl69"&gt;9.31&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Tony&lt;/td&gt;   &lt;td class="xl67"&gt;Randazzo&lt;/td&gt;   &lt;td class="xl67"&gt;102&lt;/td&gt;   &lt;td class="xl67"&gt;7881&lt;/td&gt;   &lt;td class="xl65"&gt; 17.64%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3283&lt;/td&gt;   &lt;td class="xl68"&gt;0.4246&lt;/td&gt;   
&lt;td class="xl68"&gt;0.2646&lt;/td&gt;   &lt;td class="xl69"&gt;9.29&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Marvin&lt;/td&gt;   &lt;td class="xl67"&gt;Hudson&lt;/td&gt;   &lt;td class="xl67"&gt;136&lt;/td&gt;   &lt;td class="xl67"&gt;10702&lt;/td&gt;   &lt;td class="xl65"&gt; 17.81%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3336&lt;/td&gt;   &lt;td class="xl68"&gt;0.4028&lt;/td&gt;   &lt;td class="xl68"&gt;0.2592&lt;/td&gt;   &lt;td class="xl69"&gt;9.29&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Charlie&lt;/td&gt;   &lt;td class="xl67"&gt;Reliford&lt;/td&gt;   &lt;td class="xl67"&gt;75&lt;/td&gt;   &lt;td class="xl67"&gt;5699&lt;/td&gt;   &lt;td class="xl65"&gt; 17.70%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3226&lt;/td&gt;   &lt;td class="xl68"&gt;0.3980&lt;/td&gt;   &lt;td class="xl68"&gt;0.2558&lt;/td&gt;   &lt;td class="xl69"&gt;9.24&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Wally&lt;/td&gt;   &lt;td class="xl67"&gt;Bell&lt;/td&gt;   &lt;td class="xl67"&gt;142&lt;/td&gt;   &lt;td class="xl67"&gt;10937&lt;/td&gt;   &lt;td class="xl65"&gt; 18.20%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3274&lt;/td&gt;   &lt;td class="xl68"&gt;0.4198&lt;/td&gt;   &lt;td class="xl68"&gt;0.2593&lt;/td&gt;   &lt;td class="xl69"&gt;9.24&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Lance&lt;/td&gt;   &lt;td class="xl67"&gt;Barksdale&lt;/td&gt;   &lt;td class="xl67"&gt;139&lt;/td&gt;   &lt;td class="xl67"&gt;10545&lt;/td&gt;   &lt;td class="xl65"&gt; 17.52%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3323&lt;/td&gt;   &lt;td class="xl68"&gt;0.4062&lt;/td&gt;   &lt;td class="xl68"&gt;0.2552&lt;/td&gt;   &lt;td class="xl69"&gt;9.24&lt;/td&gt;  &lt;/tr&gt;  
&lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Greg&lt;/td&gt;   &lt;td class="xl67"&gt;Gibson&lt;/td&gt;   &lt;td class="xl67"&gt;135&lt;/td&gt;   &lt;td class="xl67"&gt;10583&lt;/td&gt;   &lt;td class="xl65"&gt; 17.00%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3311&lt;/td&gt;   &lt;td class="xl68"&gt;0.4046&lt;/td&gt;   &lt;td class="xl68"&gt;0.2568&lt;/td&gt;   &lt;td class="xl69"&gt;9.23&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;John&lt;/td&gt;   &lt;td class="xl67"&gt;Hirschbeck&lt;/td&gt;   &lt;td class="xl67"&gt;81&lt;/td&gt;   &lt;td class="xl67"&gt;6167&lt;/td&gt;   &lt;td class="xl65"&gt; 17.97%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3256&lt;/td&gt;   &lt;td class="xl68"&gt;0.4106&lt;/td&gt;   &lt;td class="xl68"&gt;0.2585&lt;/td&gt;   &lt;td class="xl69"&gt;9.21&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Dan&lt;/td&gt;   &lt;td class="xl67"&gt;Iassogna&lt;/td&gt;   &lt;td class="xl67"&gt;138&lt;/td&gt;   &lt;td class="xl67"&gt;10521&lt;/td&gt;   &lt;td class="xl65"&gt; 18.40%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3345&lt;/td&gt;   &lt;td class="xl68"&gt;0.4112&lt;/td&gt;   &lt;td class="xl68"&gt;0.2609&lt;/td&gt;   &lt;td class="xl69"&gt;9.20&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Todd&lt;/td&gt;   &lt;td class="xl67"&gt;Tichenor&lt;/td&gt;   &lt;td class="xl67"&gt;85&lt;/td&gt;   &lt;td class="xl67"&gt;6480&lt;/td&gt;   &lt;td class="xl65"&gt; 17.02%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3375&lt;/td&gt;   &lt;td class="xl68"&gt;0.4040&lt;/td&gt;   &lt;td class="xl68"&gt;0.2628&lt;/td&gt;   &lt;td class="xl69"&gt;9.19&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" 
height="20"&gt;Derryl&lt;/td&gt;   &lt;td class="xl67"&gt;Cousins&lt;/td&gt;   &lt;td class="xl67"&gt;139&lt;/td&gt;   &lt;td class="xl67"&gt;10809&lt;/td&gt;   &lt;td class="xl65"&gt; 17.73%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3262&lt;/td&gt;   &lt;td class="xl68"&gt;0.3952&lt;/td&gt;   &lt;td class="xl68"&gt;0.2496&lt;/td&gt;   &lt;td class="xl69"&gt;9.18&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;James&lt;/td&gt;   &lt;td class="xl67"&gt;Hoye&lt;/td&gt;   &lt;td class="xl67"&gt;147&lt;/td&gt;   &lt;td class="xl67"&gt;11464&lt;/td&gt;   &lt;td class="xl65"&gt; 17.81%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3295&lt;/td&gt;   &lt;td class="xl68"&gt;0.4014&lt;/td&gt;   &lt;td class="xl68"&gt;0.2572&lt;/td&gt;   &lt;td class="xl69"&gt;9.15&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Joe&lt;/td&gt;   &lt;td class="xl67"&gt;West&lt;/td&gt;   &lt;td class="xl67"&gt;142&lt;/td&gt;   &lt;td class="xl67"&gt;11016&lt;/td&gt;   &lt;td class="xl65"&gt; 17.27%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3281&lt;/td&gt;   &lt;td class="xl68"&gt;0.4067&lt;/td&gt;   &lt;td class="xl68"&gt;0.2538&lt;/td&gt;   &lt;td class="xl69"&gt;9.14&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Jim&lt;/td&gt;   &lt;td class="xl67"&gt;Joyce&lt;/td&gt;   &lt;td class="xl67"&gt;131&lt;/td&gt;   &lt;td class="xl67"&gt;10070&lt;/td&gt;   &lt;td class="xl65"&gt; 16.74%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3341&lt;/td&gt;   &lt;td class="xl68"&gt;0.4036&lt;/td&gt;   &lt;td class="xl68"&gt;0.2599&lt;/td&gt;   &lt;td class="xl69"&gt;9.14&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Dale&lt;/td&gt;   &lt;td class="xl67"&gt;Scott&lt;/td&gt;   &lt;td 
class="xl67"&gt;142&lt;/td&gt;   &lt;td class="xl67"&gt;10816&lt;/td&gt;   &lt;td class="xl65"&gt; 18.14%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3325&lt;/td&gt;   &lt;td class="xl68"&gt;0.4143&lt;/td&gt;   &lt;td class="xl68"&gt;0.2623&lt;/td&gt;   &lt;td class="xl69"&gt;9.13&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Marty&lt;/td&gt;   &lt;td class="xl67"&gt;Foster&lt;/td&gt;   &lt;td class="xl67"&gt;121&lt;/td&gt;   &lt;td class="xl67"&gt;9343&lt;/td&gt;   &lt;td class="xl65"&gt; 18.41%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3285&lt;/td&gt;   &lt;td class="xl68"&gt;0.4101&lt;/td&gt;   &lt;td class="xl68"&gt;0.2584&lt;/td&gt;   &lt;td class="xl69"&gt;9.12&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Ted&lt;/td&gt;   &lt;td class="xl67"&gt;Barrett&lt;/td&gt;   &lt;td class="xl67"&gt;141&lt;/td&gt;   &lt;td class="xl67"&gt;10802&lt;/td&gt;   &lt;td class="xl65"&gt; 17.79%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3263&lt;/td&gt;   &lt;td class="xl68"&gt;0.4078&lt;/td&gt;   &lt;td class="xl68"&gt;0.2568&lt;/td&gt;   &lt;td class="xl69"&gt;9.11&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Mike&lt;/td&gt;   &lt;td class="xl67"&gt;Everitt&lt;/td&gt;   &lt;td class="xl67"&gt;143&lt;/td&gt;   &lt;td class="xl67"&gt;11021&lt;/td&gt;   &lt;td class="xl65"&gt; 18.10%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3279&lt;/td&gt;   &lt;td class="xl68"&gt;0.4114&lt;/td&gt;   &lt;td class="xl68"&gt;0.2569&lt;/td&gt;   &lt;td class="xl69"&gt;9.09&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Kerwin&lt;/td&gt;   &lt;td class="xl67"&gt;Danley&lt;/td&gt;   &lt;td class="xl67"&gt;109&lt;/td&gt;   &lt;td class="xl67"&gt;8248&lt;/td&gt;   &lt;td class="xl65"&gt; 
17.34%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3359&lt;/td&gt;   &lt;td class="xl68"&gt;0.4069&lt;/td&gt;   &lt;td class="xl68"&gt;0.2633&lt;/td&gt;   &lt;td class="xl69"&gt;9.08&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Fieldin&lt;/td&gt;   &lt;td class="xl67"&gt;Culbreth&lt;/td&gt;   &lt;td class="xl67"&gt;142&lt;/td&gt;   &lt;td class="xl67"&gt;10848&lt;/td&gt;   &lt;td class="xl65"&gt; 17.16%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3311&lt;/td&gt;   &lt;td class="xl68"&gt;0.4175&lt;/td&gt;   &lt;td class="xl68"&gt;0.2603&lt;/td&gt;   &lt;td class="xl69"&gt;9.01&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Tom&lt;/td&gt;   &lt;td class="xl67"&gt;Hallion&lt;/td&gt;   &lt;td class="xl67"&gt;138&lt;/td&gt;   &lt;td class="xl67"&gt;10428&lt;/td&gt;   &lt;td class="xl65"&gt; 18.37%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3251&lt;/td&gt;   &lt;td class="xl68"&gt;0.4121&lt;/td&gt;   &lt;td class="xl68"&gt;0.2561&lt;/td&gt;   &lt;td class="xl69"&gt;9.01&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Brian&lt;/td&gt;   &lt;td class="xl67"&gt;Runge&lt;/td&gt;   &lt;td class="xl67"&gt;120&lt;/td&gt;   &lt;td class="xl67"&gt;9048&lt;/td&gt;   &lt;td class="xl65"&gt; 18.39%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3238&lt;/td&gt;   &lt;td class="xl68"&gt;0.4149&lt;/td&gt;   &lt;td class="xl68"&gt;0.2590&lt;/td&gt;   &lt;td class="xl69"&gt;8.99&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Laz&lt;/td&gt;   &lt;td class="xl67"&gt;Diaz&lt;/td&gt;   &lt;td class="xl67"&gt;139&lt;/td&gt;   &lt;td class="xl67"&gt;10683&lt;/td&gt;   &lt;td class="xl65"&gt; 18.41%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3234&lt;/td&gt;   &lt;td class="xl68"&gt;0.4069&lt;/td&gt;   
&lt;td class="xl68"&gt;0.2560&lt;/td&gt;   &lt;td class="xl69"&gt;8.99&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Bruce&lt;/td&gt;   &lt;td class="xl67"&gt;Dreckman&lt;/td&gt;   &lt;td class="xl67"&gt;123&lt;/td&gt;   &lt;td class="xl67"&gt;9573&lt;/td&gt;   &lt;td class="xl65"&gt; 17.05%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3290&lt;/td&gt;   &lt;td class="xl68"&gt;0.4013&lt;/td&gt;   &lt;td class="xl68"&gt;0.2579&lt;/td&gt;   &lt;td class="xl69"&gt;8.98&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Paul&lt;/td&gt;   &lt;td class="xl67"&gt;Nauert&lt;/td&gt;   &lt;td class="xl67"&gt;137&lt;/td&gt;   &lt;td class="xl67"&gt;10471&lt;/td&gt;   &lt;td class="xl65"&gt; 17.85%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3262&lt;/td&gt;   &lt;td class="xl68"&gt;0.4146&lt;/td&gt;   &lt;td class="xl68"&gt;0.2602&lt;/td&gt;   &lt;td class="xl69"&gt;8.98&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Gary&lt;/td&gt;   &lt;td class="xl67"&gt;Darling&lt;/td&gt;   &lt;td class="xl67"&gt;131&lt;/td&gt;   &lt;td class="xl67"&gt;9874&lt;/td&gt;   &lt;td class="xl65"&gt; 18.14%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3289&lt;/td&gt;   &lt;td class="xl68"&gt;0.4100&lt;/td&gt;   &lt;td class="xl68"&gt;0.2621&lt;/td&gt;   &lt;td class="xl69"&gt;8.96&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Mike&lt;/td&gt;   &lt;td class="xl67"&gt;DiMuro&lt;/td&gt;   &lt;td class="xl67"&gt;109&lt;/td&gt;   &lt;td class="xl67"&gt;8386&lt;/td&gt;   &lt;td class="xl65"&gt; 18.28%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3219&lt;/td&gt;   &lt;td class="xl68"&gt;0.3997&lt;/td&gt;   &lt;td class="xl68"&gt;0.2515&lt;/td&gt;   &lt;td class="xl69"&gt;8.95&lt;/td&gt;  &lt;/tr&gt;  &lt;tr 
style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Mark&lt;/td&gt;   &lt;td class="xl67"&gt;Wegner&lt;/td&gt;   &lt;td class="xl67"&gt;133&lt;/td&gt;   &lt;td class="xl67"&gt;10173&lt;/td&gt;   &lt;td class="xl65"&gt; 18.34%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3279&lt;/td&gt;   &lt;td class="xl68"&gt;0.3991&lt;/td&gt;   &lt;td class="xl68"&gt;0.2518&lt;/td&gt;   &lt;td class="xl69"&gt;8.94&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Phil&lt;/td&gt;   &lt;td class="xl67"&gt;Cuzzi&lt;/td&gt;   &lt;td class="xl67"&gt;138&lt;/td&gt;   &lt;td class="xl67"&gt;10492&lt;/td&gt;   &lt;td class="xl65"&gt; 18.76%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3252&lt;/td&gt;   &lt;td class="xl68"&gt;0.4067&lt;/td&gt;   &lt;td class="xl68"&gt;0.2582&lt;/td&gt;   &lt;td class="xl69"&gt;8.93&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Angel&lt;/td&gt;   &lt;td class="xl67"&gt;Hernandez&lt;/td&gt;   &lt;td class="xl67"&gt;141&lt;/td&gt;   &lt;td class="xl67"&gt;10650&lt;/td&gt;   &lt;td class="xl65"&gt; 17.29%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3279&lt;/td&gt;   &lt;td class="xl68"&gt;0.3962&lt;/td&gt;   &lt;td class="xl68"&gt;0.2557&lt;/td&gt;   &lt;td class="xl69"&gt;8.90&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Ed&lt;/td&gt;   &lt;td class="xl67"&gt;Rapuano&lt;/td&gt;   &lt;td class="xl67"&gt;140&lt;/td&gt;   &lt;td class="xl67"&gt;10689&lt;/td&gt;   &lt;td class="xl65"&gt; 17.55%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3293&lt;/td&gt;   &lt;td class="xl68"&gt;0.4072&lt;/td&gt;   &lt;td class="xl68"&gt;0.2579&lt;/td&gt;   &lt;td class="xl69"&gt;8.89&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" 
height="20"&gt;Bob&lt;/td&gt;   &lt;td class="xl67"&gt;Davidson&lt;/td&gt;   &lt;td class="xl67"&gt;140&lt;/td&gt;   &lt;td class="xl67"&gt;10803&lt;/td&gt;   &lt;td class="xl65"&gt; 17.40%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3307&lt;/td&gt;   &lt;td class="xl68"&gt;0.3924&lt;/td&gt;   &lt;td class="xl68"&gt;0.2576&lt;/td&gt;   &lt;td class="xl69"&gt;8.86&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Mike&lt;/td&gt;   &lt;td class="xl67"&gt;Winters&lt;/td&gt;   &lt;td class="xl67"&gt;133&lt;/td&gt;   &lt;td class="xl67"&gt;9904&lt;/td&gt;   &lt;td class="xl65"&gt; 18.35%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3302&lt;/td&gt;   &lt;td class="xl68"&gt;0.4070&lt;/td&gt;   &lt;td class="xl68"&gt;0.2620&lt;/td&gt;   &lt;td class="xl69"&gt;8.86&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Rob&lt;/td&gt;   &lt;td class="xl67"&gt;Drake&lt;/td&gt;   &lt;td class="xl67"&gt;146&lt;/td&gt;   &lt;td class="xl67"&gt;11091&lt;/td&gt;   &lt;td class="xl65"&gt; 18.86%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3231&lt;/td&gt;   &lt;td class="xl68"&gt;0.4019&lt;/td&gt;   &lt;td class="xl68"&gt;0.2515&lt;/td&gt;   &lt;td class="xl69"&gt;8.85&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Jim&lt;/td&gt;   &lt;td class="xl67"&gt;Wolf&lt;/td&gt;   &lt;td class="xl67"&gt;133&lt;/td&gt;   &lt;td class="xl67"&gt;10133&lt;/td&gt;   &lt;td class="xl65"&gt; 18.01%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3313&lt;/td&gt;   &lt;td class="xl68"&gt;0.4078&lt;/td&gt;   &lt;td class="xl68"&gt;0.2604&lt;/td&gt;   &lt;td class="xl69"&gt;8.83&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Hunter&lt;/td&gt;   &lt;td class="xl67"&gt;Wendelstedt&lt;/td&gt;   &lt;td 
class="xl67"&gt;140&lt;/td&gt;   &lt;td class="xl67"&gt;10625&lt;/td&gt;   &lt;td class="xl65"&gt; 17.37%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3258&lt;/td&gt;   &lt;td class="xl68"&gt;0.4021&lt;/td&gt;   &lt;td class="xl68"&gt;0.2558&lt;/td&gt;   &lt;td class="xl69"&gt;8.81&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Bill&lt;/td&gt;   &lt;td class="xl67"&gt;Miller&lt;/td&gt;   &lt;td class="xl67"&gt;142&lt;/td&gt;   &lt;td class="xl67"&gt;10852&lt;/td&gt;   &lt;td class="xl65"&gt; 18.69%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3186&lt;/td&gt;   &lt;td class="xl68"&gt;0.4026&lt;/td&gt;   &lt;td class="xl68"&gt;0.2534&lt;/td&gt;   &lt;td class="xl69"&gt;8.77&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Brian&lt;/td&gt;   &lt;td class="xl67"&gt;O'Nora&lt;/td&gt;   &lt;td class="xl67"&gt;124&lt;/td&gt;   &lt;td class="xl67"&gt;9305&lt;/td&gt;   &lt;td class="xl65"&gt; 17.69%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3221&lt;/td&gt;   &lt;td class="xl68"&gt;0.4100&lt;/td&gt;   &lt;td class="xl68"&gt;0.2571&lt;/td&gt;   &lt;td class="xl69"&gt;8.77&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Ron&lt;/td&gt;   &lt;td class="xl67"&gt;Kulpa&lt;/td&gt;   &lt;td class="xl67"&gt;130&lt;/td&gt;   &lt;td class="xl67"&gt;10016&lt;/td&gt;   &lt;td class="xl65"&gt; 18.24%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3286&lt;/td&gt;   &lt;td class="xl68"&gt;0.4033&lt;/td&gt;   &lt;td class="xl68"&gt;0.2578&lt;/td&gt;   &lt;td class="xl69"&gt;8.76&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Jerry&lt;/td&gt;   &lt;td class="xl67"&gt;Layne&lt;/td&gt;   &lt;td class="xl67"&gt;118&lt;/td&gt;   &lt;td class="xl67"&gt;9071&lt;/td&gt;   &lt;td class="xl65"&gt; 
17.43%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3313&lt;/td&gt;   &lt;td class="xl68"&gt;0.4008&lt;/td&gt;   &lt;td class="xl68"&gt;0.2525&lt;/td&gt;   &lt;td class="xl69"&gt;8.71&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Mark&lt;/td&gt;   &lt;td class="xl67"&gt;Carlson&lt;/td&gt;   &lt;td class="xl67"&gt;107&lt;/td&gt;   &lt;td class="xl67"&gt;7971&lt;/td&gt;   &lt;td class="xl65"&gt; 18.15%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3266&lt;/td&gt;   &lt;td class="xl68"&gt;0.4053&lt;/td&gt;   &lt;td class="xl68"&gt;0.2565&lt;/td&gt;   &lt;td class="xl69"&gt;8.67&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Jeff&lt;/td&gt;   &lt;td class="xl67"&gt;Kellogg&lt;/td&gt;   &lt;td class="xl67"&gt;143&lt;/td&gt;   &lt;td class="xl67"&gt;10784&lt;/td&gt;   &lt;td class="xl65"&gt; 17.23%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3291&lt;/td&gt;   &lt;td class="xl68"&gt;0.4101&lt;/td&gt;   &lt;td class="xl68"&gt;0.2563&lt;/td&gt;   &lt;td class="xl69"&gt;8.66&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Paul&lt;/td&gt;   &lt;td class="xl67"&gt;Emmel&lt;/td&gt;   &lt;td class="xl67"&gt;134&lt;/td&gt;   &lt;td class="xl67"&gt;10107&lt;/td&gt;   &lt;td class="xl65"&gt; 18.77%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3195&lt;/td&gt;   &lt;td class="xl68"&gt;0.3924&lt;/td&gt;   &lt;td class="xl68"&gt;0.2537&lt;/td&gt;   &lt;td class="xl69"&gt;8.65&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Chris&lt;/td&gt;   &lt;td class="xl67"&gt;Guccione&lt;/td&gt;   &lt;td class="xl67"&gt;148&lt;/td&gt;   &lt;td class="xl67"&gt;11205&lt;/td&gt;   &lt;td class="xl65"&gt; 17.72%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3303&lt;/td&gt;   &lt;td class="xl68"&gt;0.3999&lt;/td&gt;  
 &lt;td class="xl68"&gt;0.2578&lt;/td&gt;   &lt;td class="xl69"&gt;8.64&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Jeff&lt;/td&gt;   &lt;td class="xl67"&gt;Nelson&lt;/td&gt;   &lt;td class="xl67"&gt;123&lt;/td&gt;   &lt;td class="xl67"&gt;9399&lt;/td&gt;   &lt;td class="xl65"&gt; 18.24%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3248&lt;/td&gt;   &lt;td class="xl68"&gt;0.3997&lt;/td&gt;   &lt;td class="xl68"&gt;0.2523&lt;/td&gt;   &lt;td class="xl69"&gt;8.63&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Gary&lt;/td&gt;   &lt;td class="xl67"&gt;Cederstrom&lt;/td&gt;   &lt;td class="xl67"&gt;138&lt;/td&gt;   &lt;td class="xl67"&gt;10387&lt;/td&gt;   &lt;td class="xl65"&gt; 18.07%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3292&lt;/td&gt;   &lt;td class="xl68"&gt;0.4031&lt;/td&gt;   &lt;td class="xl68"&gt;0.2583&lt;/td&gt;   &lt;td class="xl69"&gt;8.62&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Doug&lt;/td&gt;   &lt;td class="xl67"&gt;Eddings&lt;/td&gt;   &lt;td class="xl67"&gt;140&lt;/td&gt;   &lt;td class="xl67"&gt;10530&lt;/td&gt;   &lt;td class="xl65"&gt; 18.64%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3237&lt;/td&gt;   &lt;td class="xl68"&gt;0.4112&lt;/td&gt;   &lt;td class="xl68"&gt;0.2596&lt;/td&gt;   &lt;td class="xl69"&gt;8.56&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Andy&lt;/td&gt;   &lt;td class="xl67"&gt;Fletcher&lt;/td&gt;   &lt;td class="xl67"&gt;117&lt;/td&gt;   &lt;td class="xl67"&gt;8930&lt;/td&gt;   &lt;td class="xl65"&gt; 18.91%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3221&lt;/td&gt;   &lt;td class="xl68"&gt;0.3852&lt;/td&gt;   &lt;td class="xl68"&gt;0.2491&lt;/td&gt;   &lt;td class="xl69"&gt;8.20&lt;/td&gt;  &lt;/tr&gt;  
&lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Mike&lt;/td&gt;   &lt;td class="xl67"&gt;Estabrook&lt;/td&gt;   &lt;td class="xl67"&gt;83&lt;/td&gt;   &lt;td class="xl67"&gt;6265&lt;/td&gt;   &lt;td class="xl65"&gt; 18.13%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3200&lt;/td&gt;   &lt;td class="xl68"&gt;0.3848&lt;/td&gt;   &lt;td class="xl68"&gt;0.2559&lt;/td&gt;   &lt;td class="xl69"&gt;7.95&lt;/td&gt;  &lt;/tr&gt;  &lt;tr style="height: 15pt;" height="20"&gt;   &lt;td class="xl67" style="height: 15pt;" height="20"&gt;Bill&lt;/td&gt;   &lt;td class="xl67"&gt;Hohn&lt;/td&gt;   &lt;td class="xl67"&gt;91&lt;/td&gt;   &lt;td class="xl67"&gt;6618&lt;/td&gt;   &lt;td class="xl65"&gt; 16.88%&lt;/td&gt;   &lt;td class="xl68"&gt;0.3234&lt;/td&gt;   &lt;td class="xl68"&gt;0.3965&lt;/td&gt;   &lt;td class="xl68"&gt;0.2505&lt;/td&gt;   &lt;td class="xl69"&gt;7.91&lt;/td&gt;  &lt;/tr&gt; &lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;Anyway, Jeff's post was more about strike calling percentage than anything else.  His tables seem strange, and if they're telling me what I think they're telling me, then I don't think they're correct.  For example, of all pitches called strikes by the umpire in 2010, I have about 65% of those falling within the RULEBOOK strike zone (that means the edges of the plate, NOT the 2-foot-wide zone commonly used).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;PRELIMINARY DATA HAS BEEN REMOVED BECAUSE I'VE SEEN IT ABUSED IN CERTAIN PLACES.  PLEASE SEE LATEST VERSION OF DATABASE!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Below, I show a table of a number of things.  The first 3 columns show the percentage of pitches within the rulebook strike zone CORRECTLY called a strike.  Similarly, the next 3 columns show the percentage that each umpire CORRECTLY calls a ball when it is truly outside the strike zone.  
I do this for all batters, RHB, and then LHB.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Next, I also tally up the INCORRECT ball and strike calls.  These are the percentages of pitches on which each umpire calls a Strike on a pitch that is actually OUTSIDE the rulebook zone OR calls a Ball on a pitch that is truly WITHIN the rulebook zone.  Again, keep in mind I use the rulebook zone, rather than the standard 2-foot-wide zone:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;PRELIMINARY DATA HAS BEEN REMOVED BECAUSE I'VE SEEN IT ABUSED IN CERTAIN PLACES.  PLEASE SEE LATEST VERSION OF DATABASE!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I was in the process of also recording the total number of pitches called by each umpire to put it in perspective, but did not have time before posting this.  I'll add that stuff later on.  I think it's pretty obvious that Barrett doesn't have a perfect call percentage with LHB up to bat.&lt;br /&gt;&lt;br /&gt;Anyway, I'll have more on this later.  For now, look at the zones below from 2010 for all of the umpires in video format (yeah, yeah, I re-posted it but it sure makes sense to have it in this post as well).&lt;br /&gt;&lt;br /&gt;NOTE: I fixed the videos.  I was made aware that no one could see them because of Facebook privacy settings.  Please let me know if there is still a problem.  DUH!&lt;br /&gt;&lt;br /&gt;Another Update: I added pitch counts for 2010 to the data tables above so as to avoid drawing big conclusions from small sample sizes.  When comparing RHB to LHB, remember that it's pretty common to have the LHB zone shifted outside.  Because I have used the BOOK ZONE to gauge 'correctness' of the call, these percentages will be skewed a bit.  
Also, I am working on getting the tables a bit more manageable for Blogger, which continues to disappoint me with its formatting capabilities.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;object width="400" height="300"&gt;&lt;param name="allowfullscreen" value="true"&gt;&lt;param name="movie" value="http://www.facebook.com/v/544688443983"&gt;&lt;embed src="http://www.facebook.com/v/544688443983" type="application/x-shockwave-flash" allowfullscreen="true" width="400" height="300"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;object width="400" height="300"&gt;&lt;param name="allowfullscreen" value="true"&gt;&lt;param name="movie" value="http://www.facebook.com/v/544688134603"&gt;&lt;embed src="http://www.facebook.com/v/544688134603" type="application/x-shockwave-flash" allowfullscreen="true" width="400" height="300"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6731958142156026312-6764632586375702959?l=princeofslides.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://princeofslides.blogspot.com/feeds/6764632586375702959/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://princeofslides.blogspot.com/2011/03/umpire-strike-zones.html#comment-form' title='19 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/6764632586375702959'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6731958142156026312/posts/default/6764632586375702959'/><link rel='alternate' type='text/html' href='http://princeofslides.blogspot.com/2011/03/umpire-strike-zones.html' title='Umpire Strike Zones'/><author><name>Millsy</name><uri>http://www.blogger.com/profile/05121540047611227512</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='17' height='32' src='http://2.bp.blogspot.com/_CERlGVs2E6w/SrEMursY__I/AAAAAAAAAB4/DjEzsrBENso/S220/n115201179_30854290_325.jpg'/></author><thr:total>19</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6731958142156026312.post-4281256310462112097</id><published>2011-03-22T23:55:00.001-04:00</published><updated>2011-03-22T11:58:55.245-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Visualizations'/><category scheme='http://www.blogger.com/atom/ns#' term='R-project'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Baseball'/><category scheme='http://www.blogger.com/atom/ns#' term='sab-R-metrics'/><title type='text'>sab-R-metrics Sidetrack: Bubble Plots</title><content type='html'>While I had mentioned in my last post that I will cover logistic regression in my next post, I decided that a quick interlude in working with bubble plots would be fun.  
&lt;a href="http://www.google.com/imgres?imgurl=http://flowingdata.com/wp-content/uploads/2010/11/5-edited-version1-575x385.png&amp;amp;imgrefurl=http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/&amp;amp;usg=__7A7ZU0kc-9Log1e-pVJDvgcoWAk=&amp;amp;h=385&amp;amp;w=575&amp;amp;sz=96&amp;amp;hl=en&amp;amp;start=40&amp;amp;zoom=1&amp;amp;tbnid=cbac0REclChpUM:&amp;amp;tbnh=162&amp;amp;tbnw=242&amp;amp;ei=02J-TbXYK47AgQfVuqmAAg&amp;amp;prev=/images%3Fq%3Dbubble%2Bplots%26um%3D1%26hl%3Den%26sa%3DX%26biw%3D1280%26bih%3D843%26tbs%3Disch:10%2C1140&amp;amp;um=1&amp;amp;itbs=1&amp;amp;iact=rc&amp;amp;dur=350&amp;amp;oei=zGJ-Taz_GpOL0QGd8ISGBA&amp;amp;page=3&amp;amp;ndsp=20&amp;amp;ved=1t:429,r:3,s:40&amp;amp;tx=81&amp;amp;ty=106&amp;amp;biw=1280&amp;amp;bih=843"&gt;Bubble plots&lt;/a&gt; have &lt;a href="http://www.baycityball.com/2011/01/27/graphs-war-bubbles-for-1st-round-pitchers/"&gt;become pretty popular&lt;/a&gt; recently, especially with all of the &lt;a href="http://datavizchallenge.org/"&gt;Visualization Challenges&lt;/a&gt; I've seen around the internet (by the way, I think people in the sabermetric world have a great chance to win some of these, even though the data generally isn't baseball data).  Ultimately, bubble plots are a good way to present a third dimension on a two-dimensional graph.&lt;br /&gt;&lt;br /&gt;Today, I'll talk about doing some &lt;span style="font-weight: bold;"&gt;basic &lt;/span&gt;bubble plots using some Red Sox and Yankees data on attendance and wins over time (&lt;a href="http://sitemaker.umich.edu/millsbrian/sab-r-metrics_info"&gt;click here for the "soxyanks.csv" data link&lt;/a&gt;).  If you remember my quick &lt;a href="http://princeofslides.blogspot.com/2011/02/sab-r-metrics-displaying-time-series.html"&gt;tutorial on plotting time series data&lt;/a&gt;, I showed how to track wins and attendance over time.  
However, we often want to include the most information possible on our plots, and that often means presenting a third (or fourth) variable.  This makes the 2-dimensional world of plotting more challenging, and that is where bubbles come in (Side Note: It is also why heat maps are so extensively used for Pitch F/X data!).&lt;br /&gt;&lt;br /&gt;Okay, so what do the bubbles tell us?  Generally, the size of the bubble is meant to represent that third dimension.  For wins and attendance over time, it's not straightforward to track these on the same plot.  You could normalize them so that they're on the same scale and then plot them together, but this is a difficult comparison over time when something like attendance is growing.  Of course, this is a common time series issue: you could take a first-difference approach or fit some other more complicated model, do some smoothing, go into the frequency domain, and so on, but I'm not going to get into that on this site.  You don't want to hear about unit roots, random walks, and the like.  You're here for baseball and fun...right?  If you normalize the two variables--just using standard z-scores--you'll end up with something like this:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/-DZinagvgBtA/TX5plkaqz6I/AAAAAAAAAP4/pKzpcHCyON4/s1600/boringplot.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 246px;" src="http://1.bp.blogspot.com/-DZinagvgBtA/TX5plkaqz6I/AAAAAAAAAP4/pKzpcHCyON4/s320/boringplot.png" alt="" id="BLOGGER_PHOTO_ID_5584016682137604002" border="0" /&gt;&lt;/a&gt;Bleh. Assuming we think the above plot is useful and want to compare two teams, we probably have to make side-by-side plots.  Sometimes it's easier to compare when everything is on the same plot.  So we can represent something like winning using bubble size at each year, with attendance on the y-axis.  
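&lt;br /&gt;&lt;br /&gt;For reference, a z-score is just the value minus its mean, divided by its standard deviation.  Here is a minimal sketch of that normalization, assuming a small made-up data frame (the object "toy", its columns, and every number in it are placeholders for illustration, not the "soxyanks.csv" data):&lt;br /&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;##hypothetical toy data, for illustration only&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;toy &lt;- data.frame(year=2001:2005, att=c(30000, 32000, 31000, 35000, 36000), win=c(.520, .580, .540, .600, .610))&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;##z-score by hand: subtract the mean, divide by the standard deviation&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;toy$att.z &lt;- (toy$att - mean(toy$att)) / sd(toy$att)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;##scale() computes the same thing&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;toy$win.z &lt;- as.numeric(scale(toy$win))&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;##both series now fit on one axis&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;plot(att.z ~ year, data=toy, type="l", lwd=3, col="darkblue", ylim=range(toy$att.z, toy$win.z), xlab="Year", ylab="Z-Score")&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;lines(win.z ~ year, data=toy, lwd=3, col="darkred")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;
Both normalized columns end up with a mean of 0 and a standard deviation of 1, which is what lets them share an axis.&lt;br /&gt;&lt;br /&gt;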
Let's load in the data and start thinking about our variables and just plot Yankees and Red Sox attendance on the same time plot at first:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;##set working directory and load data&lt;/span&gt; &lt;span style="color: rgb(51, 51, 255);"&gt;&lt;br /&gt;&lt;br /&gt;setwd("c:/Users/Millsy/Dropbox/Blog Stuff/sab-R-metrics")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;##load data&lt;/span&gt; &lt;span style="color: rgb(51, 51, 255);"&gt;&lt;br /&gt;&lt;br /&gt;ball &lt;- read.csv(file="soxyanks.csv", h=T)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;head&lt;/span&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;(ball)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;##attendance time plot&lt;/span&gt; &lt;span style="color: rgb(51, 51, 255);"&gt;&lt;br /&gt;&lt;br /&gt;plot(ball$yank.att ~ ball$year, xlab="Year", ylab="Yankees vs. 
Red Sox Attendance",&lt;/span&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;main="Average Attendance Per Game", col="darkblue", type="l", lwd=3)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;&lt;br /&gt;lines(ball$bos.att ~ ball$year, col="darkred", lwd=3)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;legend(1900, 54000, legend=c("Yankee", "Red Sox"), fill=c("darkblue", "darkred"))&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-3rQQr7fpcws/TX5wWMMCTrI/AAAAAAAAAQY/7VM4Zixi8pg/s1600/simpletime.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 246px;" src="http://3.bp.blogspot.com/-3rQQr7fpcws/TX5wWMMCTrI/AAAAAAAAAQY/7VM4Zixi8pg/s320/simpletime.png" alt="" id="BLOGGER_PHOTO_ID_5584024114517135026" border="0" /&gt;&lt;/a&gt;&lt;a href="http://2.bp.blogspot.com/-spIgTlUEsYo/TX5v--FmWKI/AAAAAAAAAQA/qwrkxlLFMig/s1600/simpletime.png"&gt;&lt;br /&gt;&lt;/a&gt;Pretty easy to see the general trend in attendance over time, with the usual spikes.  However, this doesn't give us much information about the wins of each team over time.  We could make a separate plot to compare wins over time for each team.  Or, we can represent this new dimension using bubbles at each time point, where the size of the bubble represents the winning percentage of each team in each year.&lt;br /&gt;&lt;br /&gt;There are a number of ways to do this in R, and I'll begin with a simple one: simply using the command "&lt;span style="color: rgb(153, 0, 0);"&gt;cex=&lt;/span&gt;" to indicate point size based on some variable.  There are some shortcomings with this method, but I'll talk about that later.  
Beginning with just the Yankees, let's plot some points in addition to our lines (keep in mind this is a starter point--this plot will be ugly):&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;##plot yankees attendance and wins using "cex=&lt;/span&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;"  &lt;/span&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;&lt;br /&gt;&lt;br /&gt;plot(yank.att ~ year, data=ball, pch=16, cex=20*yank.win^3, col="darkgrey", main="Yankees Wins &amp;amp; Attendance Over Time", xlab="Year", ylab="Average Game Attendance")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;lines(yank.att ~ year, data=ball, lwd=2, col="darkblue")&lt;/span&gt;  &lt;span style="color: rgb(51, 51, 255);"&gt;&lt;br /&gt;&lt;br /&gt;legend(1900, 54000, legend=c(".250 W%", ".350 W%", ".450 W%", ".550 W%", ".650 W%", ".750 W%"), col="darkgrey", pch=16, pt.cex=c(20*.25^3, 20*.35^3, 20*.45^3, 20*.55^3, 20*.65^3, 20*.75^3), cex=c(.6,.7,.8,1,1.25,1.5), bty="n")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/-lEwk2qGEml0/TX5wvdgPw-I/AAAAAAAAAQo/t4eNx-2SBZk/s1600/yanksonly.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 208px;" src="http://1.bp.blogspot.com/-lEwk2qGEml0/TX5wvdgPw-I/AAAAAAAAAQo/t4eNx-2SBZk/s320/yanksonly.png" alt="" id="BLOGGER_PHOTO_ID_5584024548662035426" border="0" /&gt;&lt;/a&gt;&lt;a href="http://2.bp.blogspot.com/-uODBFDXsWJE/TX5wEdycsJI/AAAAAAAAAQI/Gyz_6JbhOAk/s1600/yankees.png"&gt;&lt;br /&gt;&lt;/a&gt;&lt;br /&gt;The legend in the above plot is a bit complicated and is unfortunately the best I can do with this code.  Later in this post, I'll show another way to do these &lt;a href="http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/"&gt;based on some code in this tutorial&lt;/a&gt;.  
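&lt;br /&gt;&lt;br /&gt;As an aside, the six hand-typed sizes in the legend call can also be generated from a single vector, which keeps the legend in step with the "&lt;span style="color: rgb(153, 0, 0);"&gt;cex=&lt;/span&gt;" formula if the scaling changes.  This is only a sketch: the names "w", "size", and "labs" are placeholders, and the empty plot is a stand-in so the snippet runs on its own without the data:&lt;br /&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;##win percents to show in the legend&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;w &lt;- seq(.25, .75, by=.1)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;##same scaling as the cex= argument in the plot&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;size &lt;- 20*w^3&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;##labels in the ".250 W%" style&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;labs &lt;- paste(sub("^0", "", sprintf("%.3f", w)), "W%")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;##empty stand-in plot so the legend has somewhere to go&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;plot(NA, xlim=c(1900, 2010), ylim=c(0, 55000), xlab="Year", ylab="Average Game Attendance")&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;legend(1900, 54000, legend=labs, col="darkgrey", pch=16, pt.cex=size, bty="n")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;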
Honestly, I think my legend is a bit ugly and I'm pretty sure that the "&lt;span style="color: rgb(153, 0, 0);"&gt;ggplot2&lt;/span&gt;" package has a better way.  Also notice that I use a polynomial (the cube of the win percent) to scale the bubbles.  Normally, I wouldn't recommend doing this; however, because of the small range of win percents, this tends to give more useful size ranges for plotting.  If you want to do a simple linear transformation, you can multiply the win percents by a constant instead...or use wins (which is problematic since teams have not played the same number of games for the entire time period).  The reason the polynomial can become a problem is that we want each bubble's area to be proportional to the win percent.  I'll talk about this in a few paragraphs below, but will first talk about some color issues.&lt;br /&gt;&lt;br /&gt;Unfortunately, the bubbles all mesh together in the plot.  So I'll use this portion of the tutorial as a lesson in the RGB color scale in R, along with how to work with transparent colors.  RGB stands for Red-Green-Blue.  It's just like that guy with the insanely deep voice talking about the new Sharp televisions (except they add yellow).  So, while we can use the names of colors (and just general numbers for colors), we can also use the RGB scale to make our own colors.&lt;br /&gt;&lt;br /&gt;I'll just start with a simple way to work with the color scale.  When using this scale, you will need to input an 8-digit hexadecimal code in the form of:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;col="#00000000"&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The first two digits tell R how much Red to put into the color (in hexadecimal, on a 00 to FF scale, which is 0 to 255 in decimal).  The second two digits do the same for Green, and the third pair of digits does this for Blue.  Finally, the last pair of digits tells R how transparent you want your color to be.  For lots of transparency, you set this number low.  For less transparency, you set it high.  
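&lt;br /&gt;&lt;br /&gt;If you would rather not convert to hexadecimal in your head, base R can do it.  A quick sketch (the object name "grey.clear" is only a placeholder): each pair of digits is a hexadecimal value from 00 to FF, which is 0 to 255 in decimal, so "&lt;span style="color: rgb(153, 0, 0);"&gt;rgb()&lt;/span&gt;" can build the code from decimal values and "&lt;span style="color: rgb(153, 0, 0);"&gt;col2rgb()&lt;/span&gt;" can take it apart again:&lt;br /&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;##153 is 99 in hexadecimal and 80 is 50, so this gives "#99999950"&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;grey.clear &lt;- rgb(153, 153, 153, alpha=80, maxColorValue=255)&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;grey.clear&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;##and back to decimal red, green, blue, and alpha values&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: rgb(51, 51, 255);"&gt;col2rgb(grey.clear, alpha=TRUE)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;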
We can use this to our advantage in the bubble plots so that we can see the outline of each bubble if they overlap.  So let's rework the Yankee plot above, but make some transparent colors:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;##now do Yankee plot with transparent colors&lt;/span&gt;  &lt;span style="color: rgb(51, 51, 255);"&gt;&lt;br /&gt;&lt;br /&gt;plot(yank.att ~ year, data=ball, pch=16, cex=20*yank.win^3, col="#99999950", main="Yankees Wins &amp;amp; Attendance Over Time", xlab="Year", ylab="Average Game Attendance")&lt;/span&gt;  &lt;span style="color: rgb(51, 51, 255);"&gt;&lt;br /&gt;&lt;br /&gt;lines(yank.att ~ year, data=ball, lwd=2, col="darkblue")&lt;/span&gt;  &lt;span style="color: rgb(51, 51, 255);"&gt;&lt;br /&gt;&lt;br /&gt;legend(1900, 54000, legend=c(".250 W%", ".350 W%", ".450 W%", ".550 W%", ".650 W%", ".750 W%"), col="#99999950", pch=16, pt.cex=c(20*.25^3, 20*.35^3, 20*.45^3, 20*.55^3, 20*.65^3, 20*.75^3), cex=c(.6,.7,.8,1,1.25,1.5), bty="n")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-NJdvj5hDVVg/TX5wvSXBBBI/AAAAAAAAAQw/siLaW0jxnAA/s1600/yankstransparent.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 208px;" src="http://3.bp.blogspot.com/-NJdvj5hDVVg/TX5wvSXBBBI/AAAAAAAAAQw/siLaW0jxnAA/s320/yankstransparent.png" alt="" id="BLOGGER_PHOTO_ID_5584024545670530066" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;This looks a little better, as you can see the outline of each bubble in the overlapping portions with other bubbles.  You can see that the Yankees had a rough decade in the 1970's in both attendance and winning.  Their attendance seemed to drop below what a normal trend would suggest in these years, and there seems to be a good chance that this was due to their sub-par on-field performance (remember, we're just speculating here).  
By this, you can see some advantage to including bubbles for this type of data.&lt;br /&gt;&lt;br /&gt;Now, let's go ahead and add the Red Sox data to this plot.  I altered the key just a little bit, but still not to my liking:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;##have both Yankees and Red Sox on same plot&lt;/span&gt;  &lt;span style="color: rgb(51, 51, 255);"&gt;&lt;br /&gt;&lt;br /&gt;plot(yank.att ~ year, data=ball, pch=16, cex=20*yank.win^3, col="#99999950", main="Yankees vs. Red Sox Wins and Attendance Over Time", xlab="Year", ylab="Average Game Attendance")&lt;/span&gt;  &lt;span style="color: rgb(51, 51, 255);"&gt;&lt;br /&gt;&lt;br /&gt;points(bos.att ~ year, data=ball, pch=16, cex=20*bos.win^3, col="#99000050")&lt;/span&gt;  &lt;span style="color: rgb(51, 51, 255);"&gt;&lt;br /&gt;&lt;br /&gt;lines(yank.att ~ year, lwd=2, col="darkblue", data=ball)&lt;/span&gt;  &lt;span style="color: rgb(51, 51, 255);"&gt;&lt;br /&gt;&lt;br /&gt;lines(bos.att ~ year, lwd=2, col="darkred", data=ball)&lt;/span&gt;  &lt;span style="color: rgb(51, 51, 255);"&gt;&lt;br /&gt;&lt;br /&gt;legend(1900, 54000, legend=c(".250 W%", ".350 W%", ".450 W%", ".550 W%", ".650 W%", ".750 W%"), col="#99999950", pch=16, pt.cex=c(20*.25^3, 20*.35^3, 20*.45^3, 20*.55^3, 20*.65^3, 20*.75^3), cex=2, bty="n")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-BVTlodZvHdc/TX5xh2GbikI/AAAAAAAAAQ4/YZa8QuIHZgQ/s1600/YanksSoxBubble.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 208px;" src="http://3.bp.blogspot.com/-BVTlodZvHdc/TX5xh2GbikI/AAAAAAAAAQ4/YZa8QuIHZgQ/s320/YanksSoxBubble.png" alt="" id="BLOGGER_PHOTO_ID_5584025414258100802" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Here we can see the demise of the Red Sox in the 1920's, as their performance was so bad we can barely see their win bubbles.  
Red Sox attendance was low at those points, and we see this happen again not long after the WWII attendance bump.  Then, when the Yankees start sucking in the 70's, we see the Red Sox attendance rebound a bit as the team improves a little.  See how the bubbles help us to tell a story over time.&lt;br /&gt;&lt;br /&gt;It's always important to think about the shortcomings of these plots.  Obviously, the bubbles are not growing in a linear fashion, and this can be misleading in some cases.  In addition, things are a bit crowded.  That's not even mentioning that some bubbles tend to be too small, while others are too large.  These aren't the prettiest plots in the world, but they're a decent start.  I encourage you to try out different data and ways of working with the bubbles on your own.&lt;br /&gt;&lt;br /&gt;So, let's switch gears now to some other types of data along with another method of creating bubble plots.&lt;br /&gt;&lt;br /&gt;Perhaps we're interested in team home runs, stolen bases, and walks on the same plot.  In other words, let's see which teams are more like Adam Dunn and which are more like Juan Pierre.  First, go ahead and load in the "teamsdata.csv" file from a previous tutorial.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;##read in new data&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;teams &lt;- read.csv(file="teamsdata.csv", h=T)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 51, 255);"&gt;head(teams)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For this portion of the tutorial, I'll be using the "&lt;span style="color: rgb(153, 0, 0);"&gt;symbols()&lt;/span&gt;" function, which plots shapes with borders in a plot.  Aesthetically, these are prettier.  But we'll have to think about a few things before we begin to plot.  
I am going to take these explanations directly from &lt;a href="http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/"&gt;this fantastic tutorial&lt;/a&gt;&lt;a href="http://flowingdata.com/2010/11/23/how-to-make-bubble-charts
