The GreenStalk

Any Given Sunday – Part II

Posted in Sports by Paul Grana on April 14, 2012

So, how much randomness is there in sports?

Last week I tried to answer this by looking at the differences between the win totals of the good and bad teams.  But that doesn’t tell the whole story, because it ignores who those wins came against.

Instead, it’s more interesting to ask the question: if one team is better than another, what is the probability that the better team wins?

Now, there’s no perfect way to define a “better team,” but we can use the team’s winning percentage as a decent estimate[1].  The difference between two teams’ winning percentages then gives us their mismatch in talent.  See the chart below (dashed lines indicate small sample sizes):

It’s no surprise to see the lines going up and to the right – a bigger mismatch between teams should lead to greater odds of winning.

But the parity in the NHL and MLB is pretty striking.  This chart implies that in those sports, even in the most lopsided games, the stronger team could only be expected to win about 60% of the time[2].
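For anyone who wants to try this kind of calculation on their own data, here is a minimal sketch in Python. It is not the code behind the chart. The function name, the `(win_pct_a, win_pct_b, a_won)` game format, and the 10-point bin width are all assumptions made for illustration. It groups games by the gap in winning percentage and reports how often the team with the higher percentage won, along with the number of games in each group so that thin groups can be flagged.

```python
from collections import defaultdict


def better_team_win_rate(games, bin_width=0.1):
    """Win rate of the higher-win-percentage team, grouped by mismatch size.

    games: iterable of (win_pct_a, win_pct_b, a_won) tuples (hypothetical format).
    Returns {bin_lower_edge: (better_team_win_rate, n_games)}.
    """
    tallies = defaultdict(lambda: [0, 0])  # bin index -> [better-team wins, games]
    for pct_a, pct_b, a_won in games:
        if pct_a == pct_b:
            continue  # no "better" team by this measure, so skip the game
        better_won = a_won if pct_a > pct_b else not a_won
        gap = abs(pct_a - pct_b)
        # Small epsilon guards against float noise such as 0.3 / 0.1 == 2.999...
        bin_index = int(gap / bin_width + 1e-9)
        tallies[bin_index][0] += int(better_won)
        tallies[bin_index][1] += 1
    return {
        round(index * bin_width, 3): (wins / n, n)
        for index, (wins, n) in sorted(tallies.items())
    }


if __name__ == "__main__":
    # Made-up games, purely to show the input and output shapes.
    sample_games = [
        (0.70, 0.45, True),
        (0.40, 0.65, True),
        (0.55, 0.50, True),
    ]
    for edge, (rate, n) in better_team_win_rate(sample_games).items():
        print(f"mismatch >= {edge:.1f}: better team won {rate:.0%} of {n} games")
```

The game count in each group plays the same role as the dashed lines in the chart: a win rate based on a handful of games shouldn't be read too closely.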

This chart can also spur some interesting debates about why the lines look the way they do.  I don’t have the answers, but a few hypotheses:

1) There is certainly some inherent randomness, which varies by sport.  I think this shows pretty conclusively that baseball and hockey have a greater proportion of randomness than football and basketball.

2) The final score values also have some impact.  Consider that a typical hockey score is 2-1, versus a typical basketball score of 90-85.  A “point” in hockey often accounts for 50-100% of the team’s final score, while a “point” in basketball accounts for about 1% of the team’s final score.  With fewer scores in each game, the outcome is “lumpier”, and therefore more random.
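The second hypothesis can be checked with a toy model. This is a sketch under a big simplifying assumption, not a claim about how real games work: suppose every score in a game independently goes to the better team with the same fixed probability, and whoever collects more than half of the scores wins. The function name and the 55% figure are made up for illustration, and treating basketball as 175 one-point scores ignores two- and three-point shots.

```python
from math import comb


def better_team_win_prob(total_scores, p_score):
    """Chance the better team takes more than half of total_scores.

    Assumes each score independently goes to the better team with
    probability p_score (a toy model). total_scores must be odd so
    that ties cannot happen.
    """
    if total_scores % 2 == 0:
        raise ValueError("use an odd number of scores to avoid ties")
    needed = total_scores // 2 + 1
    # Sum the binomial probabilities of getting `needed` or more scores.
    return sum(
        comb(total_scores, k) * p_score**k * (1 - p_score) ** (total_scores - k)
        for k in range(needed, total_scores + 1)
    )


if __name__ == "__main__":
    # 3 total scores is like a 2-1 hockey game; 175 is like a 90-85 basketball game.
    for n in (3, 175):
        print(n, round(better_team_win_prob(n, 0.55), 3))
```

With the same 55% edge on every score, the better team wins about 57% of the 3-score games but roughly 90% of the 175-score games. The edge is identical; only the number of chances for it to show up changes.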

Any other hypotheses?  Does this make you think about your favorite sports any differently?  Let me know in the comments.

[1] The nice thing about using win percentage is that it can be used anytime during the season and across sports, without the kinds of biases we saw with the Gini coefficient.

[2] A disclaimer about the methodology: I acknowledge that there is some circular logic to using win percentage as the measure of team quality, since it is built from the same game outcomes I’m trying to predict.  After all, if there is a lot of randomness in a given sport, then the win percentages in that sport will be a less accurate measure of those teams’ quality… thus making the chart above look like there is even more randomness.  There’s no good way to correct for this, so I’m going to just note it and leave it as is.