FanPost

In Predicting Wins for 2009-2010, Consider "Regression to the Mean"


With the release of the 2009-2010 NBA schedule yesterday, we saw the first wave of predictions for the upcoming season. In making predictions for the Blazers or any other team, the first thing most people consider, whether explicitly or implicitly, is a team's record the previous season. The previous season serves as a baseline in our minds, and then we try to figure out whether the team will improve or get worse. By and large, this is a sensible way of thinking.

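However, there is another consideration that should factor into our thought process and that we often forget: a phenomenon called regression to the mean. Regression to the mean is a technical term from probability and statistics that refers to the fact that things, whatever they are, tend to return to normal. The term was coined by Francis Galton, the inventor of regression, when he noticed that the offspring of tall parents tended to be shorter than their parents (at least that is the story they always tell in stats class). Regression to the mean in NBA wins would imply that teams that win a lot of games in one season are more likely to win fewer games the following season or, conversely, that teams that win few games in one season are more likely to win more games the following season. In other words, a 60 win team in 2009 is more likely to win 55-59 games than 60-64 games in 2010, while a 25 win team in 2000 is more likely to win 26-30 games than 21-25 games in 2001. Does this happen in the NBA? It sure does. In fact, regression to the mean is quite pronounced, and it holds up when controlling for a team's average age. Details below.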

Regression to the Mean in wins from Season to Season, the NBA from 1956 to 2009

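It is actually fairly easy to assess how important regression to the mean is for wins from one NBA season to the next. Team season records are readily available at basketball-reference.com or databasebasketball.com, and the empirical question is simple: if a team wins more than the average number of games (41), does it tend to win fewer games the following season? There are a variety of ways to answer this question analytically, and all of them point to regression to the mean being very robust. To show that this is what happens, I simply plotted the average change in number of wins against wins in the previous season, for all teams from 1956 to 2009: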

Rtm_wins56-09_medium

This graph shows that teams that won 40 or 41 games in the previous season won, on average, about 40 to 41 games the following season, since the average change in number of wins is about 0. However, regression to the mean does occur. Teams that win 46 games in one season tend to win about 2 fewer games the following season (44). In contrast, teams that win 37 games in one season win, on average, about 3 more games the next season (40). In addition, the farther a team moves away from the mean, the stronger the pull of the mean. 60 win teams win, on average, about 6 fewer games the following season (54), while 25 win teams tend to win about 6 more games the following season (31).
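For anyone who wants to reproduce this kind of graph, here is a minimal sketch of the underlying calculation: group team-seasons by wins in the previous season and average the change in wins within each group. The function name and the sample rows are hypothetical and made up for illustration; they are not real NBA records and this is not necessarily how the graph above was produced.

```python
from collections import defaultdict

def mean_change_by_prior_wins(team_seasons):
    """Average change in wins, grouped by wins in the previous season.

    team_seasons: iterable of (prior_wins, next_wins) pairs, one pair per
    team per pair of consecutive seasons.
    """
    totals = defaultdict(lambda: [0, 0])  # prior_wins -> [sum of changes, count]
    for prior, nxt in team_seasons:
        totals[prior][0] += nxt - prior
        totals[prior][1] += 1
    return {prior: s / n for prior, (s, n) in sorted(totals.items())}

# Made-up rows for illustration only, not real NBA records.
sample = [(60, 55), (60, 53), (41, 41), (41, 42), (25, 30), (25, 32)]
changes = mean_change_by_prior_wins(sample)
for prior_wins, avg_change in changes.items():
    print(prior_wins, avg_change)
```

Plotting the resulting averages against prior wins gives a line that slopes downward whenever regression to the mean is present.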

In plain language, this graph shows that bad teams tend to get better and elite teams tend to get worse. A fairly sensible implication of this pattern is that, as many have suggested, going from 54 to 60 wins is "harder" than going from 40 wins to 46 wins. Why is this true? There are a variety of possible reasons, but the most important one is probably luck. Teams that do well tend to avoid injuries, have favorable schedules, and win close games. Teams with bad luck (injuries, bad chemistry, or bad bounces) tend to see their fortunes brighten the following season simply because average luck is more likely than bad luck. I was fairly confident that I would see evidence of regression to the mean in the data, but the strength, regularity, and linearity of the pattern surprised me. I figured that the graph would be fairly flat around the middle, with teams with 35 to 47 wins not regressing to the mean much, but truly elite and terrible teams regressing to the mean quite strongly.
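The luck explanation can be illustrated with a toy simulation. In the sketch below, every team has a fixed "true" chance of winning each game that does not change between seasons, so any difference between the two seasons is pure luck. The talent spread (0.12), the number of simulated teams, and the win cutoffs are assumptions chosen for illustration, not estimates from NBA data.

```python
import random

def simulate_two_seasons(n_teams=2000, games=82, seed=0):
    """Simulate two seasons for teams whose true strength never changes."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_teams):
        # Hypothetical true win probability, identical in both seasons.
        p = min(max(rng.gauss(0.5, 0.12), 0.1), 0.9)
        season1 = sum(rng.random() < p for _ in range(games))
        season2 = sum(rng.random() < p for _ in range(games))
        results.append((season1, season2))
    return results

def mean_change(results, keep):
    """Average change in wins for teams whose first-season wins pass `keep`."""
    changes = [s2 - s1 for s1, s2 in results if keep(s1)]
    return sum(changes) / len(changes)

results = simulate_two_seasons()
good = mean_change(results, lambda wins: wins >= 55)  # strong first season
bad = mean_change(results, lambda wins: wins <= 27)   # weak first season
print(good, bad)
```

Even though no team actually gets better or worse in this model, the teams with many first-season wins decline on average and the teams with few first-season wins improve, because a very good or very bad record is partly the product of luck that does not repeat.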

Regression to the Mean in wins from Season to Season, the NBA from 1980 to 2009

To check on the robustness of this pattern, consider the graph below, which restricts the analysis to seasons from 1980 to 2009. Though the graphs look almost identical, they are run on different data, which is one indication of how regular this pattern is:

Rtm_wins80-09_medium

Is it all about Age?

One might wonder whether this trend is simply a reflection of something we have talked about before. That is, does the tendency of bad teams to get better and elite teams to get worse just reflect the fact that "bad" teams are really just young teams, while elite teams are full of veterans? The short answer is no. While it is true that older teams tend to get worse and younger teams tend to get better (with the break even point being an average age of 27 years), regression to the mean is still strong controlling for age. In other words, teams with an average age greater than 27 tend to have a worse record the following season, but the higher the number of wins the previous season, the worse their record.

To illustrate this, the graph below shows the average change in number of wins as a function of previous wins, controlling for a team's average age in the previous season. (For those that care, the y-axis is actually the residuals from a regression of change in wins on the age of a team in the previous season.) That the slope of the line in this graph is less steep indicates that age was driving some--but not all--of the pattern in the previous graphs.

Rtm_winscage56-09_medium
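For those curious about the residuals mentioned above, here is a minimal sketch of how they can be computed with a one-variable least squares regression of change in wins on average age. The function name and the data rows are hypothetical and invented for illustration; the actual analysis may have used different software or a different specification.

```python
def ols_residuals(x, y):
    """Fit y = intercept + slope * x by least squares and return the residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return slope, intercept, residuals

# Made-up rows for illustration: average age last season, change in wins.
ages = [24.5, 26.0, 27.0, 28.5, 30.0]
changes = [6, 2, 0, -3, -5]
slope, intercept, resid = ols_residuals(ages, changes)
print(slope, intercept, resid)
```

The residuals are the part of each team's change in wins that age does not explain. Averaging them by wins in the previous season gives an age-adjusted version of the earlier graphs.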

So what does this mean for the Blazers in 2009-2010?

Since the Blazers were both very young and very good in 2009, what should we expect in 2010? The youthfulness of the Blazers suggests that they should improve, but improving from 54 wins is very difficult. In particular, teams that have won 54 games have won an average of 51 games the following season. On the other hand, teams with an average age of 24 to 25 win about 5 more games the following season. Quick and dirty regressions of wins on a set of dummy variables for wins the previous season and age the previous season yield a prediction of 54-56 wins for the Blazers in 2010, depending on some minor technical assumptions. While I do not believe that these are the only factors one should consider in projecting the Blazers' season in 2010, I would not ignore them either. If you think the Blazers will win more than 54-56 games in 2010, it should be because you believe that the addition of Andre Miller and the improvement of Oden and other players will more than make up for the normal regression to the mean that occurs in the NBA.

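Finally, for those not interested in averages, graphs, regressions, and the like, below is a list of the records of all teams in the season following a 54 win season (so LAL won 54 games in '62, but 43 games in '63):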

Team Year Wins Age prior year
PH1 1960 48 26.43747
LAL 1963 43 26.49471
BOS 1968 48 30.05789
NYK 1969 60 26.63787
CHI 1974 47 29.21364
BOS 1976 44 29.32505
WA1 1979 39 28.55113
LAL 1981 57 27.44369
LAL 1984 62 28.10951
PHI 1986 45 28.51308
DEN 1988 44 28.99218
DET 1988 63 27.97651
PHO 1990 55 27.21177
UTA 1991 55 29.1215
CLE 1993 47 29.28346
CHA 1997 51 29.47461
DET 1997 37 28.57562
IND 1999 56 31.22052
ORL 1999 41 29.17436
MIA 1999 52 30.66639
DET 2004 54 27.88333
DET 2005 64 28.37027
PHO 2006 61 28.01781

As you can see, some 54 win teams improved, more got worse. In addition, last year's Blazers, with an average age of 24.5, were far younger than all previous 54 win teams. Thus, there is no perfect historical analogy for the current team.
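The list above is short enough to tally directly. The sketch below copies the wins column from the list and counts how many teams improved on 54 wins, how many declined, and what the average was; the variable names are just for illustration.

```python
# Wins in the season after a 54 win season, copied from the list above.
next_season_wins = [48, 43, 48, 60, 47, 44, 39, 57, 62, 45, 44, 63,
                    55, 55, 47, 51, 37, 56, 41, 52, 54, 64, 61]

average = sum(next_season_wins) / len(next_season_wins)
improved = sum(w > 54 for w in next_season_wins)
declined = sum(w < 54 for w in next_season_wins)
unchanged = sum(w == 54 for w in next_season_wins)

print(average, improved, declined, unchanged)  # 51.0 9 13 1
```

Of the 23 teams, 9 improved, 13 declined, and 1 repeated its 54 wins, for an average of 51 wins the following season.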

Nonetheless, seeing the strength of regression to the mean in the NBA has probably made me a bit more skeptical about the Blazers' chances of winning 60 games in 2010 (my original prediction), and it has had an even bigger impact on the way I will think about the rest of the league in 2010. Anyway, this is far from the final word on making projections for the coming season. It's just an easy-to-document pattern that I thought the Blazersedge community might find interesting. Does this change your outlook on the Blazers in 2009-10? For other teams? Why or why not?

Any alternative explanations, comments, questions, or suggestions for additional analysis?