Wednesday, 16 April 2014

Model v 1.0 and 2014 Playoff Picks!

Hello Blogosphere!

For those who don't know me, my name is Matt Burgess. By day, I'm an ecologist/applied economist finishing up my PhD at the University of Minnesota this semester, and heading to UC Santa Barbara in the summer to start a post-doc. By night, I'm a die-hard hockey fan. I grew up in Brossard, just outside of Montreal, Canada, so more specifically, I'm a die-hard Habs fan. Inspired by Nate Silver's popular FiveThirtyEight blog (I'm also a political junkie), I decided last season to try my hand at statistically modeling the NHL Playoffs, the idea being that 7-game series should be reasonably predictable (compared to, say, the NFL playoffs or March Madness).

So this past year, I downloaded team and player stats from every NHL regular season and playoffs going back to 1967-1968 (the beginning of the modern era), and have been playing around with models. Here are predictions from a logit model (the response variable is the binary series outcome, W/L) that I selected based on AIC. I won't give away exactly which effects are included yet, but I will say that both the overall team stats and the season series stats are very important.
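For readers who want to see the mechanics, here is a minimal sketch of fitting a logit model and comparing candidates by AIC. It is not the actual model: the single predictor (`diffs`, a stand-in for some team-strength difference), the eight toy data points, and the plain gradient-ascent fitter are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(beta, row):
    # Predicted p[W] for one series given its feature row.
    return sigmoid(sum(b * x for b, x in zip(beta, row)))

def fit_logit(X, y, steps=5000, lr=0.1):
    # Maximize the log-likelihood by plain gradient ascent.
    beta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, won in zip(X, y):
            err = won - predict(beta, row)
            for j, x in enumerate(row):
                grad[j] += err * x
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta

def log_likelihood(beta, X, y):
    total = 0.0
    for row, won in zip(X, y):
        p = predict(beta, row)
        total += math.log(p) if won else math.log(1.0 - p)
    return total

def aic(beta, X, y):
    # AIC = 2k - 2 ln(L), where k is the number of fitted parameters.
    return 2 * len(beta) - 2 * log_likelihood(beta, X, y)

# Toy data (hypothetical): a strength difference and whether that team won.
diffs = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
won = [0, 0, 1, 0, 1, 0, 1, 1]

X_null = [[1.0] for _ in diffs]        # intercept only
X_full = [[1.0, d] for d in diffs]     # intercept + one predictor

beta_null = fit_logit(X_null, won)
beta_full = fit_logit(X_full, won)
aic_null = aic(beta_null, X_null, won)
aic_full = aic(beta_full, X_full, won)

print("AIC, intercept only:", round(aic_null, 2))
print("AIC, with predictor:", round(aic_full, 2))
```

The candidate with the lower AIC is preferred; the 2k term penalizes each extra parameter, so a predictor only earns its place if it improves the fit enough to pay for itself.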

The modeling approach I am currently using is very much in its beta version, as there are many things I would like to add but haven't yet. For example, the models do not currently account explicitly for multi-season trends in a particular matchup, injuries and other within-season roster changes, or newer measures such as Corsi For percentage, but I hope to add some of these by next year.

Sixteen Wins Stanley Cup Playoff Predictions 2014:
April 16, 2014 (I will post revised predictions after each round)

Descriptive Stats:
Time period considered: Modern era (1968-present)
n = 594 series (599 total, minus 5 with no season series between the two teams)
DF: 593 (Total) = 5 (Fitted) + 588 (Lack of Fit)
Model Failure Rate (all years)* = 11.2% (i.e., the fraction of series in which the team the model gave p[W] > 50% went on to lose)
Series the model would have gotten wrong in the last 3 seasons**:
2013: Anaheim vs. Detroit (Pred. p[W] = 82.3% for ANA) (DET won in 7)
2012: New Jersey vs. NYR (Pred. p[W] = 55.3% for NYR) (NJD won in 6)
2011: Chicago vs. Vancouver (Pred. p[W] = 98.1% for CHI) (VAN won in 7) 
          NYR vs. Washington (Pred. p[W] = 71.8% for NYR) (WSH won in 5) 
          Detroit vs. San Jose (Pred. p[W] = 63.4% for DET) (SJS won in 7) 

*based on in-sample predictions.
**based on out-of-sample predictions (model re-fit excluding years 2011-2013)
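
The failure rate above is just a count of missed picks divided by the number of series. A minimal sketch of that calculation follows; the function name, the input layout, and the example numbers are assumptions made up for illustration, not the real dataset.

```python
def failure_rate(predictions):
    """Fraction of series in which the model's favorite lost.

    predictions: iterable of (p_win_team_a, team_a_won) pairs, where
    p_win_team_a is the predicted p[W] for team A and team_a_won is a bool.
    Series with p[W] exactly 0.5 have no favorite and are skipped.
    """
    misses = 0
    total = 0
    for p_win_team_a, team_a_won in predictions:
        if p_win_team_a == 0.5:
            continue
        favorite_is_a = p_win_team_a > 0.5
        total += 1
        if favorite_is_a != team_a_won:
            misses += 1
    return misses / total if total else 0.0

# Hypothetical example: four series, two of which the favorite lost.
example = [(0.80, False), (0.60, True), (0.30, False), (0.45, True)]
print(failure_rate(example))
```

For the out-of-sample version, the same count would be applied to predictions from a model re-fit without the seasons being scored.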

Figure 2014.1. Comparison of model predictions and observed series winning percentages (n = 594). W% = 0.5 (dashed line) separates teams that won their series (W% > 0.5) from those that lost (W% < 0.5).

Model Predictions for 2014:
Predicted Winner in Bold (Predicted p[W] for winner):
Atlantic Division:
Boston vs. Detroit (94.7%)
Tampa Bay vs. Montreal (75.2%)
Tampa Bay vs. Detroit (90.6%)

Metropolitan Division:
Pittsburgh vs. Columbus (97.4%)
NYR vs. Philadelphia (84.4%)
Pittsburgh vs. NYR (72.4%)

Central Division:
Colorado vs. Minnesota (88.1%)
St. Louis vs. Chicago (96.9%)
Colorado vs. Chicago (71.7%)

Pacific Division:
Anaheim vs. Dallas (76.7%)
San Jose vs. LA (82.4%)
LA vs. Dallas (60.9%)

Eastern Conference Final:
Tampa Bay vs. NYR (99.8%)

Western Conference Final:
Colorado vs. Dallas (97%)

Stanley Cup Final:
Colorado vs. Tampa Bay (68.6%)

Coming Soon: Predicted Odds of Each Team Winning the Cup (integrated over all possible brackets)
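
One way such Cup odds could be computed is to walk the bracket and, at each round, weight every possible matchup by the chance that both teams got there. The sketch below does this for a four-team toy bracket; the `ratings` values and the ratio-based `p_win` function are hypothetical stand-ins for the logit model's series probabilities.

```python
def advance_probs(bracket, p_win):
    """Probability that each team emerges from a bracket.

    bracket: a team name (str) or a (left, right) pair of sub-brackets.
    p_win(a, b): probability that team a beats team b in a series.
    """
    if isinstance(bracket, str):
        return {bracket: 1.0}
    left = advance_probs(bracket[0], p_win)
    right = advance_probs(bracket[1], p_win)
    out = {}
    # Sum over every possible matchup between the two sub-brackets.
    for a, prob_a in left.items():
        for b, prob_b in right.items():
            meet = prob_a * prob_b          # chance this matchup happens
            a_wins = p_win(a, b)
            out[a] = out.get(a, 0.0) + meet * a_wins
            out[b] = out.get(b, 0.0) + meet * (1.0 - a_wins)
    return out

# Hypothetical team strengths, used only to generate series probabilities.
ratings = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0}

def p_win(a, b):
    return ratings[a] / (ratings[a] + ratings[b])

toy_bracket = (("A", "B"), ("C", "D"))
cup_odds = advance_probs(toy_bracket, p_win)
for team, odds in sorted(cup_odds.items(), key=lambda kv: -kv[1]):
    print(team, round(odds, 3))
```

The same recursion extends to a 16-team bracket by nesting the pairs four levels deep, and the resulting odds always sum to 1.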
