Tuesday, 16 June 2015

Fourteen wins

It's all over. Final score:

Model Predictions: 12/15 (80%) = 6/8 (Rd. 1) + 3/4 (Rd. 2) + 2/2 (Rd. 3) + 1/1 (Rd. 4)
My Predictions: 14/15 (93%) = 7/8 (Rd. 1) + 4/4 (Rd. 2) + 2/2 (Rd. 3) + 1/1 (Rd. 4)

Both the model and I predicted a majority of series correctly, and both picked the Cup winner from day one. My bracket was in the 100th percentile on nhl.com's bracket challenge (rank 720).

Did I get lucky? Yes. Partly. The model's in-sample error rate from previous seasons was 30%; this year it was only 20%. So the model beat its own average error rate this year. Next year could be different (and this year could have been different if a couple of the close series, e.g., Washington vs. New York Islanders, had turned out differently).
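To put a rough number on "lucky", here is a minimal sketch that treats each of the 15 series as an independent coin flip that the model calls correctly 70% of the time (one minus the 30% in-sample error rate). The independence assumption and the names `prob_at_least`, `N_SERIES`, and `HISTORICAL_ACCURACY` are mine, for illustration only; real series are not independent, since who advances determines the later matchups.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_SERIES = 15               # series in one playoff year
HISTORICAL_ACCURACY = 0.70  # 1 - 30% in-sample error rate (assumed constant)

# Chance of matching or beating this year's 12/15 at the historical rate.
p_model = prob_at_least(12, N_SERIES, HISTORICAL_ACCURACY)
print(f"P(at least 12 of 15 correct) = {p_model:.3f}")
```

Under these simplifying assumptions the probability comes out to roughly 0.30, so a 12/15 year is a good result but not a freakish one.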

On the other hand, the success of both the model's predictions and my predictions this year suggests: (i) that the Stanley Cup Playoffs are at least somewhat predictable; and (ii) that, as useful as statistical models are, there will likely always be room for human expertise so long as there are factors contributing to teams' playoff success that are difficult to measure (I beat the model on 3 of the 4 series on which we disagreed).

Lastly, it's worth acknowledging the competition. SAP had an impressive prediction interface on nhl.com that predicted the outcome of each series. Their record was:

SAP (nhl.com): 10/15 (67%) = 4/8 (Rd. 1) + 3/4 (Rd. 2) + 2/2 (Rd. 3) + 1/1 (Rd. 4)

SAP got every series in the East wrong in the first round (picking Ottawa, Detroit, Pittsburgh, NYI), and picked Montreal over Tampa Bay in round 2. Does this mean their model is worse than mine? Not necessarily. Their 33% error rate is well within the expected error range of my model. Of the two first-round series on which the models disagreed (Pittsburgh-NYR, Washington-NYI), one was very close. The only series that seems like a strange pick for SAP's model is NYR-Pittsburgh.
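As a rough check on "well within the expected error range", the sketch below computes the standard deviation of an observed error rate over 15 series when the true rate is 30%, then measures how far SAP's 5-in-15 sits from that. It again assumes independent series with a constant error rate, and the names `error_rate_std`, `MODEL_ERROR`, and `SAP_ERROR` are illustrative only.

```python
from math import sqrt


def error_rate_std(error_rate: float, n_series: int) -> float:
    """Standard deviation of an observed error rate over n_series independent series."""
    return sqrt(error_rate * (1 - error_rate) / n_series)


N_SERIES = 15
MODEL_ERROR = 0.30   # my model's in-sample error rate
SAP_ERROR = 5 / 15   # SAP missed 5 of 15 series this year

sd = error_rate_std(MODEL_ERROR, N_SERIES)
z = (SAP_ERROR - MODEL_ERROR) / sd
print(f"std of error rate over {N_SERIES} series: {sd:.3f}")
print(f"SAP's distance from 30%, in standard deviations: {z:.2f}")
```

With only 15 series a year, the spread is wide enough that a single season cannot separate a 30% model from a 33% one.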

However, the success of the Sixteen Wins predictions this year is a sign of hope for hacks like me - devoted fans with publicly available data - wanting to try our hands at predicting the playoffs on tight time and financial budgets.