Wednesday, 16 April 2014

Comparing Models with Binary & Continuous Response Variables

In case people are skeptical of the high probabilities the model is predicting for some series, two things:

1) These predicted probabilities are point estimates produced by the model, and thus have some error. One way to estimate this error is to bootstrap, i.e. re-run the model many times on versions of the dataset resampled with replacement and compare the different probability estimates you get. I haven't done this yet because it's a little complicated in the stats software I use, but it can definitely be done, so I'll do it if people are interested and I can find the time in the next few days.

2) For comparison, here are the original predictions from my last post side by side with predictions from a model with identical explanatory variables but a continuous response variable: series winning percentage (W%), on a 0-1 scale where 0 means being swept and 1 means sweeping. The 95% confidence intervals on the predicted winning percentages are included as well (these might also be helpful in picking the series length):

Predicted Winner in Bold (Predicted p[W] for winner - Logit Model) (Predicted W% for winner (CI) - Continuous Model):
Atlantic Division:
Boston vs. Detroit (94.7%) (0.69 (0.36,1))
Tampa Bay vs. Montreal (75.2%) (0.58 (0.25,0.91))
Tampa Bay vs. Detroit (90.6%) (0.65 (0.32,0.98))

Metropolitan Division:
Pittsburgh vs. Columbus (97.4%) (0.74 (0.41,1))
New York vs. Philadelphia (84.4%) (0.62 (0.29,0.95))
Pittsburgh vs. NYR (72.4%) (0.57 (0.24,0.90))

Central Division:
Colorado vs. Minnesota (88.1%) (0.63 (0.30,0.97))
St. Louis vs. Chicago (96.9%) (0.73 (0.40,1))
Colorado vs. Chicago (71.7%) (0.55 (0.22,0.88))

Pacific Division:
Anaheim vs. Dallas (76.7%) (0.59 (0.26,0.92))
San Jose vs. LA (82.4%) (0.61 (0.28,0.94))
LA vs. Dallas (60.9%) (0.54 (0.21,0.87))

Eastern Conference Final:
Tampa Bay vs. NYR (99.8%) (0.93 (0.60,1))

Western Conference Final:
Colorado vs. Dallas (97%) (0.72 (0.39,1))

Stanley Cup Final:
Colorado vs. Tampa Bay (68.6%) (0.54 (0.21,0.87))

As you can see, with the continuous model, the picks are the same, but the only series where the 95% confidence interval on the winner's predicted W% lies entirely above 0.5 (i.e. a pick with >95% confidence) is Tampa Bay over NYR. The observed vs. predicted plot for the continuous model is shown below. It is very similar to the plot shown for the logit model in the previous post.
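For anyone who wants to check that claim against the numbers, here is a small sketch in Python. The values are copied from the list above; reading ">95% confidence" as "the lower bound of the 95% interval is above 0.5" is my interpretation, and the variable names are just for illustration.

```python
# (series, predicted W% for winner, CI lower, CI upper), copied from the list above
predictions = [
    ("Boston vs. Detroit", 0.69, 0.36, 1.00),
    ("Tampa Bay vs. Montreal", 0.58, 0.25, 0.91),
    ("Tampa Bay vs. Detroit", 0.65, 0.32, 0.98),
    ("Pittsburgh vs. Columbus", 0.74, 0.41, 1.00),
    ("New York vs. Philadelphia", 0.62, 0.29, 0.95),
    ("Pittsburgh vs. NYR", 0.57, 0.24, 0.90),
    ("Colorado vs. Minnesota", 0.63, 0.30, 0.97),
    ("St. Louis vs. Chicago", 0.73, 0.40, 1.00),
    ("Colorado vs. Chicago", 0.55, 0.22, 0.88),
    ("Anaheim vs. Dallas", 0.59, 0.26, 0.92),
    ("San Jose vs. LA", 0.61, 0.28, 0.94),
    ("LA vs. Dallas", 0.54, 0.21, 0.87),
    ("Tampa Bay vs. NYR", 0.93, 0.60, 1.00),
    ("Colorado vs. Dallas", 0.72, 0.39, 1.00),
    ("Colorado vs. Tampa Bay", 0.54, 0.21, 0.87),
]

# Assumed reading: a pick is "confident" when the whole interval sits above
# 0.5, i.e. the lower bound of the 95% CI exceeds an even split of games.
confident = [series for series, w_pct, lo, hi in predictions if lo > 0.5]

print(confident)  # ['Tampa Bay vs. NYR']
```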

Figure 2014.2. Observed vs. predicted winning percentages from continuous model. 
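Coming back to point 1: the bootstrap idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the model from these posts. The data are synthetic, there is a single made-up explanatory variable, and the logit is fit with a hand-written Newton-Raphson routine (with a tiny ridge penalty added purely for numerical stability). The names `fit_logit`, `bootstrap_interval` and `x_new` are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme resamples.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(xs, ys, iters=15, lam=1e-3):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson.
    lam is a small ridge penalty (an assumption, for stability only)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1 = -lam * b0, -lam * b1      # gradient of penalized log-likelihood
        h00, h01, h11 = lam, 0.0, lam      # negative Hessian (2x2, symmetric)
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det  # Newton step: b += H^-1 g
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def bootstrap_interval(xs, ys, x_new, n_boot=300, seed=1):
    """Percentile 95% interval for the predicted probability at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        # Resample rows with replacement, refit, and predict again.
        idx = [rng.randrange(n) for _ in range(n)]
        b0, b1 = fit_logit([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(sigmoid(b0 + b1 * x_new))
    preds.sort()
    return preds[int(0.025 * n_boot)], preds[int(0.975 * n_boot) - 1]

# Synthetic data (NOT the playoff dataset): 150 "series" with one predictor.
data_rng = random.Random(0)
xs = [data_rng.uniform(-2.0, 2.0) for _ in range(150)]
ys = [1 if data_rng.random() < sigmoid(0.3 + 1.2 * x) else 0 for x in xs]

x_new = 1.0
b0, b1 = fit_logit(xs, ys)
point = sigmoid(b0 + b1 * x_new)           # point estimate from the full data
lo, hi = bootstrap_interval(xs, ys, x_new)
print(f"point estimate {point:.3f}, bootstrap 95% interval ({lo:.3f}, {hi:.3f})")
```

The percentile interval is the simplest bootstrap interval to explain. With a real model, the same loop would wrap whatever fitting routine the stats software provides.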
 
