Estimates of NFL Regular Season Record Luck
This post takes a look at two estimates of win luck, or roughly, actual record compared with “expected” record.
First, Pythagorean Win Expectation. I’ll use the Daryl Morey formulation, which built on Bill James’s work for baseball. The basic logic of Pythagorean Win Expectation is as follows: total points scored and allowed for an entire regular season are more reliable indicator of “true” ability than the sum of individual game outcomes (i.e., win/loss/tie). Thus, I think of Pythagorean Win Expectation as an across game estimate of expected regular season record.
Second, I’ll use my time average lead concept. This approach estimates the expected win percentage for each individual game, and estimates expected wins for a season using the sum of these individual game expected win percentage estimates. Since the estimation is done at the individual game level, I’ll call this a within game estimate.
The code below fits a logistic model to the 2015 to 2021 regular seasons.
train <- map_dfr(
2015:2021,
nfl_mvt_season) %>%
filter(home_away == "home")
train_results <- load_pbp(2015:2021) %>%
filter(season_type == "REG") %>%
mutate(home_result = case_when(
result > 0 ~ "Win",
result == 0 ~ "Tie",
result < 0 ~ "Loss"
)) %>%
group_by(game_id) %>%
summarize(
season = season,
week = week,
team = home_team,
opponent = away_team,
result = home_result,
.groups = "drop") %>%
distinct() %>%
mutate(game_id = NULL)
train_df <- left_join(
train,
train_results) %>%
filter(result != "Tie") %>%
filter(home_away == "home") %>%
mutate(
result = ifelse(
result == "Win",
1L, 0L),
home_away = as.factor(home_away)
)
model <- rstanarm::stan_glm(
result ~ -1 + time_avg_lead,
data = train_df,
family = "binomial",
refresh = 0
)
test <- nfl_mvt_season(2022)
beta <- coef(model)
test_df <- test %>%
mutate(exp_wp = 1/(1 + exp(-beta * time_avg_lead)))
df_2022 <- test_df %>%
group_by(team) %>%
summarize(exp_wp = mean(exp_wp, na.rm = T),
.groups = "drop")
Next, I assemble the information to calculate the Pythagorean Win Expectation, Actual Wins, and the tow “Win Luck” estimates.
home_2022 <- load_pbp(2022) %>%
filter(season_type == "REG") %>%
group_by(game_id) %>%
slice_head(n = 1) %>%
ungroup() %>%
mutate(home_win = ifelse(result > 0,
1L, 0L)) %>%
group_by(home_team) %>%
summarize(games_played = n(),
points_scored = sum(home_score),
points_allowed = sum(away_score),
wins = sum(home_win),
.groups = "drop") %>%
rename(team = home_team)
away_2022 <- load_pbp(2022) %>%
filter(season_type == "REG") %>%
group_by(game_id) %>%
slice_head(n = 1) %>%
ungroup() %>%
mutate(away_win = ifelse(result < 0,
1L, 0L)) %>%
group_by(away_team) %>%
summarize(games_played = n(),
points_scored = sum(away_score),
points_allowed = sum(home_score),
wins = sum(away_win),
.groups = "drop") %>%
rename(team = away_team)
team_2022 <- rbind(
home_2022,
away_2022) %>%
group_by(team) %>%
summarize(
games_played = sum(games_played),
points_scored = sum(points_scored),
points_allowed = sum(points_allowed),
wins = sum(wins),
.groups = "drop") %>%
mutate(actual_wp = wins/games_played,
pythag_wp = (points_scored ^ 2.37)/((points_scored ^ 2.37) + (points_allowed) ^ 2.37)) %>%
left_join(df_2022) %>%
mutate(time_avg_win_luck = (actual_wp - exp_wp) * 17,
pythag_win_luck = (actual_wp - pythag_wp) * 17)
Note, the “actual” wins for BUF and CIN are estimates too, in that their 16 game win percentage is extrapolated to 17 games to calculate actual wins.
The summary for the time average win expectation model:
summary(model,
digits = 3)
Model Info:
function: stan_glm
family: binomial [logit]
formula: result ~ -1 + time_avg_lead
algorithm: sampling
sample: 4000 (posterior sample size)
priors: see help('prior_summary')
observations: 1801
predictors: 1
Estimates:
mean sd 10% 50% 90%
time_avg_lead 0.322 0.015 0.304 0.322 0.341
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 0.547 0.008 0.536 0.547 0.557
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
time_avg_lead 0.000 1.001 1520
mean_PPD 0.000 1.000 3862
log-posterior 0.021 1.001 1152
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
p <- team_2022 %>%
ggplot(aes(x = time_avg_win_luck,
y = pythag_win_luck)) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
geom_nfl_logos(aes(team_abbr = team),
width = 0.05) +
geom_label(aes(x = -2,
y = 3,
label = paste0("Correlation: ",
cor(team_2022$time_avg_win_luck,
team_2022$pythag_win_luck) %>%
round(2)))) +
scale_x_continuous(breaks = seq(-6, 6, by = 1),
minor_breaks = NULL) +
scale_y_continuous(breaks = seq(-6, 6, by = 1),
minor_breaks = NULL) +
labs(x = "Time Average Win Luck",
y = "Pythagorean Win Luck",
caption = "Data via nflfastR. Plot via nflplotR.",
title = "NFL 2022 Regular Season Win 'Luck'",
subtitle = "Within Game (Time Average) and Across Game (Pythagorean)") +
theme_light()
ggsave(plot = p,
filename = "nfl_2022_win_luck.png",
height = 5.25,
width = 5,
units = "in",
dpi = "retina")