The Latest Numbers: 2021 NFL Regular Season Win Luck

Intro

The following is an assessment of the agreement between 2021 regular season play and 2021 season win totals. The basic unit of analysis is the time average lead for each game played in 2021.

Using 2011-2020 data, I built a simple model that estimates the expected win percentage for a given time average lead, accounting for whether the game was played at home or away. As noted in a previous post, the time average lead for a game is a metric that summarizes its overall competitiveness.
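For readers who have not seen the previous post, here is a minimal Python sketch of one way a time average lead could be computed. It assumes the metric is the time-weighted mean of a team's point differential over a 60-minute game; the exact definition lives in the earlier post, and the function name, input format, and example game below are all hypothetical.

```python
def time_average_lead(lead_changes, game_minutes=60.0):
    """Time-weighted mean of a team's lead over a game (assumed definition).

    lead_changes: list of (minute, lead) pairs sorted by minute; each lead
    holds from its minute until the next pair (or the end of the game).
    """
    total = 0.0
    for i, (start, lead) in enumerate(lead_changes):
        # The lead stays constant until the next scoring event or the final whistle.
        end = lead_changes[i + 1][0] if i + 1 < len(lead_changes) else game_minutes
        total += lead * (end - start)
    return total / game_minutes

# Hypothetical game: tied for 15 minutes, up 7 for 15, tied for 15, down 3 for 15.
example = [(0.0, 0), (15.0, 7), (30.0, 0), (45.0, -3)]
print(time_average_lead(example))  # (7*15 - 3*15) / 60 = 1.0
```

Under this reading, a team can win a game with a negative time average lead if it trailed for most of the game and went ahead late.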

Ten Least Likely Wins

Wk	Team	Opponent	Time Avg Lead (Points)	Avg Win Percentage (%)
5	BAL	vs IND	-9.0	6.5
16	CHI	@ SEA	-6.3	12.2
17	TB	@ NYJ	-5.6	14.5
2	TEN	@ SEA	-5.6	14.6
9	BAL	vs MIN	-5.9	15.2
5	PHI	@ CAR	-5.4	15.6
18	SF	@ LA	-5.4	15.6
2	BAL	vs KC	-5.7	16.2
5	NE	@ HOU	-5.1	16.5
1	KC	vs CLE	-5.3	18.0

Graphical Summary

Table

Tm	Wins	Pythag Wins	Time Avg Wins	Time Avg Win Luck	Pythag Win Luck
ARI	11	10.5	10.8	0.2	0.5
ATL	7	4.9	7.1	-0.1	2.1
BAL	8	8.4	8.1	-0.1	-0.4
BUF	11	13.1	12.3	-1.3	-2.1
CAR	5	5.7	7.3	-2.3	-0.7
CHI	6	5.9	6.9	-0.9	0.1
CIN	10	10.5	8.7	1.3	-0.5
CLE	8	7.9	9.5	-1.5	0.1
DAL	12	12.2	10.3	1.7	-0.2
DEN	7	8.9	7.9	-0.9	-1.9
DET	3	5.1	5.3	-2.3	-2.1
GB	13	10.4	10.6	2.4	2.6
HOU	4	4.1	6.3	-2.3	-0.1
IND	9	10.6	11.1	-2.1	-1.6
JAX	3	3.4	4.0	-1.0	-0.4
KC	12	11.2	11.2	0.8	0.8
LA	12	10.6	10.3	1.7	1.4
LAC	9	8.8	9.3	-0.3	0.2
LV	10	6.9	8.2	1.8	3.1
MIA	9	7.6	9.0	0.0	1.4
MIN	8	8.5	9.8	-1.8	-0.5
NE	10	12.4	9.7	0.3	-2.4
NO	9	9.3	8.1	0.9	-0.3
NYG	4	4.1	5.2	-1.2	-0.1
NYJ	4	4.1	4.5	-0.5	-0.1
PHI	9	9.9	7.9	1.1	-0.9
PIT	9	7.0	6.4	2.6	2.0
SEA	7	9.3	9.8	-2.8	-2.3
SF	10	10.1	9.7	0.3	-0.1
TB	13	12.0	10.8	2.2	1.0
TEN	12	10.2	9.9	2.1	1.8
WAS	7	6.0	6.8	0.2	1.0
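
The luck figures can be checked by hand. The sketch below assumes win luck is simply actual wins minus the expected wins from each method (the post does not spell out the formula); the function name is made up for illustration, and the three rows are copied from the table.

```python
# (wins, Pythagorean expected wins, time-average expected wins), copied from the table.
teams = {
    "ARI": (11, 10.5, 10.8),
    "GB": (13, 10.4, 10.6),
    "LV": (10, 6.9, 8.2),
}

def win_luck(wins, expected_wins):
    # Assumed definition: actual wins minus expected wins, rounded to one decimal.
    return round(wins - expected_wins, 1)

for team, (wins, pythag, time_avg) in teams.items():
    # Prints: team, luck relative to Pythagorean wins, luck relative to time-average wins.
    print(team, win_luck(wins, pythag), win_luck(wins, time_avg))
```

For ARI this gives 0.5 against the Pythagorean expectation and 0.2 against the time-average expectation, the two luck values listed in that team's row.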

Method

I put together a data set for the 2011 through 2020 regular seasons. Doing some EDA, the following histogram stood out.

I binned the time average leads into 0.5-point bins and plotted the average winning percentage for each bin. Visually, logistic regression looked appropriate, so I fit an extremely simple one to this data set: for all home teams, win probability as a function of time average lead. The model summary looked acceptable. For the away perspective, I fit a separate model to the same data set for away games.
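
The binning and fitting steps can be sketched in plain Python. This is an illustration only: the actual analysis used R's glm, the data here is synthetic (not the 2011-2020 games), the curve parameters used to generate it are arbitrary, and the function names are my own. The fit uses Newton-Raphson on the log-likelihood, which should land on essentially the same estimates as a standard logistic regression routine.

```python
import math
import random
from collections import defaultdict

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bin_win_rates(leads, wins, width=0.5):
    """Average win rate per bin, keyed by the bin's lower edge."""
    totals = defaultdict(lambda: [0, 0])  # lower edge -> [wins, games]
    for lead, win in zip(leads, wins):
        edge = math.floor(lead / width) * width
        totals[edge][0] += win
        totals[edge][1] += 1
    return {edge: w / n for edge, (w, n) in sorted(totals.items())}

def fit_logistic(leads, wins, iterations=25):
    """Fit win ~ lead by Newton-Raphson; returns (intercept, slope)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(leads, wins):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient with respect to the intercept
            g1 += (y - p) * x    # gradient with respect to the slope
            h00 += w             # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Synthetic stand-in data: 2400 leads with outcomes drawn from a logistic
# curve whose intercept (0.1) and slope (0.3) are chosen arbitrarily.
rng = random.Random(0)
leads = [rng.gauss(0.0, 6.0) for _ in range(2400)]
wins = [1 if rng.random() < sigmoid(0.1 + 0.3 * x) else 0 for x in leads]

rates = bin_win_rates(leads, wins)
intercept, slope = fit_logistic(leads, wins)
print(len(rates), round(intercept, 2), round(slope, 2))
```

Plotting `rates` against the fitted curve `sigmoid(intercept + slope * lead)` gives the kind of "bins + model fit" comparison described here.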

The “bins + model fit” plot for the away model is provided below.

The home model summary is provided below.


Call:
glm(formula = win ~ mean_point_diff, family = "binomial", data = results_df %>% 
    filter(home_away == "home"))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.8364  -0.5750   0.1410   0.5969   2.6841  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)      0.08979    0.05768   1.557     0.12    
mean_point_diff  0.30574    0.01248  24.495   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3286.4  on 2399  degrees of freedom
Residual deviance: 1875.6  on 2398  degrees of freedom
AIC: 1879.6

Number of Fisher Scoring iterations: 6
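
To show how the coefficients turn a time average lead into a win probability, here is a short Python sketch that plugs the two estimates from the summary into the logistic function. The function name is my own; only the two coefficient values come from the output above.

```python
import math

# Coefficients copied from the home model summary.
INTERCEPT = 0.08979
SLOPE = 0.30574

def home_win_probability(time_avg_lead):
    # Logistic regression: P(win) = 1 / (1 + exp(-(intercept + slope * lead)))
    z = INTERCEPT + SLOPE * time_avg_lead
    return 1.0 / (1.0 + math.exp(-z))

# BAL vs IND, week 5: a home win with a time average lead of -9.0 points.
print(round(100 * home_win_probability(-9.0), 1))  # 6.5
```

That matches the 6.5% listed for BAL vs IND in the Ten Least Likely Wins table. Away games would need the away model's coefficients, which are not shown in this post.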