The Latest Numbers: TAILS: Time Average Inpredictable Line Smidgen

Load Libraries

Show code

library(tidyverse)
library(nflfastR)
library(nfltools)
library(nflplotR)

Get Time Average Lead Data

Show code

seasons <-
  2018:2023

raw_df <-
  purrr::map_df(
    seasons,
    nfl_mvt_season
  )

Intro

Previously, I made some code that calculates the (time) average lead for an NFL game using nflfastR play-by-play data. Based on this data, I developed a simple retrospective expected win percentage model. Here, I play with another use of time average lead data: in-season handicap adjustments.

It is a common practice for traditional handicappers to adjust team power ratings weekly based on performance and health.

Adjusted Time Average Lead Metric

The spread provides an estimate of the median final score differential. To provide a context adjustment (e.g., opponent, home field advantage, weather), assume linear growth between (1) a 0 point differential at opening kickoff and (2) final score differential implied by the spread.

The time average lead implied by this linear model is 1/2 of the spread.

Toy Example

Assume Team A is favored by 6 points (-6 point spread). Then this simple adjustment model expects Team A to lead, on average, by 3 points (1/2 of the 6 point spread). The actual calculated time average lead for the game reveals Team A led, on average, by 2 points. Although Team A held a lead, on average, for the course of their game, they slightly underperformed the Vegas expectation (i.e., 2 point actual average lead - 3 point expected average lead = -1). This difference between actual and expected time average lead could provide some information for a power rating adjustment.

Calculate Adjustment

Show code

df <-
  raw_df %>%
  # This is a convenient way to ensure each
  # game shows up in data once.
  filter(home_away == "home") %>%
  # Removes bye weeks entries
  filter(is.na(home_away) == F) %>%
  # The spread provides an estimate of the median
  # final score differential (and incorporates home
  # field advantage). To provide an adjustment, assume linear
  # growth from (1) a 0 point differential  at opening kickoff
  # to (2) final score differential implied by spread. The
  # time average lead implied by this linear model is
  # 1/2 of the spread.
  mutate(
    adj_time_avg_lead = time_avg_lead + (0.5 * team_spread)
  )

Distribution of Adjusted Time Average Leads

To understand the distribution of adjusted time average leads, a density plot is created and some summary statistics are calculated.

Ideally, our adjustment would yield a distribution centered at 0. It does!

In addition, the standard deviation is about 8 points (one possession). This is an easy value to remember and interpret.

Show code

p_adj_distribution <-
  df %>%
  ggplot(aes(x = adj_time_avg_lead)) +
  geom_density() +
  geom_vline(
    xintercept = 
      c(-16, -8, 0, 8, 16),
    linetype = "dashed"
  ) +
  labs(
    x = "Adjusted Time Average Lead",
    y = "Density",
    title = "Vegas Spread Adjusted Time Average Lead",
    subtitle = paste0(
      "Mean = ",
      mean(df$adj_time_avg_lead) %>%
        round(., digits = 1),
      ". Median = ",
      median(df$adj_time_avg_lead) %>%
        round(., digits = 1),
      ". Std Deviation = ",
      sd(df$adj_time_avg_lead) %>%
        round(., digits = 1)
    ),
    caption = paste0(
      "2018 - 2023 Regular Season Games (n = ",
      nrow(df),
      ")"
    )
  ) +
  scale_x_continuous(
    breaks = seq(-32, 32, by = 4),
    minor_breaks = NULL
  ) +
  scale_y_continuous(
    minor_breaks = NULL
  ) +
  theme_light()

ggsave(
  "adjusted_time_avg_lead.png",
  plot = p_adj_distribution,
  width = 6,
  height = 4,
  units = "in",
  dpi = "retina"
)

TAILS: Time Averge inpredictable Line Smidgen

Here, I put together a toy adjustment.

A common discussion point early in a season is the question: How should you best incorporate early results with your preseason priors? A single game isn’t much use. As a first guess, I’ll consider the most recent four or five games to be an appropriate number to develop an average that is recent enough to still be relevant.

Adjustments should be modest (e.g., 3 points should be extremely rare). The distribution of an average of five games should be about half as wide as the underlying distribution. That would put a result that is two standard deviations above the mean at roughly +8 points, which is way too big for power rating adjustment. Because this is just a toy project, I’m going to choose to scale the results by 0.2 to pull this hypothetical two standard deviations above the mean result to under 2 points. This will yield a result to adjust inpredictable’s market-based ratings (that are derived from game lines for the entire season) by an algorithmically-defined smidgen.

A key element any modern sports metrics is an asinine acronym. My daughter loves Miles “Tails” Prower of the Sonic the Hedgehog series. Therefore, I am obliged to shoehorn my idea into the TAILS acronym: Time Average inpredictable Line Smidgen.

Current 2023 TAILS Adjustment

Show code

tails <-
  raw_df %>%
  filter(season == 2023) %>%
  filter(!(is.na(opponent))) %>%
  mutate(
    adj_time_avg_lead = time_avg_lead + (0.5 * team_spread)
  ) %>%
  group_by(team) %>%
  slice_tail(n = 5) %>%
  summarize(
    tails = mean(adj_time_avg_lead),
    .groups = "drop"
  ) %>%
  mutate(
    tails = 0.2 * tails
  )

inpred <-
  # This is a function I wrote as my first bit of
  # webscraping code. It retrieves info from the
  # inpredictable power ratings at the time it is run.
  get_current_inpredictable()

inpred <-
  left_join(
    inpred,
    tails
  ) %>%
  mutate(
    adj_gpf = gpf + tails
  )

Plot

Show code

p_adj_gpf <-
  inpred %>%
  ggplot(
    aes(
      x = gpf,
      y = reorder(factor(team),
                  adj_gpf)
    )
  ) +
  geom_vline(
    xintercept = 0,
    linetype = "dashed"
  ) +
  geom_segment(
    aes(
      x = gpf,
      xend = adj_gpf,
      yend = team
    ),
    arrow = arrow(
      type = "closed",
      length = unit(
        0.1, "cm"
      )
    )
  ) +
  geom_point() +
  geom_nfl_logos(
    aes(
      team_abbr = team,
      x = adj_gpf),
    width = 0.04,
    alpha = 0.4
  ) +
  scale_x_continuous(
    breaks = seq(-18, 18, by = 3),
    minor_breaks = NULL
  ) +
  scale_y_discrete(
    labels = NULL,
    breaks = NULL
  ) +
  labs(
    x = "TAILS Adjusted Generic Points Favored",
    y = NULL,
    title = "Post Week 8 NFL Power Rankings",
    subtitle = "As of 10/31/2023.",
    caption = "Baseline: inpredictable.com Betting Market Ratings\nMy silly adjustment calculated using nflfastR data."
  ) +
  theme_light()

ggsave(
  "tails_adj_gpf.png",
  plot = p_adj_gpf,
  width = 6,
  height = 4.5,
  units = "in",
  dpi = "retina"
)

Table

Show code

inpred %>%
  select(
    team, gpf, tails, adj_gpf
  ) %>%
  arrange(
    desc(adj_gpf)
  ) %>%
  mutate(
    tails = round(tails, 2),
    adj_gpf = round(adj_gpf, 2),
    rank = row_number()
  ) %>%
  knitr::kable()

team	gpf	tails	adj_gpf	rank
SF	6.5	-0.11	6.39	1
BAL	4.7	1.57	6.27	2
KC	5.8	-0.01	5.79	3
PHI	5.4	-0.32	5.08	4
CIN	5.0	-0.04	4.96	5
BUF	4.9	-0.58	4.32	6
DAL	4.1	0.06	4.16	7
JAX	2.3	1.63	3.93	8
MIA	4.1	-0.46	3.64	9
DET	2.2	0.31	2.51	10
LAC	1.7	0.73	2.43	11
SEA	0.7	0.33	1.03	12
NO	0.1	-0.05	0.05	13
ATL	0.6	-0.80	-0.20	14
HOU	-1.6	1.38	-0.22	15
MIN	-1.2	0.58	-0.62	16
CLE	-0.5	-0.13	-0.63	17
PIT	-1.3	-0.49	-1.79	18
DEN	-1.8	0.01	-1.79	19
TEN	-2.3	0.20	-2.10	20
TB	-2.2	-0.37	-2.57	21
LA	-2.4	-0.24	-2.64	22
NYJ	-2.8	-0.24	-3.04	23
GB	-1.7	-1.43	-3.13	24
WAS	-3.1	-0.15	-3.25	25
NE	-2.2	-1.27	-3.47	26
IND	-3.2	-0.71	-3.91	27
LV	-3.8	-0.65	-4.45	28
CHI	-5.6	0.90	-4.70	29
NYG	-5.4	0.31	-5.09	30
ARI	-6.9	-0.16	-7.06	31
CAR	-6.9	-0.17	-7.07	32