knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

This short guide focuses on using espnscrapeR or the nflverse/espnscrapeR-data repo to access QBR data.

Setup

If you have never installed the necessary R packages, go ahead and expand the collapsed section below, otherwise skip ahead to the "Load and Prep" stage.

Package Installation

You'll need the following packages to get started. Note that as of now, espnscrapeR is not on CRAN so you'll need to install it from GitHub as seen below.

install.packages(c("tidyverse", "gt", "remotes"), type = "binary")
remotes::install_github("espnscrapeR")

Load and Prep

Go ahead and load the packages to get started.

library(espnscrapeR)
library(tidyverse)
library(gt)

You can get the data directly from ESPN's API.

# season level data (1x row per QB per season)
qbr_2020 <- get_nfl_qbr(2020, week = NA)

But it'll be easier and recommended to just read in the data directly with either nflreadr or just the raw URL.

nfl_qbr_season <- readr::read_csv("https://raw.githubusercontent.com/nflverse/espnscrapeR-data/master/data/qbr-nfl-season.csv")
nfl_qbr_season <- nflreadr::load_espn_qbr("nfl", seasons = 2006:2020)

This is the QBR values for all QBs at the season level from 2006 to now. The dplyr::glimpse() function can be used to quickly see the type of the columns (IE numeric, character, etc) and the top few values. You can think of it as a beefed up version of the str() function.

nfl_qbr_season %>% 
  glimpse()

Work with the data

Group By

We can group_by() the season and find the median QBR per season.

nfl_qbr_season %>% 
  group_by(season) %>% 
  summarize(qbr_median = median(qbr_total), .groups = "drop")

We can also group_by() the season and find the max n values per season.

top_16_per_yr <- nfl_qbr_season %>% 
  filter(qb_plays >= 100) %>% 
  select(season, team_abb, name_short, qbr_total) %>% 
  # group by season
  group_by(season) %>% 
  # get top 16
  slice_max(order_by = qbr_total, n = 16) %>% 
  # add the grouped median
  mutate(qbr_median = median(qbr_total)) %>% 
  ungroup()

top_16_per_yr

We can then visualize this with a quick ggplot.

top_16_per_yr %>% 
  ggplot(aes(x = season, y = qbr_total, group = season)) +
  geom_boxplot(alpha = 0.5) +
  geom_jitter(width = 0.2, alpha = 0.5) +
  geom_point(aes(y = qbr_median), color = "red", size = 3) +
  theme_minimal()

Alternatively you can also find the median by quarterback.

nfl_qbr_season %>%
  filter(qb_plays >= 100) %>% 
  group_by(name_short) %>% 
  summarize(
    median = median(qbr_total), 
    years = range(season) %>% paste0(collapse = "-"),
    active = if_else(max(season) == 2020, "Active", "Retired"),
    .groups = "drop"
    ) %>% 
  arrange(desc(median))
nfl_qbr_season %>%
  group_by(name_short) %>% 
  mutate(median = median(qbr_total), q5 = quantile(qbr_total, 0.5)) %>% 
  filter(n() > 5) %>% 
  ungroup() %>% #select(median, q5)
  filter(median >= median(qbr_total)) %>% 
# %>% 
#   # mutate(qu = quantile(qbr_total))
#   filter(median <= quantile(qbr_total, 0.15) | median >= quantile(qbr_total, 0.75)) %>% 
  ggplot(aes(x = qbr_total, y = fct_reorder(short_name, qbr_total), color = season)) +
  geom_boxplot() +
  geom_jitter() +
  scale_color_viridis_c(direction = -1)


jthomasmock/espnscrapeR documentation built on Feb. 11, 2024, 4:07 p.m.