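```{r}
knitr::opts_chunk$set(echo = TRUE)
library(pkgapi)
library(dplyr)

# Map the package source (one directory up) and keep only exported definitions
pkg <- pkgapi::map_package(path = "../")
exported <- pkg$defs %>%
  dplyr::filter(exported == TRUE)

# Count exported functions whose source file name contains "cfbd",
# minus those defined in the cfbd_pbp_data file
cfbd_funcs <- sum(stringr::str_detect(exported$file, "cfbd")) -
  sum(stringr::str_detect(exported$file, "cfbd_pbp_data"))

# Read the current version number from the DESCRIPTION file on GitHub
pkg_name <- "sportsdataverse/cfbfastR"
url <- paste0("https://raw.githubusercontent.com/", pkg_name, "/main/DESCRIPTION")
x <- readLines(url)
remote_version <- gsub("Version:\\s*", "", x[grep("Version:", x)])
```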
Hey folks,
Welcome to the football analytics community! I'm Saiem Gilani, one of the authors of `cfbfastR`, and I hope to give the community a high-quality resource for accessing college football data for statistical analysis, football research, and more. I am excited to show you some of what you can do with this edition of the package.
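Head to CRAN (https://cran.r-project.org) and select the appropriate link for your operating system (Windows, Mac OS X, or Linux):

- Windows - Select base and download the most recent version
- Linux - Select the appropriate distro and follow the installation instructions

Then head to RStudio.com and follow the download and installation instructions for RStudio.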
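Then install and load `cfbfastR` along with the other packages used in this guide:

```{r}
# Install pacman if it is missing; p_load() then installs any missing
# packages and loads them all
if (!requireNamespace("pacman", quietly = TRUE)) {
  install.packages("pacman")
}
pacman::p_load(tidyverse, cfbfastR, zoo, ggimage, gt)
```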
Generally speaking, this package accesses three college football data sources:

- Functions that use the cfbfastR-data repository contain `_cfb` or `cfb_` in the function name and are considered loading functions for the play-by-play data.
- Functions that use the CFB Data API start with `cfbd_` by convention and should be assumed to be `get` functions.
- Functions that use one of ESPN's APIs start with `espn_` by convention and should also be assumed to be `get` functions. There are only two of these functions so far: `espn_ratings_fpi()` and `espn_metrics_wp()`.
However, for most game data only one data provider is involved: ESPN's data provider.
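Because the conventions above are purely name-based, they can be applied mechanically. The sketch below uses a hypothetical helper, `data_source_of()`, which is not part of cfbfastR. It maps a function name to its data source using only the prefixes described in the list, and the example names are the ones mentioned in this guide.

```r
# Hypothetical helper (not part of cfbfastR): infer a function's data source
# from the package's naming conventions.
data_source_of <- function(fn_names) {
  vapply(fn_names, function(fn) {
    if (startsWith(fn, "cfbd_")) {
      "CFB Data API (get function)"
    } else if (startsWith(fn, "espn_")) {
      "ESPN API (get function)"
    } else if (grepl("_cfb|cfb_", fn)) {
      "cfbfastR-data repository (loading function)"
    } else {
      "other"
    }
  }, character(1))
}

data_source_of(c("load_cfb_pbp", "cfbd_pbp_data", "espn_ratings_fpi", "espn_metrics_wp"))
```

The prefixes are only a convention. For instance, the internal helper `most_recent_cfb_season()` also contains `_cfb` but is not a loading function. With cfbfastR installed, `table(data_source_of(getNamespaceExports("cfbfastR")))` would give a rough tally of exports by source.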
As of `cfbfastR` version `r remote_version`, the package exports `r nrow(exported)` functions. The bulk (~`r cfbd_funcs`) of the functions within the package serve as the unofficial R API client for the College Football Data API.
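The setup chunk reads the package version from the DESCRIPTION file on GitHub with `gsub()`. You may want to check whether your installed copy is at least that version. The minimal base-R sketch below does this with a hypothetical helper, `is_up_to_date()`. It parses the DESCRIPTION text with `read.dcf()` and compares versions with `package_version()`. The DESCRIPTION snippet in the example is made up so that it runs offline.

```r
# Hypothetical helper: is the installed version at least the version listed
# in the text of a DESCRIPTION file?
is_up_to_date <- function(installed_version, description_lines) {
  # DESCRIPTION files use the DCF format, which read.dcf() parses
  con <- textConnection(description_lines)
  on.exit(close(con))
  remote <- read.dcf(con, fields = "Version")[1, "Version"]
  package_version(installed_version) >= package_version(remote)
}

# Offline example with a made-up DESCRIPTION snippet
desc <- c("Package: examplepkg", "Version: 1.9.0")
is_up_to_date("2.0.0", desc)  # TRUE
is_up_to_date("1.8.5", desc)  # FALSE
```

In practice, you would pass `as.character(utils::packageVersion("cfbfastR"))` together with the lines read from GitHub in the setup chunk.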
Since April 1, 2021, the College Football Data API requires key authentication, but the key is free to acquire and use.
Request a key at https://collegefootballdata.com/key, follow the instructions, and wait for your API key to be delivered to the e-mail address you registered with.
To use the key consistently across sessions, save it in your `.Renviron` file, which is easily opened with `usethis::edit_r_environ()`. Running `usethis::edit_r_environ()` opens a new script named `.Renviron`. Paste the following line into that script (without quotation marks):

```
CFBD_API_KEY = YOUR-API-KEY-HERE
```
Save the script and restart your R session by clicking `Session` (in between `Plots` and `Build`) and then `Restart R`. (N.b. the shortcut Ctrl + Shift + F10 also restarts your session.) If the key is set correctly, from then on you should be able to use any of the `cfbd_` functions without any other changes.
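If you would rather not restart, base R's `readRenviron()` re-reads an environment file into the current session. This sketch assumes that the `cfbd_` functions read `CFBD_API_KEY` at call time, which the `Sys.setenv()` approach in this guide also relies on. It demonstrates the mechanism on a temporary file with a placeholder key, so your real `.Renviron` is left untouched and any key already set is restored afterwards.

```r
# Remember any key already set so the demo can restore it afterwards
old_key <- Sys.getenv("CFBD_API_KEY", unset = NA)

# Write a throwaway environment file containing a placeholder key
env_file <- tempfile(fileext = ".Renviron")
writeLines("CFBD_API_KEY=placeholder-key-for-demo", env_file)

# Load the file into the current session without restarting R
readRenviron(env_file)
loaded_key <- Sys.getenv("CFBD_API_KEY")
print(loaded_key)

# Restore the previous state and delete the temporary file
if (is.na(old_key)) Sys.unsetenv("CFBD_API_KEY") else Sys.setenv(CFBD_API_KEY = old_key)
unlink(env_file)

# With your real file you would instead run: readRenviron("~/.Renviron")
```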
For occasional use, you can instead set your API key as the environment variable `CFBD_API_KEY` (with quotation marks) at the beginning of every session, using a command like the following:

```r
Sys.setenv(CFBD_API_KEY = "YOUR-API-KEY-HERE")
```
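Whichever method you use, you can confirm that R can see the key before calling any `cfbd_` function. Below is a minimal base-R check. The helper `cfbd_key_is_set()` is hypothetical and is not a cfbfastR export.

```r
# Hypothetical helper: TRUE when CFBD_API_KEY is set to a non-empty value
cfbd_key_is_set <- function() {
  nzchar(Sys.getenv("CFBD_API_KEY"))
}

if (!cfbd_key_is_set()) {
  message("CFBD_API_KEY is not set; see the .Renviron instructions above.")
}
```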
If you have ever worked with the now-archived `cfbscrapR` package, most of the functions in `cfbfastR` should feel familiar, with some slight changes.
Here is roughly how long each approach takes to get play-by-play data:

- `cfbfastR::cfbd_pbp_data()` (1 season, ~6-7 minutes `r emo::ji("confused_face")`)
- `cfbscrapR::cfb_pbp_data()` (1 season, ~8-10 minutes `r emo::ji("old_man")`)
- `cfbfastR::load_cfb_pbp()` (7+ seasons, ~1-1.5 minutes `r emo::ji("flame")`)

We are going to load data for the 2014-`r cfbfastR:::most_recent_cfb_season()` seasons, which should take roughly 45-90 seconds.
```{r}
tictoc::tic()
seasons <- 2014:cfbfastR:::most_recent_cfb_season()
# Show a progress bar while the seasons are downloaded
progressr::with_progress({
  pbp <- cfbfastR::load_cfb_pbp(seasons)
})
tictoc::toc()
```
In the selected seasons, there are `r length(unique(pbp$game_id))` games for which the data repository has play-by-play data. Altogether, that amounts to over a million rows of play-by-play data with `r ncol(pbp)` columns. The most relevant play columns are kept to the left of the data frame for clarity, so let's take a look at the first 40 or so.
```{r}
glimpse(pbp[1:40])
```
So there are three basic ids: one for the game (`game_id`), one for the drive (`drive_id`), and one for the play (`id_play` or `play_id`, depending on which data set you are looking at). These are useful for all kinds of grouping, joining, and sorting tasks. The columns `pos_team` and `def_pos_team` are essentially your offense and defense for the play/drive. The main exception is kickoffs, where the team receiving the kickoff is the `pos_team`. From there you have the typical description, play type, and yardage columns. Beyond those, you will see the reason this package came to be: expected points and win probability metrics for the in-game valuation of plays.
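The id columns make common wrangling tasks straightforward. The toy data frame below is made up. It only borrows the column names `game_id`, `drive_id`, `id_play`, `pos_team` and `def_pos_team` described above. The sketch groups plays by drive, joins the drive-level summary back onto the plays, and sorts them using base R. The same steps map onto `dplyr::group_by()`, `dplyr::left_join()` and `dplyr::arrange()`.

```r
# Made-up plays for two games (NOT real cfbfastR data); only the column
# names mirror the play-by-play data described above
plays <- data.frame(
  game_id      = c(1, 1, 1, 1, 1, 2, 2),
  drive_id     = c(11, 11, 12, 12, 12, 21, 21),
  id_play      = c(102, 101, 103, 105, 104, 201, 202),
  pos_team     = c("A", "A", "B", "B", "B", "C", "C"),
  def_pos_team = c("B", "B", "A", "A", "A", "D", "D"),
  stringsAsFactors = FALSE
)

# Group: count the plays on each drive of each game
plays_per_drive <- aggregate(id_play ~ game_id + drive_id, data = plays, FUN = length)
names(plays_per_drive)[names(plays_per_drive) == "id_play"] <- "n_plays"

# Join: attach the drive-level count back onto every play
plays <- merge(plays, plays_per_drive, by = c("game_id", "drive_id"))

# Sort: put plays in id order within each game
plays <- plays[order(plays$game_id, plays$id_play), ]

# Same idea as the game count quoted above
length(unique(plays$game_id))  # 2
```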