knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-",
  warning = FALSE,
  message = FALSE,
  eval = FALSE
)

npncaa

The npncaa package scrapes NCAA info from https://www.sports-reference.com/.

Installation

You can install npncaa from github with:

# install.packages("devtools")
devtools::install_github("nickpaul7/npcnaa")

Tournament Info

To collect data for a given year use the code below:

library(npncaa)

year <- 2019

df_bracket <- get_bracket_data(2019)

The get_bracket_data() function will return a data frame.

library(tidyverse)

glimpse(df)

Finally, you can get all data from 1985 to the present using the following code

library(tidyverse)
library(npncaa)

years <- c(1993:2019)

df_all_years <- purrr::map_df(years, get_bracket_data)

Team Stats by Year

# year <- 2019
# url <- create_team_stats_url(year)
# url_opponent_advanced <- create_team_stats_url(year, opponent = TRUE, advanced = TRUE)
# url

Get team stats for a particular year.

Not all years have all options, but the function will alert you when this happens.

# year <- 1955
# url_opponent_advanced <- create_team_stats_url(year, opponent = FALSE, advanced = FALSE)
# url_opponent_advanced
df_team_stats <- ncaa_team_stats(2019)
df_advanced <- ncaa_team_stats(2019, advanced = TRUE)
df_opponent <- ncaa_team_stats(2019, opponent = TRUE)
df_advanced_opponent <- ncaa_team_stats(2019, advanced = TRUE, opponent = TRUE)

When the data does not exist, the ncaa_team_stats() function will return an empty data frame. This facilitates pulling data over a period of years without knowing where the end is.

df_advanced_opponent <- ncaa_team_stats(1955, advanced = TRUE, opponent = TRUE)
years <- 1993:2019
df_team_stats_1993_2019 <- purrr::map_df(years, ncaa_team_stats)

Prepare for ML

df_ml <- df_all_years %>% 
    add_team_stats(df_team_stats_1993_2019) %>% 
    select_features() %>% 
    select(diff)


nickpaul7/npcnaa documentation built on March 6, 2020, 9:14 a.m.