NCAA Scraping"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The latest release of the baseballr includes a function for acquiring player statistics from the NCAA's website for baseball teams across the three major divisions (I, II, III).

In order to look up teams, you can either load the teams for all divisions from the baseballr-data repository or access them directly from the NCAA website for a given year and division.

Loading from the baseballr-data repository:

library(baseballr)
library(dplyr)
ncaa_teams_df <- load_ncaa_baseball_teams()

From the NCAA website:

try(ncaa_teams(year = most_recent_ncaa_baseball_season(), division = "1"))

The function, ncaa_team_player_stats(), requires the user to pass values for three parameters for the function to work:

team_id: numerical code used by the NCAA for each school year: a four-digit year type: whether to pull data for batters or pitchers

If you want to pull batting statistics for Florida State for the r baseballr::most_recent_ncaa_baseball_season() season, you would use the following:

team_id <- ncaa_teams_df %>% 
  dplyr::filter(.data$team_name == "Florida St.") %>% 
  dplyr::select("team_id") %>% 
  dplyr::distinct() %>% 
  dplyr::pull("team_id")

year <- most_recent_ncaa_baseball_season()

ncaa_team_player_stats(team_id = team_id, year = year, "batting")

The same can be done for pitching, just by changing the type parameter:

ncaa_team_player_stats(team_id = team_id, year = year,  "pitching")

Now, the function is dependent on the user knowing the team_id used by the NCAA website. Given that, I've included a ncaa_school_id_lu function so that users can find the team_id they need.

Just pass a string to the function and it will return possible matches based on the school's name:

ncaa_school_id_lu("Vand")


Try the baseballr package in your browser

Any scripts or data that you put into this service are public.

baseballr documentation built on April 1, 2023, 12:12 a.m.