
Inspired by the NCAA data extraction functions from the {baseballr}
package, the goal of {ncaavolleyballr} is to extract women's and men's
volleyball information from the NCAA
website. The functions in this package can extract team
records/schedules and player statistics for the 2020-2025 NCAA women's
and men's divisions I, II, and III volleyball teams. Functions can
aggregate statistics for teams, conferences, divisions, or custom groups
of teams.
You can install the stable released version of {ncaavolleyballr} from CRAN with:
install.packages("ncaavolleyballr")
You can install developmental versions from GitHub with:
# install.packages("remotes")
remotes::install_github("JeffreyRStevens/ncaavolleyballr")
library(ncaavolleyballr)
A suite of functions can be used to extract season, match, and play-by-play data for teams and players. See the Getting Started vignette for a more thorough description of the functions.
Note that changes to the NCAA stats website have greatly slowed data scraping and have resulted in unstable connections. If you encounter errors, try running the function again or at a later time.
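The advice to rerun a function after an error can be automated with a small wrapper. The sketch below uses only base R; `with_retry()`, `max_tries`, and `wait` are hypothetical names that are not part of {ncaavolleyballr}, and the default pause length is an arbitrary assumption rather than a recommended value.

```r
# Hypothetical helper (not part of ncaavolleyballr): call f(...) and, if it
# raises an error, wait and try again, up to max_tries attempts.
with_retry <- function(f, ..., max_tries = 3, wait = 5) {
  for (attempt in seq_len(max_tries)) {
    # Capture an error as a condition object instead of stopping
    result <- tryCatch(f(...), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    # Pause before the next attempt (wait is in seconds)
    if (attempt < max_tries) {
      Sys.sleep(wait)
    }
  }
  stop("All ", max_tries, " attempts failed.", call. = FALSE)
}

# Possible usage with a team ID:
# stats <- with_retry(player_season_stats, "585406")
```

If every attempt fails, the wrapper stops with an error, which is a reasonable signal to try again at a later time.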
The NCAA uses a unique team ID for each women's and men's volleyball
team and season. To access a team's season data, you first need
to get that ID with find_team_id(). For instance, to find the ID
for Penn State's 2024 season:
find_team_id("Penn St.", 2024)
#> [1] "585406"
With this team ID, you can now extract overall season performance data
for the team's players with player_season_stats():
find_team_id("Penn St.", 2024) |>
player_season_stats()
The NCAA also uses a unique contest ID for each women's and men's
volleyball match. The easiest way to get that ID is with
find_team_contests(), which returns the contest IDs for all matches in a
particular season (using the team ID provided by find_team_id()). For
instance, to find the contest ID for the 2024 National Championship match
between Louisville and Penn State:
find_team_id("Penn St.", 2024) |>
find_team_contests() |>
tail()
From that, we can see that the contest ID is 6080706. If we pass this
contest ID to the player_match_stats() function, we'll get a list with
two data frames (one for each team in the contest) that contain player
statistics for the match. If we want to get just the Penn State player
data, we can set team = "Penn St.".
player_match_stats(contest = "6080706", team = "Penn St.")
Play-by-play data are also available with match_pbp(). This returns a
data frame with all events and players.
match_pbp(contest = "6080706") |>
head(10)
By default, these functions return information on women's teams, but
they can be set to return men's information by setting sport = "MVB".
You can also aggregate data across conferences, divisions, or custom
groups with conference_stats(), division_stats(), and
group_stats().
Scraping large amounts of data from the NCAA stats site can take a long time and is prone to unstable connections. Accessing the website too frequently or with multiple functions simultaneously can result in your IP address being blocked. To get around this issue, I have scraped all data from 2020-2024 and have posted it on the data page.
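If you still need to scrape several teams or matches yourself, spacing out the requests is one way to avoid accessing the site too frequently. The following is a minimal base R sketch; `fetch_with_pause()` and its arguments are hypothetical and not part of {ncaavolleyballr}, and the 10-second default is an assumption, since no pause length is known to be safe.

```r
# Hypothetical helper (not part of ncaavolleyballr): apply `fetch` to each ID
# one at a time, sleeping between requests so they are never simultaneous.
fetch_with_pause <- function(ids, fetch, pause = 10) {
  results <- vector("list", length(ids))
  names(results) <- ids
  for (i in seq_along(ids)) {
    results[[i]] <- fetch(ids[[i]])
    # Sleep between requests, but not after the last one
    if (i < length(ids)) {
      Sys.sleep(pause)
    }
  }
  results
}

# Possible usage with team IDs obtained from find_team_id():
# season_stats <- fetch_with_pause(team_ids, player_season_stats)
```

The result is a list named by ID, so a partially completed run can be inspected to see which teams were retrieved.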
To cite {ncaavolleyballr}, use:
Stevens JR (2026). Extract Data from NCAA Women’s and Men’s Volleyball Website. R package version 0.5.1, https://github.com/JeffreyRStevens/ncaavolleyballr.
Many thanks to Bill Petti for making the code for NCAA stats extraction
freely available in the {baseballr} package. And thank you to Tyler
Widdison for inspiring me to extract the play-by-play data (check out
his {ncaavolleyballR} package for some similar functionality). Code from
{baseballr} and {rvest} (both licensed under an MIT license) has been
incorporated and modified in this package.
The volleyball background in the logo was designed by Freepik.