This package is designed for all NBA enthusiasts! The rsketball
package scrapes online tabular data from the ESPN NBA website
into a csv file. It also includes functions for creating graphs and
statistical summaries (such as boxplots, player rankings by stats,
and a summary statistics table).
An example of the ESPN NBA 2018/19 regular season player stats can be found at this ESPN NBA URL.
This project is proudly created by:
The package provides the following functions:
nba_scraper
nba_boxplot
nba_ranking
nba_team_stats
For a more detailed understanding of the functions and their use cases, please refer to the package vignette.
rsketball is still in development. We estimate that by the end of
March 2020, the released version of rsketball can be installed from CRAN.
Package installation in R:
install.packages("rsketball")
The development version can be installed from GitHub with:
install.packages("devtools")
devtools::install_github("UBC-MDS/rsketball")
nba_scraper()
The rsketball::nba_scraper function is based on Selenium (specifically
RSelenium), which enables automated web browsing through “drivers”. To
use it, please ensure that Docker is installed.
For installation instructions, please follow the guide to Docker installation for your OS. Docker is used to pull the relevant Chromedriver image which, when run as a container, serves as the “driver” for Selenium.
The following steps are required only for the nba_scraper function. If
you already have the scraped data file and wish to use the other
functions (nba_boxplot, nba_ranking, nba_team_stats), there is no need to
proceed with these steps.
Step 1 (Command line/Terminal): Preparation of Docker container
Pull the Docker image with the following command in Terminal. We use Chrome because it appears to be compatible with Windows, whereas Firefox is not.
docker pull selenium/standalone-chrome
Critical step about setting ports and memory allocation:
We need to map the Docker container's default port 4444 to port 4445 on
the host computer. Keep this host port number as the input for the
nba_scraper function. We also allocate 2 GB of shared memory (--shm-size)
to the container so that it can scrape effectively.
Run the following code in Terminal:
docker run -d -p 4445:4444 --shm-size 2g selenium/standalone-chrome
Verify that the docker container is in operation by running the following code in Terminal:
docker ps
Step 2 (R/RStudio): Scraping with nba_scraper
Now that the container is running with the allocated memory and assigned port, we can proceed with a test scrape.
library(rsketball)
# Scrape the 2017/18 postseason while saving to a local csv file.
nba_2017_playoffs <- nba_scraper(season_year = 2017,
                                 season_type = "postseason",
                                 port = 4445L, # Port number as per Docker container setup
                                 csv_path = "nba_2017_playoffs.csv")
If everything was executed as intended, you should obtain a csv file
called “nba_2017_playoffs.csv” containing the scraped data, and a
tibble in your R environment named “nba_2017_playoffs”. With the
tibble, you can use the other rsketball functions for your analysis.
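If you already have a csv file from an earlier scrape, it can be loaded again without Docker. The following is a minimal base R sketch, not part of rsketball: the file is a temporary stand-in written by the example itself, its values are made up, and its column names are assumed to mirror the ones used in the examples later in this README. Setting check.names = FALSE keeps column names such as 3PA and FT% unchanged, since read.csv would otherwise rewrite them into syntactic names.

```r
# Stand-in for a file produced by nba_scraper(); the values are illustrative only.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("NAME,TEAM,POS,PTS,3PA,FT%",
             "James,MIA,SF,5,10,50",
             "Steph,GS,PG,4,20,60"),
           csv_path)

# check.names = FALSE preserves non-syntactic column names like `3PA` and `FT%`.
nba_saved <- read.csv(csv_path, check.names = FALSE, stringsAsFactors = FALSE)

names(nba_saved)
nba_saved$`FT%`
```

With the default check.names = TRUE, the last two columns would instead be renamed to X3PA and FT., which no longer match the names used in the other examples.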
Step 3 (Command line/Terminal): Termination of Docker Container
After the test scrape is completed, we can shut down the Docker container. This also frees the memory and other resources it was using. Note that the following command stops every running container on your machine, not only the Selenium one.
docker stop $(docker ps -q)
If you wish to, you can also remove the Docker image from your computer, where “<image_id>” represents the ID of your Docker image.
docker image rm <image_id>
Once the Docker container is running as described in the preparation steps above, you can start R.
nba_scraper()
To load the package:
library(rsketball)
nba_scraper() creates the data frame for the NBA season of interest,
which you can then analyze using the functions below. The following
example scrapes the playoffs (postseason) of the 2017/18 season while
saving the data to a local csv file.
nba_2017 <- nba_scraper(2017, season_type = "postseason",
                        port = 4445L, # Port number as per Docker container setup
                        csv_path = "nba_2017_playoffs.csv")
Effective usage of the rest of the functions in rsketball may require
some knowledge of the available columns in the scraped data. For more
context on the column names of the scraped data set, please refer to the
dataset description file. It explains which columns are included in
the scraped data, as well as what they represent.
To illustrate the other functions, let’s create a toy dataset with properties similar to those of the scraped data from ESPN NBA.
nba_data <- tibble::tibble(NAME = c("James", "Steph", "Bosh", "Klay", "Kobe"),
                           TEAM = c("MIA", "GS", "MIA", "GS", "LAL"),
                           POS = c("SF", "PG", "C", "SG", "SG"),
                           PTS = c(5, 4, 3, 2, 10),
                           TO = c(1, 2, 3, 4, 3),
                           `3PA` = c(10, 20, 30, 40, 50),
                           `FT%` = c(50, 60, 70, 80, 90))
nba_boxplot()
To further compare the different statistics (scoring, steals, rebounds,
etc.) across different teams in combination with different player
positions, you can use nba_boxplot().
The example below shows the distribution of free throw percentage (‘FT%’, a numeric column) for specific teams, which are passed in as a vector of team names.
Important: Since the column name (FT%) contains a “%” character, the input for stats_column must be wrapped in backticks as shown:
nba_boxplot(nba_data,
            team_or_position = "team",
            grouping_list = c("MIA", "GS"),
            stats_column = `FT%`) # Formatted with backticks.
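The backtick requirement comes from R's rules for syntactic names rather than from rsketball itself. As a minimal base R sketch (the small toy data frame here is made up for illustration), make.names() shows which column names are non-syntactic, and backticks allow those names to be used directly:

```r
# A data frame with non-syntactic column names (check.names = FALSE keeps them as-is).
toy <- data.frame(NAME = c("James", "Steph"),
                  `3PA` = c(10, 20),
                  `FT%` = c(50, 60),
                  check.names = FALSE)

# make.names() returns the syntactic version of each name;
# any name it changes needs backticks when used as a bare symbol.
syntactic <- make.names(c("NAME", "3PA", "FT%"))
syntactic

# Backticks let the original names be used directly.
toy$`FT%`
with(toy, `3PA` / 10)
```

Here NAME is left unchanged by make.names(), while 3PA and FT% are altered, which is why only the latter two need backticks.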
nba_ranking()
The nba_ranking() function creates a visualization showing the
rankings of a category with a statistic of interest.
In this example, we rank the top 3 players (NAME) by their number of three-point attempts (3PA) in descending order.
Important: Since the column name starts with the digit 3, the input for the by argument must be wrapped in backticks as shown:
# Find top 3 players for 3 Point Attempts (3PA) where higher is better
nba_ranking(nba_data,
            column = NAME,
            by = `3PA`, # Formatted with backticks.
            top = 3,
            descending = TRUE,
            FUN = mean)
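To make the ranking logic concrete, the same ordering can be worked out by hand in base R. This is only a sketch of the idea (aggregate per player, sort, keep the top entries) under the assumption that this is what the arguments above express; it is not the package's implementation and it produces no plot. The data are the NAME and 3PA columns of the toy dataset.

```r
# NAME and 3PA columns copied from the toy dataset.
toy <- data.frame(NAME = c("James", "Steph", "Bosh", "Klay", "Kobe"),
                  `3PA` = c(10, 20, 30, 40, 50),
                  check.names = FALSE,
                  stringsAsFactors = FALSE)

# Aggregate 3PA per player with mean (each player has one row here).
per_player <- vapply(split(toy$`3PA`, toy$NAME), mean, numeric(1))

# Sort in descending order and keep the top 3.
top3 <- head(sort(per_player, decreasing = TRUE), 3)
top3
```

For this toy data the three highest values belong to Kobe, Klay, and Bosh, in that order.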
nba_team_stats()
The nba_team_stats() function computes the mean, median, and 25% and
75% quantiles. It is primarily focused on teams, and allows further
grouping by player position within each team.
In this example, we obtain the descriptive statistics of the relevant numeric columns (PTS and TO) for specific teams (GS and LAL), further grouped by player position (SG and PG).
# Find specific stats (PTS, TO) for specific teams (GS, LAL) for specific positions (PG, SG)
nba_team_stats(nba_data,
               stats_filter = c("PTS", "TO"),
               teams_filter = c("GS", "LAL"),
               positions_filter = c("SG", "PG"))
#> # A tibble: 3 x 10
#> # Groups:   TEAM [2]
#>   TEAM  POS   PTS_mean TO_mean PTS_median TO_median PTS_quantile_25
#>   <chr> <chr>    <dbl>   <dbl>      <dbl>     <dbl>           <dbl>
#> 1 GS    PG           4       2          4         2               4
#> 2 GS    SG           2       4          2         4               2
#> 3 LAL   SG          10       3         10         3              10
#> # … with 3 more variables: TO_quantile_25 <dbl>, PTS_quantile_75 <dbl>,
#> #   TO_quantile_75 <dbl>
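To see what the summary columns mean, the PTS figures in the output above can be reproduced with base R. This is a hedged sketch of the computation only, not the package's implementation; the helper summarise_pts is a hypothetical name introduced here, and quantile() is used with its default settings.

```r
# TEAM, POS and PTS columns copied from the toy dataset.
toy <- data.frame(TEAM = c("MIA", "GS", "MIA", "GS", "LAL"),
                  POS  = c("SF", "PG", "C", "SG", "SG"),
                  PTS  = c(5, 4, 3, 2, 10),
                  stringsAsFactors = FALSE)

# Keep only the requested teams and positions.
subset_toy <- toy[toy$TEAM %in% c("GS", "LAL") & toy$POS %in% c("SG", "PG"), ]

# Hypothetical helper: mean, median, and 25%/75% quantiles of one group.
summarise_pts <- function(x) {
  c(mean = mean(x),
    median = median(x),
    quantile_25 = unname(quantile(x, 0.25)),
    quantile_75 = unname(quantile(x, 0.75)))
}

# One row per TEAM/POS group; the PTS column of the result is a matrix
# with one column per statistic.
pts_stats <- aggregate(PTS ~ TEAM + POS, data = subset_toy, FUN = summarise_pts)
pts_stats
```

Because each TEAM/POS group in the toy data contains a single player, all four statistics coincide with that player's PTS value, which matches the tibble shown above.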
To test the package functions, please refer to the instructions in the README.md located in the testing subdirectory.
dplyr>=0.8.3
forcats>=0.5.0
ggplot2>=3.2.1
magrittr>=1.5
readr>=1.3.1
rlang>=0.4.2
rvest>=0.3.4
scales>=1.1.0
tibble>=2.1.3
RSelenium>=1.7.7
The rsketball package aims to help users gain a better understanding of
ESPN NBA data and does not have a specific fit in the R ecosystem. Other
packages, such as nbastatR, take data from other sources (NBA Stats API,
Basketball Insiders, Basketball-Reference, HoopsHype, and RealGM), but no
package that we currently know of takes data from ESPN NBA specifically.