This package is designed for all NBA enthusiasts! The rsketball package scrapes online tabular data from the ESPN NBA website into a csv file. It also includes functions for creating graphs and statistical analyses (such as boxplots, player rankings by stats, and a summary statistics table).
An example of the ESPN NBA 2018/19 regular season player stats can be found at the following URL:
https://www.espn.com/nba/stats/player/_/season/2019/seasontype/2
- `nba_scraper`
- `nba_boxplot`
- `nba_ranking`
- `nba_team_stats`
nba_scraper()
The rsketball::nba_scraper function is based on Selenium (specifically, RSelenium), which enables automated web browsing through "drivers". To use it, please ensure that Docker is installed. For installation instructions, please follow the Docker installation guide for your OS. Docker will be used to pull the relevant Chrome driver image which, when executed as a container, serves as the "driver" for Selenium.
The following steps are required only for the nba_scraper function. If you already have the scraped data file and wish to use the other functions (nba_boxplot, nba_ranking, nba_team_stats), there is no need to proceed with these steps.
Step 1 (Command line/Terminal): Preparation Step (Docker container)
Pull the Docker image with the following command in the terminal. We stick to Chrome because its image is compatible with Windows, while the Firefox one is not.
docker pull selenium/standalone-chrome
Critical step: setting ports and memory allocation.
We need to map the Docker container's default port 4444 to port 4445 on our host computer. Keep this host port number as the input for the nba_scraper function. We also allocate 2 GB of shared memory to the container so that it can scrape effectively.
Run the following code in Terminal:
docker run -d -p 4445:4444 --shm-size 2g selenium/standalone-chrome
Verify that the docker container is in operation by running the following code in Terminal:
docker ps
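As an optional sanity check before scraping, you can also confirm from R that the Selenium server inside the container is reachable. This is a sketch assuming the standard Selenium standalone status endpoint (`/wd/hub/status`); the helper `selenium_status_url` is hypothetical and not part of rsketball.

```r
# Build the URL of the Selenium standalone status endpoint for a given
# host port (4445 here, matching the docker run mapping above).
selenium_status_url <- function(port = 4445) {
  paste0("http://localhost:", port, "/wd/hub/status")
}

# With the container running, reading this URL should return the server
# status as JSON, e.g.:
# readLines(selenium_status_url(), warn = FALSE)
```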
Step 2 (R/RStudio): Scraping with nba_scraper
Now that the container is running with the allocated memory and assigned port, we can proceed with a test scrape:
library(rsketball)

# Scrape the 2017/18 postseason while saving to a local csv file.
nba_2017_playoffs <- nba_scraper(season_year = 2017, season_type = "postseason", port = 4445L, csv_path = "nba_2017_playoffs.csv")
If everything was executed as intended, you should obtain a csv file called "nba_2017_playoffs.csv" containing the scraped data, and a tibble named "nba_2017_playoffs" in your R environment. With this tibble, you can use the other rsketball functions for your analysis.
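Once the data is in your R session, you can also explore it with ordinary data-frame operations before reaching for the plotting functions. A minimal sketch, assuming the scraped table has a player NAME column and a points-per-game PTS column (the actual ESPN column names may differ):

```r
# Read the csv produced by nba_scraper() back into R (base R shown;
# readr::read_csv() works equally well):
# nba_2017_playoffs <- read.csv("nba_2017_playoffs.csv")

# Illustration with a toy table standing in for the real scrape:
nba_2017_playoffs <- data.frame(
  NAME = c("Player A", "Player B", "Player C"),
  PTS  = c(28.1, 22.4, 30.5)
)

# Top scorers, highest points per game first.
top_scorers <- nba_2017_playoffs[order(-nba_2017_playoffs$PTS), ]
head(top_scorers, 3)
```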
Step 3 (Command line/Terminal): Termination of Docker Container
After test scraping is completed, we can shut down the Docker container to restore your computer's memory and resources. The following command stops all running containers:
docker stop $(docker ps -q)
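Note that the command above stops every running container. If you have other containers you want to keep alive, a more targeted sketch uses Docker's ancestor filter to stop only containers started from the Selenium image:

```shell
# Stop only containers created from the selenium/standalone-chrome image.
docker stop $(docker ps -q --filter "ancestor=selenium/standalone-chrome")
```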
If you wish, you can also remove the Docker image from your computer, where the image id can be found by running "docker images":
docker image rm <image_id>
To test the package functions, please refer to the instructions in the README.md located in the testing subdirectory.
The rsketball package aims to deepen understanding of ESPN NBA data and does not fill a specific niche in the R ecosystem. There are other packages, such as nbastatR, that take data from other sources (NBA Stats API, Basketball Insiders, Basketball-Reference, HoopsHype, and RealGM), but no package that we currently know of takes data from ESPN NBA specifically.
rsketball is still in development. We estimate that by the end of March 2020, you will be able to install the released version of rsketball from CRAN:
install.packages("rsketball")
And the development version from GitHub with:
install.packages("devtools")
devtools::install_github("UBC-MDS/rsketball")