knitr::opts_chunk$set(echo = TRUE)

an R package for downloading MLB statcast data

statcastr was created as a component of my application for the position of Junior Quantitative Analyst in the Baseball Research and Development Department of the Los Angeles Dodgers. My intention with the creation of this package is to highlight select skills most relevant for the job. My extensive academic research experience is highlighted here under the CV tab. Thank you for considering my application.

quick start

Install statcastr with
Documentation is available for exported functions with e.g. ?fetch_statcast

To download statcast data for a given day or dates, use fetch_statcast(). The dates argument is designed to be as flexible as possible. A single date of the form yyyy-mm-dd may be used. A season's worth of data can be retrieved by entering a year. A range of dates can be specified with the separator " to ".

library(statcastr)
sc <- fetch_statcast('2017 April 12 to 2017 April 14', verbose=F)
sc
# sc <- fetch_statcast(c(20170412,'2017Apr13 to 20170414')) # same as above, flexibility with dates argument

fetch_statcast() is agnostic to the MLB schedule. By default, it first generates a server query for the number of pitches thrown on a given date, and then downloads data corresponding to days with at least 1 pitch thrown. The purpose of this functionality is twofold: 1. the user does not need to know the MLB schedule ahead of time, and 2. it serves as an extra quality check, as the final number of fetched pitches.

-> Peak at the data with d(sc), or get a summary with summary(sc)

Tips

-> Save data locally for quick re-loading in the future: save_statcast(sc, output="example.RDS")

-> To export in human-readable format, specify .csv as the file extension. If no file extension is included, data will be saved as an .RDS file.

-> A statcast data object can be re-loaded at a later time: sc <- load_statcast("example.RDS"). This function will also recognize .csv format.

-> Additional or new data can be added to an existing statcast object with:

sc <- update_statcast(sc, dates=20170416, verbose=F)

-> For convenience, data can alwawys be sorted in chronological order with sort_statcast()

-> Objects of class statcast inherit from data.table-class, and can therefore be interacted with as such

boxplot(sc[player_name=='Clayton Kershaw' & pitch_type=='SL']$release_speed,ylab='MPH')


devin-AK/statcastr documentation built on Nov. 18, 2022, 1:48 p.m.