knitr::opts_chunk$set(echo = TRUE)
Here's an outline of how to download the Retrosheet play-by-play files for a particular season and how to compute the runs values for all plays.
There is a blog posting on this at https://baseballwithr.wordpress.com/2014/02/10/downloading-retrosheet-data-and-runs-expectancy/
Download the Chadwick files, following the advice on the website mentioned in the blog post. The Chadwick software parses the original Retrosheet source files.
Set up the current working directory to have a folder called "download.folder". Inside that folder, place two subfolders "unzipped" and "zipped". The following system commands will accomplish this.
system("mkdir download.folder")
setwd("download.folder")
system("mkdir unzipped")
system("mkdir zipped")
setwd("..")
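As a portable alternative to the system commands above, the same folder layout can be sketched with base R's dir.create(), which does not depend on the operating system's mkdir and leaves the working directory unchanged. This is one possible approach, not the method from the blog post.

```r
# Create download.folder with its two subfolders using base R only.
# recursive = TRUE also creates download.folder itself if it is missing;
# showWarnings = FALSE keeps the call quiet when a folder already exists.
for (sub in c("unzipped", "zipped")) {
  dir.create(file.path("download.folder", sub),
             recursive = TRUE, showWarnings = FALSE)
}
```

Because no setwd() call is involved, the session stays in the original working directory afterwards.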
library(PitchSequence)
Use the parse.retrosheet2.pbp() function to download the Retrosheet play-by-play data for the 2020 season.
parse.retrosheet2.pbp(2020)
Navigate to the download.folder/unzipped folder and check that three files are there.
setwd("download.folder/unzipped")
dir()
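The folder contents can also be listed without changing the working directory. The sketch below uses base R's list.files() on the relative path. Staying in the original directory may matter if later steps expect to be run from there, which is an assumption since the source does not say.

```r
# List the parsed files without calling setwd().
# list.files() returns character(0) if the folder is empty or missing.
unzipped_files <- list.files(file.path("download.folder", "unzipped"))
print(unzipped_files)

# Number of files found; the text above says to expect three.
length(unzipped_files)
```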
Use the compute.runs.expectancy() function to compute the runs values for all plays, saving the result in d2020.
d2020 <- compute.runs.expectancy(2020)
Display the starting state, the new state and the runs value for the first few plays:
library(tidyverse)
d2020 %>%
  select(STATE, NEW.STATE, RUNS.VALUE) %>%
  head()
Display the runs expectancies for all states:
d2020 %>% group_by(STATE) %>% summarize(R = first(RUNS.STATE))
Show the runs values for all possible transitions between states:
d2020 %>% group_by(STATE, NEW.STATE) %>% summarize(R = first(RUNS.VALUE))
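To see the shape of this summary without downloading any data, here is a minimal sketch on a hand-made data frame. The column names follow the ones used above, but the rows and numbers in toy are invented for illustration and are not 2020 results. The N column and the .groups = "drop" argument are additions that are not in the original code.

```r
library(dplyr)

# Toy data: made-up values for illustration only, not real Retrosheet output.
toy <- tibble(
  STATE      = c("000 0", "000 0", "000 0", "100 0"),
  NEW.STATE  = c("100 0", "000 1", "100 0", "100 1"),
  RUNS.VALUE = c(0.4, -0.2, 0.4, -0.3)
)

# One row per observed (STATE, NEW.STATE) pair.
# first() keeps the first RUNS.VALUE seen in each group, n() counts the plays,
# and .groups = "drop" returns an ungrouped tibble without a grouping message.
transitions <- toy %>%
  group_by(STATE, NEW.STATE) %>%
  summarize(R = first(RUNS.VALUE), N = n(), .groups = "drop")

transitions
```

The four toy plays collapse to three distinct transitions, with the repeated pair counted twice in N.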