Description
Function for obtaining PITCHf/x and other related Gameday data. scrape currently has support for files ending with:

inning/inning_all.xml,
inning/inning_hit.xml,
players.xml, or
miniscoreboard.xml.

Note that PITCHf/x is contained in files ending with "inning/inning_all.xml", but the other files can complement this data depending on the goal of the analysis. Any collection of file names may be passed to the suffix argument, and scrape will retrieve data from a (possibly large) number of files based on either a window of dates or a set of game.ids. If collecting data in bulk, it is strongly recommended to establish a database connection and supply it to the connect argument. See the Examples section for a simple demonstration of how to do so.
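As a minimal sketch of setting up such a connection (assuming the DBI and RSQLite packages are installed), a "SQLiteConnection" object suitable for the connect argument can be created directly:

# A sketch, assuming DBI and RSQLite are installed.
# dbConnect() creates the file "Gameday.sqlite3" if it does not exist.
library(DBI)
con <- dbConnect(RSQLite::SQLite(), "Gameday.sqlite3")
class(con)  # "SQLiteConnection" -- a valid value for scrape's connect argument
# ... use scrape(..., connect = con), then clean up:
dbDisconnect(con)

The Examples section below shows the equivalent approach via dplyr's src_sqlite.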
Usage

scrape(start, end, game.ids, suffix, connect, ...)
Arguments

start: character string specifying a date ("yyyy-mm-dd") on which to commence scraping.

end: character string specifying a date ("yyyy-mm-dd") on which to terminate scraping.

game.ids: character vector of gameday_links. If this option is used, start and end are ignored.

suffix: character vector with the suffix of the XML files to be parsed. Currently supported options are: 'players.xml', 'miniscoreboard.xml', 'inning/inning_all.xml', and 'inning/inning_hit.xml'.

connect: a database connection object. The class of the object should be "MySQLConnection" or "SQLiteConnection". If a valid connection is supplied, tables will be copied to the database, which results in better memory management. If a connection is supplied but fails for some reason, csv files will be written to the working directory instead.

...: additional arguments passed on to underlying functions.
Value

Returns a list of data frames (or nothing if writing to a database).
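When no connection is supplied, the result is an ordinary list of data frames that can be inspected directly. A brief sketch (element names such as "pitch" and "atbat" correspond to the Gameday tables queried in the Examples section):

# Sketch only: requires an internet connection and the pitchRx package.
library(pitchRx)
dat <- scrape(start = "2013-08-01", end = "2013-08-01")
names(dat)        # element names correspond to Gameday tables, e.g. "atbat", "pitch"
dim(dat$pitch)    # one row per pitch
head(dat$atbat)   # one row per at-bat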
Note

This function was adapted from scrapeFX, which is deprecated as of version 1.0. If you want to add support for more file types, the XML2R package is a good place to start.
Examples

## Not run:
# Collect PITCHf/x (and other data from inning_all.xml files) from
# all games played on August 1st, 2013
dat <- scrape(start = "2013-08-01", end = "2013-08-01")
# As of XML2R 0.0.5, asynchronous downloads can be performed
dat <- scrape(start = "2013-08-01", end = "2013-08-01", async = TRUE)
# Scrape PITCHf/x from Minnesota Twins 2011 season
data(gids, package = "pitchRx")
twins11 <- gids[grepl("min", gids) & grepl("2011", gids)]
dat <- scrape(game.ids = twins11[1]) # scrapes the 1st game only
data(nonMLBgids, package = "pitchRx")
# Grab IDs for triple A games on June 1st, 2011
# This post explains more about obtaining game IDs with regular expressions --
# http://baseballwithr.wordpress.com/2014/06/30/pitchrx-meet-openwar-4/
aaa <- nonMLBgids[grepl("2011_06_01_[a-z]{3}aaa_[a-z]{3}aaa", nonMLBgids)]
dat <- scrape(game.ids = aaa)
# Create SQLite database, then collect and store data in that database
library(dplyr)
my_db <- src_sqlite("Gameday.sqlite3", create = TRUE)
scrape(start = "2013-08-01", end = "2013-08-01", connect = my_db$con)
# Collect other data complementary to PITCHf/x and store in database
files <- c("inning/inning_hit.xml", "miniscoreboard.xml", "players.xml")
scrape(start = "2013-08-01", end = "2013-08-01", connect=my_db$con, suffix = files)
# Simple example to demonstrate database query using dplyr
# Note that 'num' and 'gameday_link' together make a key that allows us to join these tables
locations <- select(tbl(my_db, "pitch"), px, pz, des, num, gameday_link)
names <- select(tbl(my_db, "atbat"), pitcher_name, batter_name, num, gameday_link)
que <- inner_join(locations, filter(names, batter_name == "Paul Goldschmidt"),
by = c("num", "gameday_link"))
que$query #refine sql query if you'd like
pitchfx <- collect(que) #submit query and bring data into R
## End(Not run)