Pitching table
A data frame with 42583 observations on the following 30 variables.
playerID  Player ID code
yearID    Year
stint     Player's stint (order of appearances within a season)
teamID    Team; a factor
lgID      League; a factor with levels AA AL FL NL PL UA
W         Wins
L         Losses
G         Games
GS        Games Started
CG        Complete Games
SHO       Shutouts
SV        Saves
IPouts    Outs Pitched (innings pitched x 3)
H         Hits
ER        Earned Runs
HR        Home runs
BB        Walks
SO        Strikeouts
BAOpp     Opponent's Batting Average
ERA       Earned Run Average
IBB       Intentional Walks
WP        Wild Pitches
HBP       Batters Hit By Pitch
BK        Balks
BFP       Batters faced by Pitcher
GF        Games Finished
R         Runs Allowed
SH        Sacrifices by opposing batters
SF        Sacrifice flies by opposing batters
GIDP      Grounded into double plays by opposing batters
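The IPouts and ERA columns are related by simple arithmetic. As a minimal sketch, assuming the standard nine-inning definition of earned run average, innings pitched and ERA can be recomputed from IPouts and ER. The helper names `ip_from_outs` and `era_from_outs` are illustrative only and are not part of the Lahman package; the input numbers are made up.

```r
# Hypothetical helpers for illustration; not part of the Lahman package.

# Innings pitched: IPouts counts outs recorded, three per inning.
ip_from_outs <- function(ipouts) ipouts / 3

# Earned run average: earned runs allowed per nine innings,
# i.e. 9 * ER / (IPouts / 3) = 27 * ER / IPouts.
era_from_outs <- function(er, ipouts) round(27 * er / ipouts, 2)

# Made-up season line: 600 outs recorded and 60 earned runs allowed
ip_from_outs(600)
era_from_outs(60, 600)
```

Rows with `IPouts` equal to zero divide by zero in `era_from_outs`, so they may need to be filtered out first.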
Lahman, S. (2014) Lahman's Baseball Database, 1871-2013, 2014 version, http://baseball1.com/statistics/
# Pitching data
library(Lahman)  # provides the Pitching and Master data frames used below
require(plyr)
###################################
# cleanup, and add some other stats
###################################
# Restrict to AL and NL data, 1901+
# All data re SH, SF and GIDP are missing, so remove
# Intentional walks (IBB) not recorded until 1955
pitching <- subset(Pitching, yearID >= 1901 & lgID %in% c("AL", "NL"))[, -(28:30)]
# Approximate BAOpp (the most common remaining missing value) as hits per
# estimated at-bat (BFP - BB - HBP); this recomputes every row, not only the missing ones
pitching$BAOpp <- with(pitching, round(H/(BFP - BB - HBP), 3))
# Compute WHIP (walks + hits per inning pitched -- lower is better)
# and KperBB (strikeouts per walk; intentional walks excluded from 1955 on)
pitching <- mutate(pitching,
                   WHIP = round((H + BB) * 3/IPouts, 2),
                   KperBB = round(ifelse(yearID >= 1955,
                                         SO/(BB - IBB), SO/BB), 2))
#####################
# some simple queries
#####################
# Team pitching statistics, Toronto Blue Jays, 1993
tor93 <- subset(pitching, yearID == 1993 & teamID == "TOR")
arrange(tor93, ERA)
# Career pitching statistics, Greg Maddux
subset(pitching, playerID == "maddugr01")
# Best ERAs for starting pitchers post WWII
postwar <- subset(pitching, yearID >= 1946 & IPouts >= 600)
head(arrange(postwar, ERA), 10)
# Best K/BB ratios post-1955 among starters (excludes intentional walks)
post55 <- subset(pitching, yearID >= 1955 & IPouts >= 600)
post55 <- mutate(post55, KperBB = SO/(BB - IBB))
head(arrange(post55, desc(KperBB)), 10)
# Best K/BB ratios among relievers post-1950 (min. 20 saves)
head(arrange(subset(pitching, yearID >= 1950 & SV >= 20), desc(KperBB)), 10)
###############################################
# Winningest pitchers in each league each year:
###############################################
# Add name & throws information:
masterInfo <- Master[, c('playerID',
                         'nameLast', 'nameFirst', 'throws')]
pitching <- merge(pitching, masterInfo, all.x=TRUE)
wp <- ddply(pitching, .(yearID, lgID), subset, W == max(W),
            select = c("playerID", "teamID", "W", "throws"))
anova(lm(formula = W ~ yearID + I(yearID^2) + lgID + throws, data = wp))
# an eye-catching, but naive, specious graph
require('ggplot2')
# compare loess smooth with quadratic fit
ggplot(wp, aes(x = yearID, y = W)) +
  geom_point(aes(colour = throws, shape=lgID), size = 2) +
  geom_smooth(method="loess", size=1.5, color="blue") +
  geom_smooth(method = "lm", se=FALSE, color="black", formula = y ~ poly(x,2)) +
  ylab("Maximum Wins") + xlab("Year") +
  ggtitle("Why can't pitchers win 30+ games any more?")