Small datasets for use in rstanarm examples and vignettes.
bball1970
Data on hits and at-bats from the 1970 Major League Baseball season for 18 players.
Source: Efron and Morris (1975).
18 obs. of 5 variables
Player
Player's last name
Hits
Number of hits in the first 45 at-bats of the season
AB
Number of at-bats (45 for all players)
RemainingAB
Number of remaining at-bats (different for most players)
RemainingHits
Number of remaining hits
bball2006
Hits and at-bats for the entire 2006 American League season of Major League Baseball.
Source: Carpenter (2009)
302 obs. of 2 variables
y
Number of hits
K
Number of at-bats
kidiq
Data from a survey of adult American women and their children (a subsample from the National Longitudinal Survey of Youth).
Source: Gelman and Hill (2007)
434 obs. of 4 variables
kid_score
Child's IQ score
mom_hs
Indicator for whether the mother has a high school degree
mom_iq
Mother's IQ score
mom_age
Mother's age
mortality
Surgical mortality rates in 12 hospitals performing cardiac surgery in babies.
Source: Spiegelhalter et al. (1996).
12 obs. of 2 variables
y
Number of deaths
K
Number of surgeries
radon
Data on radon levels in houses in the state of Minnesota.
Source: Gelman and Hill (2007)
919 obs. of 4 variables
log_radon
Radon measurement from the house (log scale)
log_uranium
Uranium level in the county (log scale)
floor
Indicator for radon measurement made on the first floor of the house (0 = basement, 1 = first floor)
county
County name (factor)
roaches
Data on the efficacy of a pest management system at reducing the number of roaches in urban apartments.
Source: Gelman and Hill (2007)
262 obs. of 6 variables
y
Number of roaches caught
roach1
Pretreatment number of roaches
treatment
Treatment indicator
senior
Indicator for only elderly residents in building
exposure2
Number of days for which the roach traps were used
tumors
Tarone (1982) provides a data set of tumor incidence in historical control groups of rats; specifically endometrial stromal polyps in female lab rats of type F344.
Source: Gelman and Hill (2007)
71 obs. of 2 variables
y
Number of rats with tumors
K
Number of rats
wells
A survey of 3200 residents in a small area of Bangladesh suffering from arsenic contamination of groundwater. Respondents with elevated arsenic levels in their wells had been encouraged to switch their water source to a safe public or private well in the nearby area. The survey was conducted several years later to learn which of the affected residents had switched wells.
Source: Gelman and Hill (2007)
3020 obs. of 5 variables
switch
Indicator for well-switching
arsenic
Arsenic level in respondent's well
dist
Distance (meters) from the respondent's house to the nearest well with safe drinking water
association
Indicator for whether member(s) of the household participate in community organizations
educ
Years of education (head of household)
library(rstanarm)

# Using 'kidiq' dataset
fit <- stan_lm(kid_score ~ mom_hs * mom_iq, data = kidiq,
               prior = R2(location = 0.30, what = "mean"),
               # the next line is only to make the example go fast enough
               chains = 2, iter = 500, seed = 12345)
pp_check(fit, nreps = 20)
bayesplot::color_scheme_set("brightblue")
pp_check(fit, plotfun = "stat_grouped", stat = "median",
         group = factor(kidiq$mom_hs, labels = c("No HS", "HS")))
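
Several of the datasets above (bball2006, mortality, tumors) store a count of successes in y and a count of trials in K. The following is a minimal sketch, assuming rstanarm is installed, of how such a pair might be passed to a binomial model. The complete-pooling formula (a single common rate for all 12 hospitals) and the object name fit_pool are illustrative choices, not something this page prescribes.

```r
library(rstanarm)

# Using 'mortality' dataset: y = number of deaths, K = number of surgeries
# Complete-pooling sketch: one common mortality rate for all 12 hospitals
fit_pool <- stan_glm(cbind(y, K - y) ~ 1, data = mortality,
                     family = binomial(link = "logit"),
                     # small settings only to keep the example fast
                     chains = 2, iter = 500, seed = 12345)

# Posterior median of the common rate, mapped back to the probability scale
plogis(coef(fit_pool)[["(Intercept)"]])
```

The same cbind(y, K - y) response should work for tumors and bball2006, since they use the same y and K columns.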

