This package includes special functions and datasets for a course on Exploratory Data Analysis based on John Tukey's EDA text.
It can be installed from Jim Albert's Github site:
library(remotes)
install_github("bayesball/LearnEDAfunctions")
The only prerequisite packages are the dplyr and ggplot2 packages.
The LearnEDAfunctions
package is loaded by the library() function:
library(LearnEDAfunctions)
Also there are several functions from other packages used in the EDA course. One will need to install these packages to use this functions.
act.scores.06.07
ACT scores of states in the US.baseball.attendance
Attendance at baseball teams.batting.history
Yearly Batting Statistics for Major League Baseballbeatles
Lengths of songs from Beatles' albums.boston.marathon
Boston Marathon completion times of women of different ages.boston.marathon.wtimes
Boston marathon winning timesbraves.attendance
Attendance at games of a baseball teamcar.measurements
Car measurements
-church.2way
Church attendance as a two-way table.church.tseries
Time series of worship attendancecollege.ratings
Ratings of National Universities in the U.S.farms
Number of farms in the states of the U.S.fit.gaussian
Fitting a Gaussian curve to binned datafootball
American football scoresgestation.periods
Gestation periods for different animals.grandma.19.40
Grandmas Marathon completion timesheaviest.fish
Fish world record catcheshome.prices
Home sales prices in the U.S.homeruns.2000
Team home run numbers for different seasons.homeruns.61
Home run counts in 1961immigrants
Immigrant counts to US.island.areas
Areas of islands from different continents.lake
Lake measurementsmortality.rates
Infant mortality rates of countriesolympics.run
Olympics running times.olympics.speed.skating
Olympics speed skating times.olympics.swim
Olympics swimming times.pitching.history
Yearly Pitching Statistics for Major League Baseballpop.change
Population change for all states in the U.S.pop.densities
Population densities of states for different years.pop.england
England populationrent.prices
Rent prices in different cities.salaries
Salaries of different professions in different cities.snowfall
Snowfall amounts of two citiesstudentdata
Student datasettemperatures
Temperatures for different cities.tukey.24a
Tukey straightening exercise 24atukey.24b
Tukey straightening exercise 24btukey.26a
Tukey straightening exercise 24atukey.26b
Tukey straightening exercise 24btukey.26c
Tukey straightening exercise 24cus.pop
Population of United StatesFits a Gaussian curve to binned data.
data <- rt(200,df=5)
bins <- pretty(range(data))
g.mean <- 0
g.sd <- 1
fit.gaussian(data, bins, g.mean, g.sd)
Half-slope ratio
sx <- c(10,30,50)
sy <- c(5,8,20)
half.slope.ratio(sx, sy, 1, 1)
half.slope.ratio(sx, sy, -0.5, -0.5)
Hanning a sequence.
plot(WWWusage)
plot(smooth(WWWusage, kind="3RSS"))
plot(han(smooth(WWWusage, kind="3RSS")))
Hinkley's quick method.
raw <- state.x77[, "Population"]
hinkley(raw)
logs <- log(raw)
hinkley(logs)
Finds outliers by group using Tukey's rule
lval_plus(beatles, time, album)
Letter values
lval(rnorm(100))
raw <- state.x77[,"Population"]
matched.roots <- mtrans(raw,0.5)
matched.logs <- mtrans(raw,0)
boxplot(data.frame(raw, matched.roots,
matched.logs))
Plot of an additive fit.
temps <- matrix(data=c(50, 30, 35, 21, 38,
73, 58, 65, 57, 63,
88, 83, 89, 84, 86,
73, 62, 68, 59, 66),
nrow=5,ncol=4,
dimnames=list(c("Atlanta",
"Detroit", "Kansas City",
"Minneapolis", "Philadelphia"),
c("January", "April",
"July", "October")))
fit <- medpolish(temps)
plot2way(fit$row + fit$overall, fit$col,
dimnames(temps)[[1]],
dimnames(temps)[[2]])
Power transformation.
power.t(c(3, 6, 5, 4, 7), 0.5)
Computation of a resistant line.
df <- data.frame(x = 1:10,
y = 3 * (1:10) + rnorm(10))
rline(y ~ x, df, iter=5)
Spread versus level plot.
spread_level_plot(beatles, time, album)
Symmetry plot.
symplot(rnorm(100))
# symmetry plot for exponential data
symplot(rexp(100))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.