The main purpose of this vignette is to provide R code to produce graphs that involve more than one variable. We consider two general situations: (i) plotting the values of a continuous variable for different values of a categorical variable (see The Oxford Birth Times data) and (ii) scatter plots of one variable against another (see The 2000 US Presidential Election data). The latter feature in Section 2.6 of the STAT0002 notes. See also the vignette Chapter 2: Graphs (one variable).
The R code used in this vignette is available in the file graphs-2-vignette.R.
The Oxford birth times data are available in the data frame ox_births.
library(stat1004)
birth_times <- ox_births[, "time"]
day <- ox_births[, "day"]
To display these data we manipulate them into a matrix that has the same format as Table 2.1 in the notes. The number of birth times varies between days, so we pad the matrix with R's missing value code NA so that each column of the matrix has the same number of rows.
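# Start with a matrix of NAs: one column per day, 16 rows
ox_mat <- matrix(NA, ncol = 7, nrow = 16)
for (i in 1:7) {
  # Birth times on day i
  day_i_times <- ox_births$time[ox_births$day == i]
  # Fill the top of column i with the sorted times; the rest stay NA
  ox_mat[seq_along(day_i_times), i] <- sort(day_i_times)
}
colnames(ox_mat) <- paste("day", 1:7, sep = "")
ox_mat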
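The padding step above can also be sketched without fixing the number of rows in advance. The following is a minimal illustration in base R only, using a small made-up data frame (toy, with hypothetical values that are not taken from ox_births). It is one possible way to do this, not the approach used in the notes.

```r
# Toy data: hypothetical values, for illustration only
toy <- data.frame(
  time = c(2.1, 5.3, 3.4, 7.0, 1.2, 4.4),
  day  = c(1, 1, 1, 2, 3, 3)
)

# Split the times by day and sort within each day
by_day <- lapply(split(toy$time, toy$day), sort)

# Pad each group with NA up to the length of the longest group
n_max <- max(lengths(by_day))
toy_mat <- sapply(by_day, function(v) c(v, rep(NA, n_max - length(v))))
colnames(toy_mat) <- paste0("day", names(by_day))
toy_mat
```

Here the number of rows is taken from the largest group, so it does not need to be known beforehand.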
We have a continuous numeric variable, birth_times, and a categorical variable, day. The following code produces the graphs in the STAT0002 lecture slides that show birth_times for each day of the week.
par(mar = c(4, 4, 0.5, 1))
xlab <- "time (hours)"
x_labs <- c(min(birth_times), pretty(birth_times), max(birth_times))
# top left
box_plot(birth_times ~ day, col = 8, ylab = xlab, pch = 16, xlab = "day")
# top right
box_plot(birth_times ~ day, col = 8, horizontal = TRUE, axes = FALSE,
         xlab = xlab, ylab = "day", pch = 16)
axis(1, at = x_labs, labels = x_labs)
axis(2, at = 1:7, labels = 1:7, lwd = 0, lty = 0)
# bottom left
box_plot(birth_times ~ day, axes = FALSE, ylab = xlab, pch = 16, lty = 1,
         range = 0, boxcol = "white", staplewex = 0, medlty = "blank",
         medpch = 16, xlab = "day")
axis(1, at = 1:7, labels = 1:7, lwd = 0, lty = 0)
axis(2, at = x_labs, labels = x_labs)
# bottom right
box_plot(birth_times ~ day, horizontal = TRUE, axes = FALSE, xlab = xlab,
         pch = 16, lty = 1, range = 0, boxcol = "white", staplewex = 0,
         medlty = "blank", medpch = 16)
axis(1, at = x_labs, labels = x_labs)
axis(2, at = 1:7, labels = 1:7, lwd = 0, lty = 0, las = 1)
The 2000 US Presidential Election data are available in the data frame USelection. See ?USelection for details.
# County identifiers and location
head(USelection[, 1:4])
# County demographic variables
head(USelection[, 5:12])
# Numbers of votes for candidates
head(USelection[, 13:22])
For the moment we simply produce some scatter plots. A separate vignette will be devoted to these data.
A plot to show the locations of the counties.
plot(-USelection[, "lon"], USelection[, "lat"],
     xlab = "longitude (degrees east)", ylab = "latitude (degrees north)",
     pch = 16)
A plot of the percentage of the vote for Buchanan against population size.
pbuch <- 100 * USelection$buch / USelection$tvot
is_PB <- USelection[, "co_names"] == "PalmBeach"
pch <- 1 + 3 * is_PB
pch
plot(USelection$npop, pbuch, xlab = "population", ylab = "Buchanan % vote",
     pch = pch)
which_PB <- which(is_PB)
text(USelection[which_PB, "npop"], pbuch[which_PB] + 0.1, "Palm Beach",
     cex = 0.8)
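The expression 1 + 3 * is_PB in the code above relies on R treating FALSE as 0 and TRUE as 1 in arithmetic. A minimal sketch with a made-up vector of county names (the names co, is_pb and pch_toy are hypothetical and used only for illustration):

```r
# Hypothetical county names, for illustration only
co <- c("Alachua", "PalmBeach", "Leon")

# Logical indicator of the county to highlight
is_pb <- co == "PalmBeach"

# FALSE is treated as 0 and TRUE as 1, so the plotting symbol is
# 1 (open circle) for most counties and 4 (cross) for the highlighted one
pch_toy <- 1 + 3 * is_pb
pch_toy
```

Passing such a vector as the pch argument of plot() gives each point its own plotting symbol.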
Pairwise scatter plots of the demographic variables.
pairs(USelection[, 5:12])
A plot of the square root of the percentage of the vote for Buchanan against population size, in thousands of people. The horizontal axis has been plotted on a log scale.
x <- USelection$npop / 1000
y <- sqrt(pbuch)
ystring <- expression(sqrt("% Buchanan vote"))
rm_PB <- which(!is_PB)
scatter(x[rm_PB], y[rm_PB], pch = 16, xlab = "Total Population (1000s)",
        ylab = ystring, log = "x")
points(x[which_PB], y[which_PB], pch = "X")
text(x[which_PB], y[which_PB] + 0.04, "Palm Beach", cex = 0.8)
Can you see how this different method of identifying Palm Beach works?
Can you guess what the numbers on the axes are? See the help page for scatter to find out.
Similarly, we use scatter_hist to create a scatter plot in which the distribution of each variable is summarized by a histogram.
scatter_hist(x, y, log = "x", pch = 16, xlab = "Total Population (1000s)",
             ylab = ystring)
The plot in the lecture slides is produced by specifying particular bins for the histograms.
logx <- log(x)
# 25 equally spaced break points give 24 bins on each axis
xbreaks <- seq(from = min(logx), to = max(logx), length.out = 25)
ybreaks <- seq(from = min(y), to = max(y), length.out = 25)
scatter_hist(x, y, log = "x", pch = 16, xlab = "Total Population (1000s)",
             ylab = ystring, xbreaks = xbreaks, ybreaks = ybreaks)
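To see what equally spaced breaks on the log scale do, here is a minimal sketch using base R's hist() on simulated data rather than scatter_hist. The data (x_toy) are simulated and hypothetical, and how scatter_hist uses the breaks internally is not shown here.

```r
# Simulated positive, right-skewed data (hypothetical, not the election data)
set.seed(1)
x_toy <- rlnorm(200, meanlog = 3, sdlog = 1)

# Equally spaced break points on the log scale, spanning the range of the data
log_x <- log(x_toy)
breaks <- seq(from = min(log_x), to = max(log_x), length.out = 25)

# Compute the histogram without plotting it
h <- hist(log_x, breaks = breaks, plot = FALSE)

# Number of bins is one fewer than the number of break points
length(h$counts)
```

Because the breaks run from the minimum to the maximum of the logged data, every observation falls in one of the bins.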