knitr::opts_chunk$set(echo = TRUE, warning=FALSE)
suppressWarnings(suppressMessages(suppressPackageStartupMessages(library(ggplot2))))

Introduction

Box plots are useful for visualizing data

Example 1 - A single vector of data

Introduction

This example is adapted from data supplied by https://www.statcan.gc.ca.

Analysis

Input the data

The masses (to the nearest tenth of a kilogram) of 30 students were measured and recorded as follows:

df <- data.frame(mass = c(59.2, 61.5, 62.3, 61.4, 60.9, 59.8,
                          60.5, 59.0, 61.1, 60.7, 61.6, 56.3,
                          61.9, 65.7, 60.4, 58.9, 59.0, 61.2,
                          62.1, 61.4, 58.4, 60.8, 60.2, 62.7,
                          60.0, 59.3, 61.9, 61.7, 58.4, 62.2))

Make the boxplot

Note: We only have one list of data, so when we use ggplot2 we set the x axis to "".

library(ggplot2)


p <- ggplot(df) + aes(x="", y=mass) +
     geom_boxplot() +
     ggtitle("Body mass of 30 students") +
     labs(x="", y="Body mass [kg]") +
     theme(plot.title = element_text(size = 14, family = "Tahoma", face = "bold", hjust = 0.5),
           panel.border = element_rect(colour = "black", fill=NA, size=.5),
           text = element_text(size = 12, family = "Tahoma"),
           axis.title = element_text(face="bold"),
           axis.text.x=element_text(size = 11))

print(p)

Summary

Note the outlier just under 66 kg. Also note that the whiskers have different lengths.

Example 2 - A more complicated example with multiple factors.

This is adapted from the ggplot2 examples here,

Choose a data set

We will use the standard mpg data set that comes with R.

summary(mpg)

Make the plot

plt <-  ggplot(mpg, aes(class, hwy)) +
    geom_boxplot() +
    ggtitle("Highway mpg of cars") +
    labs(x="car class", y="Highway mpg") +
    theme(plot.title = element_text(size = 14, family = "Tahoma", face = "bold", hjust = 0.5), # center title
          panel.border = element_rect(colour = "black", fill=NA, size=.5),
          text = element_text(size = 12, family = "Tahoma"),
          axis.title = element_text(face="bold"),
           axis.text.x=element_text(size = 11))
print(plt)

now let's superimpose the data

We will extend the plot. Note the use of the jitter argument to spread out the data along the x-axis.

plt <-  plt +
        geom_boxplot() +
        geom_jitter(width = 0.2)
print(plt)


jrminter/statshelpR documentation built on May 2, 2020, 12:08 a.m.