In NUstat/ISDStutorials: Tutorial Lessons for Introduction to Statistics and Data Science

library(learnr)
library(tidyverse)
library(nycflights13)
library(tutorialExtras)
library(gradethis)
library(tutorial.helpers)
library(ggcheck)

gradethis_setup()
knitr::opts_chunk$set(echo = FALSE)
options(
  tutorial.exercise.timelimit = 60
  #tutorial.storage = "local"
  ) 

fruits <- tibble(
  fruit = c("apple", "apple", "orange", "apple", "orange")
  )

fruits_counted <- tibble(
  fruit = c("apple", "orange"),
  number = c(3, 2)
  )

grade_server("grade")

question_text("Name:",
              answer_fn(function(value){
                              if(length(value) >= 1 ) {
                                return(mark_as(TRUE))
                                }
                              return(mark_as(FALSE) )
                              }),
              correct = "submitted",
              allow_retry = FALSE )

Instructions

Complete this tutorial while reading Sections 2.7 - 2.9 of the textbook. Each question allows 3 'free' attempts. After the third attempt a 10% deduction occurs per attempt.

You can check your current grade and the number of attempts you are on in the "View grade" section. You can click this button as often and as many times as you would like as you progress through the tutorial. Before submitting, make sure your grade is as expected.

Goals

Learn to plot and interpret boxplots.
Learn to plot a single categorical variable.
Understand the different ways to visualize two categorical variables in a barplot.

5NG#4: Boxplots

Similar to a histogram, a boxplot shows the distribution of a single numeric variable.

To compare distributions of a numerical variable split by another variable, another graphic besides a faceted histogram to achieve this is a side-by-side boxplot.

Exercise 1

A boxplot is constructed from the information provided in the five-number summary of a numerical variable.

question("Which of the following summary statistics are included in the five-number summary and are used to construct a boxplot when there are no “outliers” in the data?",
           answer("minimum", correct = TRUE),
           answer("maximum", correct = TRUE),
           answer("mode"),
           answer("first quantile (Q1, 25th percentile)", correct = TRUE),
           answer("standard deviation"),
           answer("third quantile (Q3, 75th percentile)", correct = TRUE),
           answer("median", correct = TRUE),
           answer("mean"),
           allow_retry = TRUE,
           random_answer_order = TRUE)

Exercise 2

question_wordbank("Drag and drop the features of a boxplot with the information they display about the data.",
        choices = c("lines extending from the box to points less than the 25th percentile or greater than the 75th percentile",
                 "interquartile range (i.e. a measure of the spread of the data)",
                 "outliers",
                 "1st quartile, median, 3rd quartile (i.e. the middle 50% of the data)"),
        wordbank = c("whiskers", "length", "dots", "box"),
        answer(c("whiskers", "length", "dots", "box"), 
        correct = TRUE), 
        allow_retry = TRUE )

Exercise 3

Let’s create a side-by-side boxplot of hourly temperatures split by the 12 months as we did in the past tutorial with the faceted histograms.

Within ggplot() set the data = weather. Set the second argument to mapping = aes() and within aes() define:

x to be the variable month
y to be the numeric variable temp

ggplot(data = weather, 
       mapping = aes(x = ..., y = ...))

ggplot(data = weather, 
       mapping = aes(x = month, y = temp))

grade_this_code()

Exercise 4

Copy the previous code and use the + operator to add geom_boxplot().

ggplot(data = weather, 
       mapping = aes(x = month, y = temp)) +
  geom_...

ggplot(data = weather, 
       mapping = aes(x = month, y = temp)) +
  geom_boxplot()

grade_this_code()

Oh no, this plot does not provide information about temperature separated by month! The warning messages clue us in as to why.

The first warning message is telling us that we have a “continuous”, or numerical variable, on the x-position aesthetic. Side-by-side boxplots require one categorical variable and one numeric variable.

Exercise 5

Copy the previous code and convert the numerical variable month into a categorical variable by using the factor() function

ggplot(data = weather, 
       mapping = aes(x = ...(month), y = temp)) +
  geom_boxplot()

ggplot(data = weather, 
       mapping = aes(x = factor(month), y = temp)) +
  geom_boxplot()

grade_this_code()

5NG#5: Barplots

Another common task is visualize the distribution of a categorical variable. This is a simpler task, as we are simply counting different categories, also known as levels, of a categorical variable.

Exercise 1

Below is the code we used to manually create two data frames, fruit and fruit_counted, representing a collection of fruit: 3 apples and 2 oranges.

fruits <- tibble(
  fruit = c("apple", "apple", "orange", "apple", "orange")
  )

fruits_counted <- tibble(
  fruit = c("apple", "orange"),
  number = c(3, 2)
  )

Run fruits in the code chunk to print the data frame.

...

fruits

grade_this_code()

Notice that fruits just lists the fruit individually.

Exercise 2

Now, run fruits_counted in the code chunk to print the data frame.

...

fruits_counted

grade_this_code()

fruits_counted has a variable number which represents pre-counted values of each fruit.

Exercise 3

Let’s first generate a barplot using the fruits data frame where all 5 fruits are listed individually in 5 rows.

Use the ggplot() function with data = fruits and mapping = aes(x = fruit).

Be careful the data frame is called fruits and the variable is called fruit.

ggplot(data = ..., mapping = aes(x = ...))

ggplot(data = fruits, mapping = aes(x = fruit))

grade_this_code()

Exercise 4

Add a geom_bar() layer.

ggplot(data = fruits, mapping = aes(x = fruit)) +
  geom_...

ggplot(data = fruits, mapping = aes(x = fruit)) +
  geom_bar()

grade_this_code()

Since the data was in list form (not pre-counted), there is no y-aesthetic needed.

Exercise 5

Copy the previous code and make the following modifications:

set data = fruits_counted
add in y = number after the x aesthetic
replace geom_bar() with geom_col()

ggplot(data = fruits, mapping = aes(x = fruit)) +
  geom_...

ggplot(data = fruits, mapping = aes(x = fruit, y = count)) +
  geom_col()

grade_this_code()

Since this data frame is pre-counted we need to specify the counts of each fruit as the y aesthetic (whereas geom_bar() counts the list for us). Recall from Exercise 2 the name of the variable was number.

question_wordbank("Which geometric layer do you use with categorical data that is...",
  choices = c("NOT pre-counted", "pre-counted"),
  answer(c("geom_bar()", "geom_col()"), correct = TRUE),
           allow_retry = TRUE,
           random_answer_order = TRUE)

Exercise 6

Recall our flights dataset from the nycflights13 package. The package has already been pre-loaded for you and a glimpse() of the dataset is shown below.

glimpse(flights)

Using ggplot() set the data = flights and assign the x-axis aesthetic to be carrier. Then add the appropriate geom layer.

ggplot(data = ..., mapping = aes(x = ...)) +
  geom_...()

ggplot(data = flights, mapping = aes(x = carrier)) +
  geom_bar()

grade_this_code()

Observe that United Air Lines (UA) had the most flights depart New York City in 2013 and SkyWest Airlines Inc. (OO) had the least.

If you don’t know which airlines correspond to which carrier codes, then run View(airlines) to see a directory of airlines.

Exercise 7

Another use of barplots is to visualize the joint distribution of two categorical variables at the same time.

Let’s examine the joint distribution of outgoing domestic flights from NYC by carrier and origin, or in other words the number of flights for each carrier and origin combination.

Copy the previous code and map the additional variable origin by adding a fill = origin inside the aes() aesthetic mapping

ggplot(data = flights, mapping = aes(x = carrier, ...)) +
  geom_bar()

ggplot(data = flights, mapping = aes(x = carrier, fill = origin)) +
  geom_bar()

grade_this_code()

This is an example of a stacked barplot. While easy to make it is not always the most ideal.

Exercise 7

An alternative to stacked barplots are side-by-side barplots, also known as a dodged barplot.

Copy the previous code and add the argument position = "dodge" within geom_bar().

ggplot(data = flights, mapping = aes(x = carrier, fill = origin)) +
  geom_bar(position = ...)

ggplot(data = flights, mapping = aes(x = carrier, fill = origin)) +
  geom_bar(position = "dodge")

grade_this_code()

This shows the same information as a faceted barplot.

View grade

grade_button_ui(id = "grade")

Submit

Once you are finished:

Click the 'Download Grade' button below. This will download an html document of your grade summary.
Make sure your grade is correct and as expected!
Submit the downloaded html to Canvas.

grade_print_ui("grade")

NUstat/ISDStutorials documentation built on April 17, 2025, 6:15 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

NUstat/ISDStutorials
Tutorial Lessons for Introduction to Statistics and Data Science

In NUstat/ISDStutorials: Tutorial Lessons for Introduction to Statistics and Data Science

Instructions

Goals

5NG#4: Boxplots

Exercise 1

Exercise 2

Exercise 3

Exercise 4

Exercise 5

5NG#5: Barplots

Exercise 1

Exercise 2

Exercise 3

Exercise 4

Exercise 5

Exercise 6

Exercise 7

Exercise 7

View grade

Submit

R Package Documentation

Browse R Packages

We want your feedback!

NUstat/ISDStutorials Tutorial Lessons for Introduction to Statistics and Data Science

In NUstat/ISDStutorials: Tutorial Lessons for Introduction to Statistics and Data Science

Instructions

Goals

5NG#4: Boxplots

Exercise 1

Exercise 2

Exercise 3

Exercise 4

Exercise 5

5NG#5: Barplots

Exercise 1

Exercise 2

Exercise 3

Exercise 4

Exercise 5

Exercise 6

Exercise 7

Exercise 7

View grade

Submit

R Package Documentation

Browse R Packages

We want your feedback!

NUstat/ISDStutorials
Tutorial Lessons for Introduction to Statistics and Data Science