library(learnr)
library(tidyverse)
library(nycflights13)
library(tutorialExtras)
library(gradethis)
library(tutorial.helpers)
library(ggcheck)

gradethis_setup()
knitr::opts_chunk$set(echo = FALSE)
options(
  tutorial.exercise.timelimit = 60
  #tutorial.storage = "local"
  ) 

fruits <- tibble(
  fruit = c("apple", "apple", "orange", "apple", "orange")
  )

fruits_counted <- tibble(
  fruit = c("apple", "orange"),
  number = c(3, 2)
  )
grade_server("grade")

question_text("Name:",
              answer_fn(function(value){
                              if(length(value) >= 1 ) {
                                return(mark_as(TRUE))
                                }
                              return(mark_as(FALSE) )
                              }),
              correct = "submitted",
              allow_retry = FALSE )

Instructions

Complete this tutorial while reading Sections 2.7 - 2.9 of the textbook. Each question allows 3 'free' attempts. After the third attempt a 10% deduction occurs per attempt.

You can check your current grade and the number of attempts you have used in the "View grade" section. You can click this button as often as you like as you progress through the tutorial. Before submitting, make sure your grade is as expected.

Goals

5NG#4: Boxplots

Similar to a histogram, a boxplot shows the distribution of a single numeric variable.

To compare distributions of a numerical variable split by another variable, an alternative to a faceted histogram is a side-by-side boxplot.

Exercise 1

A boxplot is constructed from the information provided in the five-number summary of a numerical variable.

question("Which of the following summary statistics are included in the five-number summary and are used to construct a boxplot when there are no “outliers” in the data?",
           answer("minimum", correct = TRUE),
           answer("maximum", correct = TRUE),
           answer("mode"),
           answer("first quartile (Q1, 25th percentile)", correct = TRUE),
           answer("standard deviation"),
           answer("third quartile (Q3, 75th percentile)", correct = TRUE),
           answer("median", correct = TRUE),
           answer("mean"),
           allow_retry = TRUE,
           random_answer_order = TRUE)
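
As a hedged illustration of the five-number summary behind a boxplot, the sketch below computes it with base R's quantile(). The vector x is a made-up toy example, not part of the weather data.

```r
# Hypothetical toy data, chosen only for illustration
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Minimum, Q1, median, Q3 and maximum, using quantile()'s default method
five_num <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
print(five_num)
```

Calling summary(x) reports the same five values together with the mean.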

Exercise 2

question_wordbank("Drag and drop the features of a boxplot with the information they display about the data.",
        choices = c("lines extending from the box to points less than the 25th percentile or greater than the 75th percentile",
                 "interquartile range (i.e. a measure of the spread of the data)",
                 "outliers",
                 "1st quartile, median, 3rd quartile (i.e. the middle 50% of the data)"),
        wordbank = c("whiskers", "length", "dots", "box"),
        answer(c("whiskers", "length", "dots", "box"), 
        correct = TRUE), 
        allow_retry = TRUE )
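
The features matched above can be sketched numerically. This example assumes the default rule used by geom_boxplot(), where whiskers reach the most extreme points within 1.5 times the interquartile range of the box edges and anything beyond is drawn as a dot. The tutorial itself does not state this rule, and the vector x is hypothetical.

```r
# Hypothetical toy data with one unusually large value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 30)

# Edges of the box: first and third quartiles
q1 <- quantile(x, 0.25, names = FALSE)
q3 <- quantile(x, 0.75, names = FALSE)

# Length of the box: the interquartile range (same value as IQR(x))
box_length <- q3 - q1

# Assumed 1.5 * IQR rule: limits beyond which points count as outliers
lower_fence <- q1 - 1.5 * box_length
upper_fence <- q3 + 1.5 * box_length

# Points outside the fences would be plotted as dots
outliers <- x[x < lower_fence | x > upper_fence]
print(outliers)
```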

Exercise 3

Let’s create a side-by-side boxplot of hourly temperatures split by the 12 months, as we did in the previous tutorial with the faceted histograms.

Within ggplot(), set data = weather. Set the second argument to mapping = aes() and within aes() define x = month and y = temp.


ggplot(data = weather, 
       mapping = aes(x = ..., y = ...))
ggplot(data = weather, 
       mapping = aes(x = month, y = temp))
grade_this_code()

Exercise 4

Copy the previous code and use the + operator to add geom_boxplot().


ggplot(data = weather, 
       mapping = aes(x = month, y = temp)) +
  geom_...
ggplot(data = weather, 
       mapping = aes(x = month, y = temp)) +
  geom_boxplot()
grade_this_code()

Oh no, this plot does not provide information about temperature separated by month! The warning messages clue us in as to why.

The first warning message is telling us that we have a “continuous”, or numerical variable, on the x-position aesthetic. Side-by-side boxplots require one categorical variable and one numeric variable.

Exercise 5

Copy the previous code and convert the numerical variable month into a categorical variable by using the factor() function.


ggplot(data = weather, 
       mapping = aes(x = ...(month), y = temp)) +
  geom_boxplot()
ggplot(data = weather, 
       mapping = aes(x = factor(month), y = temp)) +
  geom_boxplot()
grade_this_code()
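
To see what factor() changes, here is a minimal sketch on a small hypothetical vector of month numbers (not the full weather data). The numbers stay the same, but R now treats them as category labels.

```r
# Hypothetical month numbers, stored as a numeric vector
month_num <- c(1, 12, 2, 1)

# Convert the numbers into categories (levels)
month_cat <- factor(month_num)

is.numeric(month_num)  # the original vector is numeric
is.factor(month_cat)   # the converted vector is a factor
levels(month_cat)      # one level per distinct month, in numeric order
```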

5NG#5: Barplots

Another common task is to visualize the distribution of a categorical variable. This is a simpler task, as we are simply counting the different categories, also known as levels, of a categorical variable.

Exercise 1

Below is the code we used to manually create two data frames, fruits and fruits_counted, representing a collection of fruit: 3 apples and 2 oranges.

fruits <- tibble(
  fruit = c("apple", "apple", "orange", "apple", "orange")
  )

fruits_counted <- tibble(
  fruit = c("apple", "orange"),
  number = c(3, 2)
  )

Run fruits in the code chunk to print the data frame.


...
fruits
grade_this_code()

Notice that fruits just lists the fruit individually.

Exercise 2

Now, run fruits_counted in the code chunk to print the data frame.


...
fruits_counted
grade_this_code()

fruits_counted has a variable number which represents pre-counted values of each fruit.

Exercise 3

Let’s first generate a barplot using the fruits data frame where all 5 fruits are listed individually in 5 rows.

Use the ggplot() function with data = fruits and mapping = aes(x = fruit).

Be careful: the data frame is called fruits and the variable is called fruit.


ggplot(data = ..., mapping = aes(x = ...))
ggplot(data = fruits, mapping = aes(x = fruit))
grade_this_code()

Exercise 4

Add a geom_bar() layer.


ggplot(data = fruits, mapping = aes(x = fruit)) +
  geom_...
ggplot(data = fruits, mapping = aes(x = fruit)) +
  geom_bar()
grade_this_code()

Since the data was in list form (not pre-counted), there is no y-aesthetic needed.

Exercise 5

Copy the previous code and make the following modifications:


ggplot(data = fruits, mapping = aes(x = fruit)) +
  geom_...
ggplot(data = fruits_counted, mapping = aes(x = fruit, y = number)) +
  geom_col()
grade_this_code()

Since the fruits_counted data frame is pre-counted, we need to specify the counts of each fruit as the y aesthetic (whereas geom_bar() counts the list for us). Recall from Exercise 2 that the variable holding the counts is named number.

question_wordbank("Which geometric layer do you use with categorical data that is...",
  choices = c("NOT pre-counted", "pre-counted"),
  answer(c("geom_bar()", "geom_col()"), correct = TRUE),
           allow_retry = TRUE,
           random_answer_order = TRUE)
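
The link between the two geoms can be sketched with dplyr's count(). That function is not used in this tutorial, so treat it as an assumption about how a pre-counted table like fruits_counted could be produced from the listed data.

```r
library(tibble)
library(dplyr)

# The listed (not pre-counted) data, as defined earlier in the tutorial
fruits <- tibble(
  fruit = c("apple", "apple", "orange", "apple", "orange")
)

# Count the rows per fruit; `name` sets the name of the count column
counted <- count(fruits, fruit, name = "number")
print(counted)
```

The result has one row per fruit, which is the shape geom_col() expects. Here number is stored as integers rather than the doubles in fruits_counted.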

Exercise 6

Recall our flights dataset from the nycflights13 package. The package has already been pre-loaded for you and a glimpse() of the dataset is shown below.

glimpse(flights)

Using ggplot(), set data = flights and map the x aesthetic to carrier. Then add the appropriate geom layer.


ggplot(data = ..., mapping = aes(x = ...)) +
  geom_...()
ggplot(data = flights, mapping = aes(x = carrier)) +
  geom_bar()
grade_this_code()

Observe that United Air Lines (UA) had the most flights depart New York City in 2013 and SkyWest Airlines Inc. (OO) had the fewest.

If you don’t know which airlines correspond to which carrier codes, then run View(airlines) to see a directory of airlines.
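
As an alternative to reading the bars by eye, the counts can be tabulated and matched to airline names. This is a sketch only: count() and left_join() come from dplyr and are not introduced in this section, and carrier_counts is a hypothetical name.

```r
library(dplyr)
library(nycflights13)

# Count flights per carrier (largest first) and attach the airline names
carrier_counts <- flights %>%
  count(carrier, sort = TRUE) %>%
  left_join(airlines, by = "carrier")

# Print every row rather than only the first ten
print(carrier_counts, n = Inf)
```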

Exercise 7

Another use of barplots is to visualize the joint distribution of two categorical variables at the same time.

Let’s examine the joint distribution of outgoing domestic flights from NYC by carrier and origin, or in other words the number of flights for each carrier and origin combination.

Copy the previous code and map the additional variable origin by adding fill = origin inside the aes() aesthetic mapping.


ggplot(data = flights, mapping = aes(x = carrier, ...)) +
  geom_bar()
ggplot(data = flights, mapping = aes(x = carrier, fill = origin)) +
  geom_bar()
grade_this_code()

This is an example of a stacked barplot. While easy to make, it is not always ideal.

Exercise 8

An alternative to a stacked barplot is a side-by-side barplot, also known as a dodged barplot.

Copy the previous code and add the argument position = "dodge" within geom_bar().


ggplot(data = flights, mapping = aes(x = carrier, fill = origin)) +
  geom_bar(position = ...)
ggplot(data = flights, mapping = aes(x = carrier, fill = origin)) +
  geom_bar(position = "dodge")
grade_this_code()

This shows the same information as a faceted barplot.
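
For comparison, a faceted barplot of the same data might look like the sketch below. facet_wrap() is not covered in this section, so its use here (including ncol = 1) is an assumption carried over from the faceted histograms mentioned earlier.

```r
library(ggplot2)
library(nycflights13)

# One panel per origin airport instead of one fill color per origin
p_facet <- ggplot(data = flights, mapping = aes(x = carrier)) +
  geom_bar() +
  facet_wrap(~ origin, ncol = 1)

p_facet
```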

View grade

grade_button_ui(id = "grade")

Submit

Once you are finished:

grade_print_ui("grade")


NUstat/ISDStutorials documentation built on April 17, 2025, 6:15 p.m.