library(learnr)
library(tidyverse)
library(nycflights13)
library(tutorialExtras)
library(gradethis)
library(tutorial.helpers)
library(ggcheck)

early_january_weather <- weather %>% 
  dplyr::filter(origin == "EWR" & month == 1 & day <= 15)

gradethis_setup()
knitr::opts_chunk$set(echo = FALSE)
options(
  tutorial.exercise.timelimit = 60
  #tutorial.storage = "local"
  ) 
grade_server("grade")

question_text("Name:",
              answer_fn(function(value){
                              if(length(value) >= 1 ) {
                                return(mark_as(TRUE))
                                }
                              return(mark_as(FALSE) )
                              }),
              correct = "submitted",
              allow_retry = FALSE )

Instructions

Complete this tutorial while reading Sections 2.4 - 2.6 of the textbook. Each question allows 3 'free' attempts. After the third attempt a 10% deduction occurs per attempt.

You can check your current grade and the number of attempts you are on in the "View grade" section. You can click this button as often and as many times as you would like as you progress through the tutorial. Before submitting, make sure your grade is as expected.

Goals

5NG#2: Linegraphs

Linegraphs show the relationship between two numerical variables when the variable on the x-axis, also called the explanatory variable, is of a sequential nature; in other words there is an inherent ordering to the variable.

Examples of sequential variables include hours, days, weeks, years, etc

Exercise 1

At the top of any document or script the first thing we need to do is load the needed packages.

Load the ggplot2 package followed by the nycflights13 package using the library() function.


library(...)
library(...)
library(ggplot2)
library(nycflights13)
grade_this_code()

The ggplot2 package is used for making data visualizations and the nycflights13 package contains the weather dataset that we will use in this next example.

Exercise 2

Let’s get a sense for the weather data frame. Run glimpse(weather) in the code chunk.


glimpse(...)
glimpse(weather)
grade_this_code()

We observe that the data frame has 15 variables and 26,115 observations. There is a variable called temp of hourly temperature recordings in Fahrenheit at weather stations near all three airports in New York City.

Exercise 3

For simplicity, we are going to work with a subset of this dataframe called early_january_weather which only includes hourly temperatures at the Newark airport for the first 15 days in January.

Click "Submit Answer" to create this new dataset.

early_january_weather <- weather %>% 
  filter(origin == "EWR" & month == 1 & day <= 15)
early_january_weather <- weather %>% 
  filter(origin == "EWR" & month == 1 & day <= 15)
early_january_weather <- weather %>% 
  filter(origin == "EWR" & month == 1 & day <= 15)
grade_this_code()

We will learn more about data wrangling in Chapter 3. The filter() function subsets the data frame to only contain rows that meet the criteria specified. In this case the origin is "EWR" and the month is January and the day is less than or equal to 15.

Exercise 4

Just like with a scatterplot, we will initialize the plot with the ggplot() function and set the first argument to be data = early_january_weather.


ggplot(data = ...)
ggplot(data = early_january_weather)
grade_this_code()

You should see a blank, grey square. R has set up the area in which it can place a plot.

Exercise 5

Copy the previous code and specify the aesthetic mapping.

Specifically:


ggplot(data = early_january_weather, 
       mapping = aes(x = ..., y = ...))
ggplot(data = early_january_weather, 
       mapping = aes(x = time_hour, y = temp))
grade_this_code()

Exercise 6

Copy the previous code and use the + operator to add geom_line().


ggplot(data = early_january_weather, 
       mapping = aes(x = time_hour, y = temp)) + 
  geom_...
ggplot(data = early_january_weather, 
       mapping = aes(x = time_hour, y = temp)) +
  geom_line()
grade_this_code()

Here, a linegraph is the most appropriate graphic because we are observing the relationship between two numeric variables AND time_hour has an inherent ordering, like some notion of time.

5NG#3: Histograms

If we are interested in the distribution of a single numeric variable one graph we can use is a histogram.

A distribution helps us understand:

Exercise 1

We will start by exploring the distribution of temperature in the weather dataset.

Initialize the plot with the ggplot() function and set the first argument to be data = weather.


ggplot(data = ...)
ggplot(data = weather)
grade_this_code()

You should see a blank, grey square. R has set up the area in which it can place a plot.

Exercise 2

There will now be only one variable mapped in aes(): the single numerical variable temp. The y-aesthetic of a histogram gets computed for you automatically.

Within ggplot() set mapping = aes(x = temp)


ggplot(data = weather, 
       mapping = aes(x = ...))
ggplot(data = weather, 
       mapping = aes(x = temp))
grade_this_code()

Exercise 3

Copy the previous code and use the + operator to add geom_histogram().


ggplot(data = weather, 
       mapping = aes(x = temp)) + 
  geom_...
ggplot(data = weather, 
       mapping = aes(x = temp)) +
  geom_histogram()
grade_this_code()

A histogram divides the x-axis into equally spaced bins.

The y-axis then tells us how many observations fall into the bin.

question_numeric("What is the default number of bins R uses to build a histogram?",
           answer(30, correct = TRUE),
           allow_retry = TRUE)

Exercise 4

In the previous plot, it is difficult to see where one bin stops and another bin starts. Copy the previous code and add white borders around the bins by adding a color = "white" argument to geom_histogram().


ggplot(data = weather, 
       mapping = aes(x = temp)) + 
  geom_histogram(...)
ggplot(data = weather, 
       mapping = aes(x = temp)) +
  geom_histogram(color = "white")
grade_this_code()

We can now better associate ranges of temperatures to each of the bins.

Exercise 5

Copy the previous code and add fill = "steelblue" after the color argument within geom_histogram().


ggplot(data = weather, 
       mapping = aes(x = temp)) + 
  geom_histogram(color = "white", fill = ...)
ggplot(data = weather, 
       mapping = aes(x = temp)) +
  geom_histogram(color = "white", fill = "steelblue")
grade_this_code()

There are 657 possible choices of color names that you could use.

In the Console of RStudio, type colors() and click enter.

question("What is the name of the last color from the output?",
           type = "learnr_text",
           answer_fn(function(value){
            if (str_remove_all(tolower(value), " ") %in%
                c("yellowgreen", "'yellowgreen'",
                  '"yellowgreen"')) {
                 return(mark_as(TRUE))}
                 return(mark_as(FALSE) )
                              }),
         allow_retry = TRUE
)

Exercise 6

We saw previously that the default number of bins is 30. You will almost always need to test out a variety of different bin values or widths to find a value that is appropriate.

Copy the previous code and adjust the number of bins by setting the bins argument to 40.


ggplot(data = weather, 
       mapping = aes(x = temp)) + 
  geom_histogram(color = "white", fill = "steelblue",
                 bins = ...)
ggplot(data = weather, 
       mapping = aes(x = temp)) +
  geom_histogram(color = "white", fill = "steelblue",
                 bins = 40)
grade_this_code()

Exercise 7

Alternatively, instead of specifying the number of bins we can specify the width of each bin.

Copy the previous code remove bins = 40. This time add binwidth = 10.


ggplot(data = weather, 
       mapping = aes(x = temp)) + 
  geom_histogram(color = "white", fill = "steelblue",
                 binwidth = ...)
ggplot(data = weather, 
       mapping = aes(x = temp)) +
  geom_histogram(color = "white", fill = "steelblue",
                 binwidth = 10)
grade_this_code()

On the x-axis we can see that each bin now has a width of 10. For example, we can see one bin goes from 25 to 35 with nearly 3,000 observations in it.

It's important to note that you NEVER set BOTH bin and binwidth. You choose one or the other.

When choosing an appropriate bin size, it is important to balance the detail and readability of the histogram. Too few of bins hides the patterns and shape of the data. While too many bins adds noise and can make it difficult to interpret.

You will likely test out many different bins before picking the most appropriate one.

Faceting

Faceting is used when we’d like to split a particular visualization of variables by another variable. This will create multiple copies of the same type of plot with matching x and y axes, but whose content will differ.

Exercise 1

The code for a histogram of temperature with a binwidth of 5 has been started for you.

Add on the layer facet_wrap(~ month) with the + operator.

ggplot(data = weather, mapping = aes(x = temp)) +
  geom_histogram(binwidth = 5, color = "white")
ggplot(data = weather, mapping = aes(x = temp)) +
  geom_histogram(binwidth = 5, color = "white") +
  ...
ggplot(data = weather, mapping = aes(x = temp)) +
  geom_histogram(binwidth = 5, color = "white") +
  facet_wrap(~month)
grade_this_code()

The histogram is now split by the 12 possible months in a given year.

Exercise 2

We can also specify the number of rows and columns in the grid by using the nrow and ncol arguments inside of facet_wrap().

Copy the previous code and add the nrow = 4 argument to facet_wrap(~ month).


ggplot(data = weather, mapping = aes(x = temp)) +
  geom_histogram(binwidth = 5, color = "white") +
  ...
ggplot(data = weather, mapping = aes(x = temp)) +
  geom_histogram(binwidth = 5, color = "white") +
  facet_wrap(~month, nrow = 4)
grade_this_code()

View grade

grade_button_ui(id = "grade")

Submit

Once you are finished:

grade_print_ui("grade")


NUstat/ISDStutorials documentation built on April 17, 2025, 6:15 p.m.