In NUstat/ISDStutorials: Tutorial Lessons for Introduction to Statistics and Data Science

library(learnr)
library(tidyverse)
library(fivethirtyeight)
library(tutorialExtras)
library(gradethis)
library(tutorial.helpers)
library(ggcheck)

gradethis_setup()
knitr::opts_chunk$set(echo = FALSE)
options(
  tutorial.exercise.timelimit = 60
  #tutorial.storage = "local"
  ) 

dem_score <- read_csv("data/dem_score.csv")

drinks_smaller <- drinks %>% 
  filter(country %in% c("USA", "China", "Italy", "Saudi Arabia")) %>% 
  select(-total_litres_of_pure_alcohol) %>% 
  rename(beer = beer_servings, spirit = spirit_servings, wine = wine_servings)

drinks_smaller_tidy <- drinks_smaller |> 
  pivot_longer(-country,
               names_to = "type",
               values_to = "servings")

guat_dem <- dem_score %>% 
  filter(country == "Guatemala")

guat_dem_tidy <- guat_dem %>% 
  pivot_longer(
    cols = -country, 
    names_to = "year", 
    values_to = "democracy_score"
  ) |> 
  mutate(year = as.numeric(year))

grade_server("grade")

question_text("Name:",
              answer_fn(function(value){
                              if(length(value) >= 1 ) {
                                return(mark_as(TRUE))
                                }
                              return(mark_as(FALSE) )
                              }),
              correct = "submitted",
              allow_retry = FALSE )

Instructions

Complete this tutorial while reading Chapter 4 of the textbook.

You can check your current grade and the number of attempts you have used in the "View grade" section. Click the button there as often as you like as you progress through the tutorial. Before submitting, make sure your grade is as expected.

Goals

Understand what "tidy" data looks like.
Learn how to format data into "tidy" data.
Understand the main purposes of different packages.

Importing data

Up to this point, we’ve almost entirely used data stored inside of an R package. Say instead you have your own data saved on your computer or somewhere online.

Two common file types for data are .csv and .xlsx.

Exercise 1

To read in a .csv file we need the readr package.

Use the library() function to load the readr package.

library(...)

library(readr)

grade_this_code()

Exercise 2

The .csv file dem_score.csv is accessible on the web at "https://moderndive.com/data/dem_score.csv".

Type read_csv("https://moderndive.com/data/dem_score.csv") to read in the file.

read_csv(...)

read_csv("https://moderndive.com/data/dem_score.csv")

grade_this_code()

In order to use this data frame later, we need to store it in our Environment.

Exercise 3

Before the read_csv() call, type dem_score <- to store the dataset under the name dem_score.

... <- read_csv("https://moderndive.com/data/dem_score.csv")

dem_score <- read_csv("https://moderndive.com/data/dem_score.csv")

grade_this_code()

Do NOT store the object as "dem_score.csv". Choose a name that is informative, yet easy to reference. Often this can be the name of the file (without the extension).

There is no output because we stored the data as an object.

To get a better understanding of the data you could print the data by typing dem_score or use the glimpse() function.
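As a minimal sketch of the two ways to inspect a data frame, the code below uses a small made-up tibble called scores as a stand-in for dem_score (the country names and values are hypothetical, chosen only for illustration).

```r
library(dplyr)

# Hypothetical stand-in for dem_score; the values are made up
scores <- tibble(
  country = c("Aland", "Borduria"),
  `1952` = c(-9, 2),
  `1957` = c(-7, 3)
)

scores           # printing shows the dimensions, column types, and first rows
glimpse(scores)  # one line per column: name, type, and first few values
```

Printing is convenient for a quick look at the rows, while glimpse() tends to be easier to read when a data frame has many columns.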

Exercise 4

The previous exercise showed how to read in a file from the web. However, most of the time you will be reading in data from your computer.

Your project folder is your current working directory. Let's say you saved dem_score.csv in a data/ subdirectory within your project, as shown in the image below.

[Image: project folder ➔ data/ subdirectory]

Within read_csv() we need to specify the file path of our data from our working directory. In this case "data/dem_score.csv".

Read in the file from the computer by typing read_csv("data/dem_score.csv").

Store the file as dem_score using dem_score <- before read_csv. Then on the next line print the data by typing dem_score.

... <- read_csv(...)
dem_score

dem_score <- read_csv("data/dem_score.csv")
dem_score

grade_this_code()
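If read_csv() reports that a file cannot be found, it can help to check where R is looking. The sketch below assumes the data/ subdirectory layout described above; the variable name path is only illustrative.

```r
# The folder that relative file paths are resolved from
getwd()

# Relative path to the file, assuming it sits in a data/ subdirectory
path <- file.path("data", "dem_score.csv")

if (file.exists(path)) {
  dem_score <- readr::read_csv(path)
} else {
  # Nothing is read; the message shows which location was checked
  message("No file at ", path, " relative to ", getwd())
}
```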

Note that the read_csv() function included in the readr package is different from the read.csv() function that comes installed with R by default.

read_csv() is the preferred function and the one we will always use.
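To make the difference concrete, here is a small sketch that reads the same made-up CSV with both functions. The file contents are hypothetical; the column named 1952 mimics the year columns in dem_score.

```r
library(readr)

# Write a tiny made-up CSV to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("country,1952", "Aland,-9"), tmp)

base_df  <- read.csv(tmp)                          # base R
readr_df <- read_csv(tmp, show_col_types = FALSE)  # readr

class(base_df)   # a plain data.frame
class(readr_df)  # a tibble, which also inherits from data.frame
names(base_df)   # base R rewrites the column name 1952 as X1952
names(readr_df)  # readr keeps the column name as written
```

Two differences show up here: read_csv() returns a tibble, and it leaves column names such as 1952 unchanged, which matters later when the year columns are tidied.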

Tidy data

In order to use the ggplot2 and dplyr packages for data visualization and data wrangling, your input data frames must be in “tidy” format. So all non-“tidy” data must be converted to “tidy” format first.

Exercise 1

Consider the following data frames that were created from the drinks data frame included in the fivethirtyeight package.

drinks_smaller

drinks_smaller_tidy

question_wordbank("Match the name of the data frame with the format it is in.",
           choices = c("drinks_smaller", "drinks_smaller_tidy"),
           wordbank = c("wide format", "long format"),
           answer(c("wide format","long format"),correct=TRUE),
           allow_retry = TRUE)

Exercise 2

A dataset is messy or tidy depending on how rows, columns and tables are matched up with observations, variables and types.

question_wordbank("Complete the following three statements that define tidy data.",
           choices = c("Each variable forms a _____.",
                       "Each observation forms a _____.",
                       "Each type of observational unit forms a _____."),
           wordbank = c("column", "row", "table"),
           answer(c("column", "row", "table"),correct=TRUE),
           allow_retry = TRUE)

Exercise 3

Determine if the following tables are in tidy format.

  question_wordbank("",
           choices = c(paste0(htmltools::img(src="images/Figure_04_1_tidy.png", height = 175, width = 350) ), paste0(htmltools::img(src="images/Figure_04_2_nontidy.png", height = 75, width = 350) ), paste0(htmltools::img(src="images/Figure_04_3_tidy.png", height = 75, width = 350) ) ),
           wordbank = c("Tidy", "Not tidy"),
           answer(c("Tidy", "Not tidy", "Tidy"), correct=TRUE),
           allow_retry = TRUE)
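As a rough illustration of moving from wide to long (tidy) format, the sketch below builds a small made-up table and reshapes it with pivot_longer(). The data frame names wide and long and all values are hypothetical.

```r
library(tidyr)
library(tibble)

# Wide format: one row per country, one column per drink type (made-up values)
wide <- tibble(
  country = c("Aland", "Borduria"),
  beer = c(10, 20),
  wine = c(5, 0)
)

# Long format: one row per country and drink type
long <- wide |>
  pivot_longer(-country, names_to = "type", values_to = "servings")

long
```

In the long version each variable (country, type, servings) has its own column and each country and type combination has its own row.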

Case study: Democracy in Guatemala

The code below filters the dem_score data frame we imported in "Importing data" to only include the country Guatemala.

guat_dem <- dem_score %>% 
  filter(country == "Guatemala")

guat_dem

Exercise 1

Let’s produce a time-series plot showing how the democracy scores have changed over the 40 years from 1952 to 1992 for Guatemala.

In order to do that we need to tidy the data so that we have a year variable that we can put on the x-axis.

Start with guat_dem and pipe on pivot_longer(cols = -country)

guat_dem %>%
  pivot_longer(...)

guat_dem %>% 
  pivot_longer(
    cols = -country
  )

grade_this_code()

This pivots every column except country, because the minus (-) symbol excludes country from the selection.

Notice the names of the new variables are name and value. Those are not very informative or accurate variable names.
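The default names can be checked on a small example. The sketch below uses a made-up one-row tibble called toy in place of guat_dem; the values are hypothetical.

```r
library(tidyr)
library(tibble)

# Hypothetical one-country table with year columns (made-up values)
toy <- tibble(country = "Aland", `1952` = -9, `1957` = -7)

toy_long <- toy |>
  pivot_longer(cols = -country)

names(toy_long)       # the pivoted columns get the default names name and value
class(toy_long$name)  # the old column names are stored as character strings
```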

Exercise 2

Within pivot_longer() we can specify the new variable names with the names_to and values_to arguments.

Copy the previous code and set the names to "year" and the values to "democracy_score" as follows: pivot_longer(cols = -country, names_to = "year", values_to = "democracy_score")

guat_dem %>%
  pivot_longer(
    cols = -country, 
    names_to = ..., 
    values_to = ...
    )

guat_dem %>% 
  pivot_longer(
    cols = -country, 
    names_to = "year", 
    values_to = "democracy_score"
  )

grade_this_code()

Exercise 3

In order to reference or use this tidied data we need to store it. Copy the previous code and assign this new data to the name guat_dem_tidy.

Then print the data by typing guat_dem_tidy on the next line.

... <- guat_dem %>%
  pivot_longer(
    cols = -country, 
    names_to = "year", 
    values_to = "democracy_score"
    )
guat_dem_tidy

guat_dem_tidy <- guat_dem %>% 
  pivot_longer(
    cols = -country, 
    names_to = "year", 
    values_to = "democracy_score"
  ) 
guat_dem_tidy

grade_this_code()

Notice year is of type chr (character).

Exercise 4

If we were to plot democracy_score by year we would need year to be a numeric variable. To change the variable type we can use the mutate() function (Note: there is also a more advanced way to do this directly within pivot_longer).

Start with guat_dem_tidy and pipe on mutate(year = as.numeric(year))

guat_dem_tidy |> 
  mutate(...)

guat_dem_tidy %>% 
  mutate(year = as.numeric(year))

grade_this_code()

Within mutate we overwrote the variable year with a numeric version. In this case it is okay to overwrite the variable because we are not losing any information.
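The note in this exercise mentions a way to convert the type directly within pivot_longer(). One option, shown here as a sketch on made-up data rather than as the tutorial's required answer, is the names_transform argument of pivot_longer(). The tibble toy and its values are hypothetical.

```r
library(tidyr)
library(tibble)

# Hypothetical one-country table with year columns (made-up values)
toy <- tibble(country = "Aland", `1952` = -9, `1957` = -7)

toy_tidy <- toy |>
  pivot_longer(
    cols = -country,
    names_to = "year",
    values_to = "democracy_score",
    # Convert the new year column from character to numeric while pivoting
    names_transform = list(year = as.numeric)
  )

toy_tidy
```

This gives the same result as piping on mutate(year = as.numeric(year)) afterwards; the two-step version used in the exercises keeps each operation visible.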

Exercise 5

Copy the previous code and assign the new data frame to guat_dem_tidy. In other words overwrite the data frame.

... <- guat_dem_tidy |> 
  mutate(...)

guat_dem_tidy <- guat_dem_tidy %>% 
  mutate(year = as.numeric(year))

grade_this_code()

Again it is okay to overwrite our data frame because we are not losing any of the original information.

Exercise 6

Now we can create the plot to show how the democracy score of Guatemala changed from 1952 to 1992 using a geom_line().

Within ggplot() set the first argument equal to guat_dem_tidy and the second argument equal to aes(x = year, y = democracy_score). Add on the appropriate geom layer with the + operator.

ggplot(..., aes(x = ..., y = ...)) +
  geom_...()

ggplot(guat_dem_tidy, aes(x = year, y = democracy_score)) +
  geom_line()

grade_this_code()
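The steps in this case study can also be chained into one pipeline. The sketch below does so on a made-up tibble (toy, with hypothetical values) instead of guat_dem, and stores the plot in an illustrative object named p.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical one-country table with year columns (made-up values)
toy <- tibble(country = "Aland", `1952` = -9, `1957` = -7, `1962` = 3)

p <- toy |>
  # Tidy: one row per year
  pivot_longer(
    cols = -country,
    names_to = "year",
    values_to = "democracy_score"
  ) |>
  # Convert year from character to numeric so it can go on the x-axis
  mutate(year = as.numeric(year)) |>
  # The piped data frame becomes the first argument of ggplot()
  ggplot(aes(x = year, y = democracy_score)) +
  geom_line()

p
```

Note the switch from the pipe to + once ggplot() is called: layers are added to a plot with +, not piped.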

View grade

grade_button_ui(id = "grade")

Submit

Once you are finished:

Click the 'Download Grade' button below. This will download an html document of your grade summary.
Make sure your grade is correct and as expected!
Submit the downloaded html to Canvas.

grade_print_ui("grade")

NUstat/ISDStutorials documentation built on April 17, 2025, 6:15 p.m.
