library(learnr)
library(tidyverse)
library(fivethirtyeight)
library(tutorialExtras)
library(gradethis)
library(tutorial.helpers)
library(ggcheck)
gradethis_setup()
knitr::opts_chunk$set(echo = FALSE)
options(
  tutorial.exercise.timelimit = 60
  # tutorial.storage = "local"
)
dem_score <- read_csv("data/dem_score.csv")
drinks_smaller <- drinks %>%
  filter(country %in% c("USA", "China", "Italy", "Saudi Arabia")) %>%
  select(-total_litres_of_pure_alcohol) %>%
  rename(beer = beer_servings, spirit = spirit_servings, wine = wine_servings)
drinks_smaller_tidy <- drinks_smaller |>
  pivot_longer(-country, names_to = "type", values_to = "servings")
guat_dem <- dem_score %>%
  filter(country == "Guatemala")
guat_dem_tidy <- guat_dem %>%
  pivot_longer(cols = -country, names_to = "year", values_to = "democracy_score") |>
  mutate(year = as.numeric(year))
grade_server("grade")
question_text("Name:", answer_fn(function(value){ if(length(value) >= 1 ) { return(mark_as(TRUE)) } return(mark_as(FALSE) ) }), correct = "submitted", allow_retry = FALSE )
Complete this tutorial while reading Chapter 4 of the textbook.
You can check your current grade and the number of attempts you have used in the "View grade" section. You can click this button as often as you like as you progress through the tutorial. Before submitting, make sure your grade is as expected.
Up to this point, we’ve almost entirely used data stored inside of an R package. Say instead you have your own data saved on your computer or somewhere online.
Two common file extensions for data are `.csv` and `.xlsx`.
To read in a `.csv` file we need the `readr` package.
Use the `library()` function to load the `readr` package.
library(...)
library(readr)
grade_this_code()
The `.csv` file `dem_score.csv` is accessible on the web at "https://moderndive.com/data/dem_score.csv".
Type `read_csv("https://moderndive.com/data/dem_score.csv")` to read in the file.
read_csv(...)
read_csv("https://moderndive.com/data/dem_score.csv")
grade_this_code()
In order to use this data frame later, we need to store it in our Environment.
Before the code that reads in the file, type `dem_score <-` to name the dataset `dem_score`.
... <- read_csv("https://moderndive.com/data/dem_score.csv")
dem_score <- read_csv("https://moderndive.com/data/dem_score.csv")
grade_this_code()
Do NOT store the object as "dem_score.csv". Choose a name that is informative, yet easy to reference. Often this can be the name of the file (without the extension).
There is no output because we stored the data as an object.
To get a better understanding of the data, you could print it by typing `dem_score` or use the `glimpse()` function.
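As a rough illustration of how printing and `glimpse()` differ, the sketch below uses a small made-up data frame (`toy_scores`, with hypothetical values and country names, not the real `dem_score` data) so that it runs without a download.

```r
library(dplyr)  # provides glimpse() and re-exports tibble()

# Hypothetical toy data: NOT the real dem_score values.
toy_scores <- tibble(
  country = c("Exampleland", "Samplestan"),
  `1952`  = c(-3, 8),
  `1957`  = c(-1, 9)
)

# Printing shows the data row by row.
toy_scores

# glimpse() turns the view on its side: one line per column,
# with the column type and the first few values.
glimpse(toy_scores)
```

With many columns, as in `dem_score`, the `glimpse()` view is often easier to scan than the printed table.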
The previous exercise showed how to read in a file from the web. However, most of the time you will be reading in data from your computer.
Your project folder is your current working directory. Let's say you saved `dem_score.csv` in a `data/` subdirectory within your project, as shown in the image below.
[Image: the project folder, with dem_score.csv saved inside its data/ subdirectory]
Within `read_csv()` we need to specify the file path of our data relative to our working directory. In this case that is `"data/dem_score.csv"`.
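If `read_csv()` reports that the file does not exist, it can help to check where R is looking. The following is a minimal sketch using base R only. It assumes the file sits at `data/dem_score.csv` as in the example above, and the name `csv_path` is only for illustration.

```r
# Folder that relative paths such as "data/dem_score.csv" are resolved against.
getwd()

# Build the relative path; file.path() joins the pieces with "/".
csv_path <- file.path("data", "dem_score.csv")
csv_path

# TRUE if a file exists at that path relative to the working directory.
file.exists(csv_path)
```

If the last line returns `FALSE`, either the file is in a different folder or the working directory is not the project folder.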
Read in the file from the computer by typing `read_csv("data/dem_score.csv")`.
Store the file as `dem_score` using `dem_score <-` before `read_csv()`. Then, on the next line, print the data by typing `dem_score`.
... <- read_csv(...)
dem_score
dem_score <- read_csv("data/dem_score.csv")
dem_score
grade_this_code()
Note that the `read_csv()` function included in the `readr` package is different from the `read.csv()` function that comes installed with R by default.
`read_csv()` is the preferred function and the one we will always use.
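One visible difference shows up with column names that are years, as in `dem_score`. The sketch below writes a tiny made-up CSV to a temporary file (the values and the country name are hypothetical) and reads it both ways. The `show_col_types` argument assumes readr 2.0 or later.

```r
library(readr)

# Write a tiny, made-up CSV to a temporary file (hypothetical values).
tmp <- tempfile(fileext = ".csv")
writeLines(c("country,1952,1957", "Exampleland,-3,-1"), tmp)

# readr: returns a tibble and keeps the column names as they appear in the file.
with_readr <- read_csv(tmp, show_col_types = FALSE)
names(with_readr)

# Base R: returns a plain data frame and, by default (check.names = TRUE),
# rewrites names that are not valid R names, so 1952 becomes X1952.
with_base <- read.csv(tmp)
names(with_base)
```

Keeping the year columns named exactly as in the file makes it simpler to turn them into a `year` variable later.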
In order to use the `ggplot2` and `dplyr` packages for data visualization and data wrangling, your input data frames must be in “tidy” format. So all non-“tidy” data must be converted to “tidy” format first.
Consider the following data frames that were created from the `drinks` data frame included in the `fivethirtyeight` package.
drinks_smaller
drinks_smaller_tidy
question_wordbank("Match the name of the data frame with the format it is in.", choices = c("drinks_smaller", "drinks_smaller_tidy"), wordbank = c("wide format", "long format"), answer(c("wide format","long format"),correct=TRUE), allow_retry = TRUE)
A dataset is messy or tidy depending on how rows, columns and tables are matched up with observations, variables and types.
question_wordbank("Complete the following three statements that define tidy data.", choices = c("Each variable forms a _____.", "Each observation forms a _____.", "Each type of observational unit forms a _____."), wordbank = c("column", "row", "table"), answer(c("column", "row", "table"),correct=TRUE), allow_retry = TRUE)
Determine if the following tables are in tidy format.
question_wordbank("", choices = c(paste0(htmltools::img(src="images/Figure_04_1_tidy.png", height = 175, width = 350) ), paste0(htmltools::img(src="images/Figure_04_2_nontidy.png", height = 75, width = 350) ), paste0(htmltools::img(src="images/Figure_04_3_tidy.png", height = 75, width = 350) ) ), wordbank = c("Tidy", "Not tidy"), answer(c("Tidy", "Not tidy", "Tidy"), correct=TRUE), allow_retry = TRUE)
The code below filters the `dem_score` data frame we imported in "Importing data" to only include the country Guatemala.
guat_dem <- dem_score %>% filter(country == "Guatemala")
guat_dem
Let’s produce a time-series plot showing how the democracy scores have changed over the 40 years from 1952 to 1992 for Guatemala.
In order to do that, we need to tidy the data so that we have a `year` variable that we can put on the x-axis.
Start with `guat_dem` and pipe on `pivot_longer(cols = -country)`.
guat_dem %>% pivot_longer(...)
guat_dem %>% pivot_longer( cols = -country )
grade_this_code()
This pivots every column that is NOT `country` into rows, because we used the minus (-) symbol.
Notice the names of the new variables are `name` and `value`. Those are not very informative or accurate variable names.
Within `pivot_longer()` we can specify the new variable names with the `names_to` and `values_to` arguments.
Copy the previous code and set the names to `"year"` and the values to `"democracy_score"` as follows: `pivot_longer(cols = -country, names_to = "year", values_to = "democracy_score")`
guat_dem %>% pivot_longer( cols = -country, names_to = ..., values_to = ... )
guat_dem %>% pivot_longer( cols = -country, names_to = "year", values_to = "democracy_score" )
grade_this_code()
In order to reference or use this tidied data we need to store it. Copy the previous code and assign this new data to the name `guat_dem_tidy`.
Afterwards, print the data by typing `guat_dem_tidy` on the next line.
... <- guat_dem %>% pivot_longer( cols = -country, names_to = "year", values_to = "democracy_score" )
guat_dem_tidy
guat_dem_tidy <- guat_dem %>% pivot_longer( cols = -country, names_to = "year", values_to = "democracy_score" )
guat_dem_tidy
grade_this_code()
Notice `year` is of type `chr` (character).
If we were to plot `democracy_score` by `year`, we would need `year` to be a numeric variable. To change the variable type we can use the `mutate()` function. (Note: there is also a more advanced way to do this directly within `pivot_longer()`.)
Start with `guat_dem_tidy` and pipe on `mutate(year = as.numeric(year))`.
guat_dem_tidy |> mutate(...)
guat_dem_tidy %>% mutate(year = as.numeric(year))
grade_this_code()
Within `mutate()` we overwrote the variable `year` with a numeric version. In this case it is okay to overwrite the variable because we are not losing any information.
Copy the previous code and assign the new data frame to `guat_dem_tidy`. In other words, overwrite the data frame.
... <- guat_dem_tidy |> mutate(...)
guat_dem_tidy <- guat_dem_tidy %>% mutate(year = as.numeric(year))
grade_this_code()
Again it is okay to overwrite our data frame because we are not losing any of the original information.
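The conversion done with `mutate()` can also be folded into the pivot itself. The following is a minimal sketch, assuming tidyr 1.1.0 or later (where `pivot_longer()` has the `names_transform` argument) and using a made-up one-row data frame, `toy_dem`, with hypothetical values rather than the real `guat_dem`.

```r
library(tidyr)
library(tibble)

# Hypothetical one-row wide data: NOT the real Guatemala scores.
toy_dem <- tibble(country = "Exampleland", `1952` = -3, `1957` = -1, `1962` = 2)

# names_transform applies as.numeric() to the new year column
# as part of the pivot, so no separate mutate() step is needed.
toy_dem_tidy <- toy_dem |>
  pivot_longer(
    cols = -country,
    names_to = "year",
    values_to = "democracy_score",
    names_transform = list(year = as.numeric)
  )

toy_dem_tidy
```

Both routes give a numeric `year` column. The two-step version with `mutate()` keeps each change visible, which is why it is used in the exercises here.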
Now we can create the plot to show how the democracy score of Guatemala changed from 1952 to 1992 using `geom_line()`.
Within `ggplot()`, set the first argument equal to `guat_dem_tidy` and the second argument equal to `aes(x = year, y = democracy_score)`. Add on the appropriate geom layer with the `+` operator.
ggplot(..., aes(x = ..., y = ...)) + geom_...()
ggplot(guat_dem_tidy, aes(x = year, y = democracy_score)) + geom_line()
grade_this_code()
grade_button_ui(id = "grade")
Once you are finished:
grade_print_ui("grade")