library(tidyverse)
library(PPBDS.data)
library(learnr)
library(shiny)
library(gapminder)
library(skimr)
knitr::opts_chunk$set(echo = FALSE, message = FALSE)
options(tutorial.exercise.timelimit = 60, tutorial.storage="local")

Confirm Correct Package Version

Confirm that you have the correct version of PPBDS.data installed by pressing "Run Code."

packageVersion('PPBDS.data')

The answer should be ‘0.3.2.9006’, or a higher number. If it is not, you should upgrade your installation by issuing these commands:

remove.packages('PPBDS.data')  
library(remotes)  
remotes::install_github('davidkane9/PPBDS.data')  

Strictly speaking, it should not be necessary to remove a package. Just installing it again should overwrite the current version. But weird things sometimes happen, so removing first is the safest approach.

Name

``` {r name} question_text( "Student Name:", answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Modify your answer", incorrect = "Ok" )

## Email
###

``` {r email, echo=FALSE}
question_text(
  "Email:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

Introduction

Let's do a quick refresher about functions.

Exercise 1

quiz(
  question("Select all that apply: Which of the following operations can be achieved by running a function in R?",
           answer("Taking a log value", correct = TRUE),
           answer("Fitting a linear model", correct = TRUE),
           answer("Averaging two numbers", correct = TRUE),
           allow_retry = TRUE)
)

Exercise 2

quiz(
question("Which of the following is not a part of a function?",
         allow_retry = TRUE,
         random_answer_order = TRUE,
         answer("A name", correct = TRUE, message = "That's right, functions contain a code body, a set of formal arguments, and an environment, but not a name. They just happen to inherit a name from the object they are stored in."),
         answer("A code body", message = "Functions do require a code body! You can inspect this part of the function by using the helper function body()."),
         answer("A (potentially empty) set of formal arguments", message = "Functions do require formal arguments, even if they are empty! You can inspect this part of the function by using the helper function formals()."),
         answer("An environment", message = "Functions do require an environment, which is a self-contained space to look up the values of any objects within it. You can inspect this part of the function by using the helper function environment()."))  
)

Exercise 3

quiz(
question("Which of these will run the `Sys.time` function?",
         answer("Sys.time"),
         answer("Sys.time()", correct = TRUE, 
         message = "The open and closed parentheses tell R to execute the code in the code body of the function stored in the object names Sys.time."))
)

Lists and list-columns

Exercise 1

We use map_* functions to create list

quiz(
question("What does map_dbl() mean?",
           answer("The input to the map_dbl() function must be numeric"),
           answer("The output of the map_dbl() function will be numeric", 
                  correct = TRUE),
           allow_retry = TRUE)
)

Exercise 2

quiz(
question("How are map_* functions different than mutate()?",
           allow_retry = TRUE,
           answer("They can take list inputs and iterate over each element of a list", correct = TRUE),
           answer("They are the same, except you can specify what output data type you'd like to have"),
           answer("They can apply functions to their inputs, whereas mutate() cannot"),
           answer("They can apply custom and anonymous functions to their inputs, whereas mutate() can only apply built-in functions to their inputs"))  
)

Exercise 3

quiz(
  question("Which of the following is correct about map_* functions?",
            allow_retry = TRUE,
           answer("You do not need the parentheses when specifying a function for the map_* function to apply. Ex: map(data, mean)", correct = TRUE),
           answer("You do need the parentheses when specifying a function for the map_* function to apply. Ex: map(data, mean())"),
           answer("You cannot have a list with NAs, or else using the map_* function will return an error", message = "This is not necessarily true. Hint: What does the \"...\" mean in map_*? Refer to the next question!")),
  question("What does the \"...\" argument mean in a map_* function?",
           allow_retry = TRUE,
           answer("Nothing. R functions commonly have an auxiliary \"...\" argument"),
           answer("It means that the map_* function will take in its default arguments"),
           answer("It means that you can specify additional arguments to be passed into the given function", correct = TRUE))
)

Exercise 4

A simple tibble was created with one variable. Add to the pipe some code which creates a new list-column called col_2. Use mutate function and the argument str_split() to split the phrases into "Government", "Data Science", "Preceptor", and "Primer".

tibble(col_1 = c("Government and Data Science", "Preceptor and Primer"))
tibble(col_1 = c("Government and Data Science", "Preceptor and Primer")) %>%
  mutate(col_2 = str_split(..., " and "))

Exercise 5

Now append str() to the end of the pipe in order to read the tibble contents.

tibble(col_1 = c("Government and Data Science", "Preceptor and Primer")) %>%
  mutate(col_2 = str_split(col_1, " and "))
tibble(col_1 = c("Government and Data Science", "Preceptor and Primer")) %>%
  mutate(col_2 = str_split(..., " and ")) %>%
  str()

Creating columns using map_*

Exercise 1

Explore the ChickWeight dataset using skim().


skim(...)

Exercise 2

Great! Now create a list-column of weights using mutate()called weight_groups, grouping by Diet and Time.

ChickWeight %>%
  group_by(Diet, Time) %>%
  mutate(...)
ChickWeight %>%
  group_by(Diet, Time) %>%
  mutate(weight_groups = list(...))

Exercise 3

Use the map_dbl function and mutate() to add a new column, mean_weight, which is the mean weight per row. Recall that each row corresponds to a unique combination of Diet and Time. As always, you should start by copy-pasting from the last answer.


ChickWeight %>%
  group_by(Diet, Time) %>%
  mutate(weight_groups = list(weight)) %>%
  mutate(mean_weight = map_dbl(weight_groups, ...))

Exercise 4

Now let's use map_dbl() to round each mean_weight to the nearest integer. Name this new column rounded_mean.


ChickWeight %>%
  group_by(Diet, Time) %>%
  mutate(weight_groups = list(weight)) %>%
  mutate(mean_weight = map_dbl(weight_groups, mean)) %>%
  mutate(rounded_mean = map_dbl(mean_weight, ...))

Exercise 5

Now let's use the gapminder dataset. Create a list-column named gdpPercap_yearly that lists the gdpPercap per year grouping by continent.


gapminder %>%
  group_by(year, continent) %>%
  mutate(gdpPercap_yearly = list(...))

Exercise 6

Now, let's build upon the previous question's code. Instead of taking the gdpPercap, calculate the annual GDP of the country in a new variable called annual_gdp.

gapminder %>%
  group_by(year, continent) %>%
  mutate(annual_gdp = list(...))
gapminder %>%
  group_by(year, continent) %>%
  mutate(annual_gdp = list(gdpPercap * ...))

Exercise 7

What was the lowest GDP in each year x continent category? Create a new column named lowest of data type "double" that contains the lowest GDP per year/continent. Pay attention to the map_* function you use.

gapminder %>%
  group_by(year, continent) %>%
  mutate(annual_gdp = list(gdpPercap * pop)) %>%
  mutate(lowest = ...)
gapminder %>%
  group_by(year, continent) %>%
  mutate(annual_gdp = list(gdpPercap * pop)) %>%
  mutate(lowest = map_dbl(annual_gdp, ...))

Custom Functions

Exercise 1

Here is a custom function we will call foo:

foo <- function(){
  a <- 10
  a
}
body(foo)

What will be the final value of a if I run the following three lines of code in order?

a <- 1
foo()
a
quiz(
  question("",
         answer("1", correct = TRUE, message = "What happens in a function, stays in the function. foo() will not change the value of a outside of foo()."),
         answer("10"),
         answer("Running a will return an error."),
         allow_retry = TRUE)
  )

Exercise 2

Put yourself in the shoes of a teacher: You've given your students 10 homework assignments and announced that you will drop their lowest homework score. Their final grade will be the average score of the remaining homeworks.

To make your life easier, you want to write an R function that will take a vector of 10 homework scores and return a final grade.

Create an object named x that contains the vector c(100, 100, 100, 100, 100, 100, 100, 100, 100, 90). *x will be the grades of your test student.


Exercise 3

Use sum(), min(), /, 9 and parentheses to calculate the final grade for student x.

x <- c(100, 100, 100, 100, 100, 100, 100, 100, 100, 90)
(sum(x) - min(x)) / ...

Exercise 4

Save the code below as a function named grade.

x <- c(100, 100, 100, 100, 100, 100, 100, 100, 100, 90)
(sum(x) - min(x)) / 9
grade <- function(){ 
  ...
}

Exercise 6

Each time you call grade() it computes the final grade of the vector x that contains c(100, 100, 100, 100, 100, 100, 100, 100, 100, 90). We'd like to use grade() with new vectors that have new values.

Make x a formal argument.

grade <- function(){ 
  ...
}
grade <- function(...){ 
  (sum(x) - min(x)) / 9
}

Exercise 7

Calculate the final grade of the vector c(100, 90, 90, 90, 90, 90, 90, 90, 90, 80).

grade <- function(x){ 
  (sum(x) - min(x)) / 9
}

Use the provided vector!

Using runif() and rnorm()

runif() and `rnorm() generate a random uniform and normal distribution, respectively.

Exercise 1

Write a custom function named add_rvs() that adds together 1 random variable generated by runif() and 1 random variable generated by rnorm(). Use n=1 and the default values for the other arguments.


add_rvs <- function(){
  runif(1) + rnorm(...)
}

Exercise 2

What if you wanted to customize n? Add a formal argument n to your add_rvs() function with a default value of 1.


add_rvs <- function(n = 1){
  runif(n) + rnorm(...)
}

Exercise 3

Lets say we want to call add_rvs() twenty times. Call add_rvs() twenty times using replicate(). Use the default value of the n argument in add_rvs().

add_rvs <- function(n = 1){
  runif(n) + rnorm(n)
}

replicate(20, ...)
Remember to use the parentheses when calling a function from inside replicate()!

Testing

Testing is very useful when you are going to run a certain function a lot.

Exercise 1

We use the return() function whenever we want to return a value that is not the last line of the code body.

impatient_square <- function(x){
  return(x)
  x^2
}

Exercise 2

This function is called angry(). Add a line of code that throws the message "Wrong! I am angry!" using stop().

angry <- function(x){
  x
}
angry <- function(x){
  ...("Wrong! I am angry!")
  x
}

Exercise 3

Now make it so that angry() only yells at us if the x input is not a character. Also change the error message to something more useful: "x must be a character." This method is also known as the "if then stop".

angry <- function(x){
  x
}
angry <- function(x){
  if(!is.character(x)){
    ...("x must be a character.")
  }
  x
}

Exercise 4

What if we wanted to incorporate testthat? Use the showfailure() function to see what happens when we plug in x = 1 to angry().

library(testthat)
angry <- function(x){
  if(!is.character(x)){
    stop("x must be a character.")
  }
  x
}
show_failure(...)
show_failure(angry(x=...))

Handling NAs

Exercise 1

question("Which describes `if`'s behavior? (Check all that apply).",
         answer("`if` takes a logical test and a piece of code. It runs the code _if_ the test returns TRUE.", correct = TRUE, message = "`if` is a way to run code only in certain cases. When you use `if`, you first pass it a logical test surrounded by parentheses and then a piece of code."),
         answer("`if` takes a logical test and a piece of code. It does not run the code _if_ the logical test returns FALSE.", correct = TRUE, message = "`if` will run the piece of code if the logical test returns TRUE and not run the code if the logical test returns FALSE."),
         answer("`if` returns the results of the code that appears in _parentheses_ behind `if`."),
         answer("`if` always executes all of the code that follows it.", message = "`if` will always execute the logical test that appears in parentheses behind `if`. However, `if` will only execute the code that appears after the logical test if the logical test returns TRUE."),
         allow_retry = TRUE)

Exercise 2

Many data sets use their own symbols to represent missing values. For example, NOAA will often use -99 to represent missing values in weather data sets. Let's write a function that checks whether a value is -99, and if so replaces the value with NA, like this:

clean <- function(x){
  if(x == -99) x <- NA
  x
}
clean(1)
clean(-99)

Exercise 3

Add an if statement to the beginning of clean(). Your statement should assign NA to x if x equals -99.

x <- -99
clean <- function(x){
  # add if statement here
  x
}
clean <- function(x){
  if(x == -99) x <- NA
  x
}

Using stopifnot()

Exercise 1

quiz(
  question("How is stopifnot() different from if` + stop()?",
           allow_retry = TRUE,
           random_answer_order = TRUE,
           answer("Instead of checking whether a condition is met, `stopifnot()` checks whether a condition is not met", correct = TRUE),
           answer("stopifnot() always passes along a custom error message"),
           answer("stopifnot() cannot take logical arguments")))

Exercise 2

Rewrite the below function using case_when().

foo <- function(x){
  if(x > 2) "a"
  else if(x < 2) "b"
  else if(x == 1) "c"
  else "d"
}
foo2 <- function(x){
  case_when(

    # Insert code here!

  )
}
foo2 <- function(x){
  case_when(
    x > 2  ~ "a",
    x < 2  ~ "b",
    ...
  )
}
foo2 <- function(x){
  case_when(
    x > 2  ~ "a",
    x < 2  ~ "b",
    x == 1 ~ "c",
    TRUE ~ ...
  )
}

Exercise 3

foo <- function(x){
  if(x > 2) "a"
  else if(x < 2) "b"
  else if(x == 1) "c"
  else "d"
}
foo(1)
quiz(
  question("What will this code return?",
         answer('"a"', message = "R will not return a because the condition 1 > 2 is false."),
         answer('"b"', correct = TRUE, message = 'The condition 1 < 2 is true, so R will evaluate the code that follows it (i.e. "b") and then skip the remainder of the multi-part if statement.'),
         answer('"a" "b"', message = "The conditions 1 < 2 and 1 == 1 are both TRUE, but R will stop after the _first_ true condition."),
         allow_retry = TRUE))

Exercise 4

clean <- function(x){
  if(x == -99) NA 
  if(x == ".") NA
  if(x == "NaN") NA
  x
}
clean(-99)
quiz(
  question("What will this code return?",
         answer("NA", message = "Did you notice that the `if` statements are not linked by `else`? What difference does this make?"),
         answer("-99", correct = TRUE, message = "Since the `if` clauses are not linked by `else`, R treats them as separate statements. R checks each if statement. After the first statement it runs NA, but does not return this as the result of the function (because this is not the final statement in the function). R does not run NA for the next two if statements because their conditions are false. Then R reaches `x`, which is the last line and last statement in the function. R evaluates `x`, which equals -99, and returns -99 as the result of the function."),
         allow_retry = TRUE))

Practice: Monthly Temperatures

Exercise 1

Using the data set airquality, create a list-column of monthly temperatures called monthly_temp, grouping by Month.


airquality %>%
   group_by(Month) %>%
   mutate(monthly_temp = list(...))

Exercise 2

quiz(question("Which of the following possibilities for filling in the blank is syntactically correct and converts from Fahrenheit to Celsius?",
           answer("(. - 32) * 5/9)"),
           answer("(- 32 * 5/9)"),
           answer("~ (. - 32) * 5/9)", correct = TRUE),
           answer("~ (fahrenheit - 32) * 5/9)"))
)

Exercise 3

Use a map_* function to convert all of the temperatures in monthly_temp from Fahrenheit to Celsius, creating a new variable temp_celsius. Hint: Use the formula from the quiz above!

airquality %>%
  group_by(Month) %>%
  mutate(monthly_temp = list(Temp)) %>%
  mutate(...)
airquality %>%
  group_by(Month) %>%
  mutate(monthly_temp = list(Temp)) %>%
  mutate(temp_celsius = map(monthly_temp, ...))

Exercise 4

Oops! The scientists made a mistake in which all of the temperatures were mis-recorded. For temperatures recorded at or below 20 degrees Celsius, the true temperature is actually 1 degree Celsius lower. For temperatures recorded as higher than 20 degrees Celsius, the true temperature is actually 3 degrees Celsius higher.

Write an anonymous function that rectifies the scientists' mistake by using case_when().

function(x) 
  case_when(...)
function(x) 
  case_when(x <= 20 ~ x - 1,
            x > 20 ~ x + 3)

Exercise 5

Now use this function along with map(), swapping out the function code after the ~ in the map() argument.

airquality %>%
  group_by(Month) %>%
  mutate(monthly_temp = list(Temp)) %>%
  mutate(temp_celsius = map(monthly_temp, ~ (. - 32)* 5/9)) %>%
  mutate(temp_celsius = map(temp_celsius, ~ ...))
airquality %>%
  group_by(Month) %>%
  mutate(monthly_temp = list(Temp)) %>%
  mutate(temp_celsius = map(monthly_temp, ~ (. - 32)* 5/9)) %>%
  mutate(temp_celsius = map(temp_celsius, ~ case_when(...)))

Coin Flipping exercise

Let's create a function, coin_flip(), that takes an input n of the number of times to flip a coin and adds up the number of Tails.

Exercise 1

Let's start by creating a minimally viable function called starter_coin() that flips one coin once and prints out H or T. Remember: You can paste vectors as an argument into sample()!


starter_coin <- function() sample(c("H", "T"), ...)

Exercise 2

Now let's take it up a notch and add in a formal argument n that specifies the number of times to flip the coin.


starter_coin <- function(n) sample(c("H", "T"), n, replace = ...)

Exercise 3

Add in a sensible stopifnot() to your function that checks whether the input n is numeric.


starter_coin <- function(n){
  stopifnot(is.numeric(...))
  sample(c("H", "T"), n, replace = TRUE)
}

Exercise 4

Great! Now we should be able to create coin_flip(), which counts the number of Tails for n flips. Set the default value of n to 1.

coin_flip <- function(n = 1){
  ...
}
Use the code from starter_coin() and sum().
coin_flip <- function(n = 1){
  stopifnot(is.numeric(n))
  sum(sample(c("H", "T"), n, replace = TRUE) == ...)
}

Exercise 5

Now let's create a function called five_flips that counts how many Tails occur in five coin flips, or the equivalent of coin_flip(5), but calls it n times. Set the default value of n separate flips to 1. Use a map_* function to apply the rep() function to coin_flip() n times.

five_flips <- function(n = 1){
}
The map function you are looking for is map_int().
rep(x,n) means that you are flipping x coins n times.
five_flips <- function(n = 1){
  stopifnot(is.numeric(n))
  map_int(rep(5, n), ...)
}

Exercise 6

Create a tibble named x with one variable: flips. flips is a list column of 10 observations, each element of which is result of flipping 5 coins 10 times. Make sure to use the correct map_* function.

x <- tibble(flips = ...)
x <- tibble(flips = map(rep(10, 10), ...))

Random variables

Exercise 1

Let's create a function biggest(), which takes the larger of two random variables. Use n = 1 for both random variables.

biggest <- function(){
  ...
}
biggest <- function(){
  max(runif(...), rnorm(...))
}

Exercise 2

Modify biggest() so that it prints out "Uniform" if the larger variable is the uniformally distributed random variable, and "Normal" if otherwise. Use case_when() with TRUE ~ "Tie". Remember that runif() and rnorm() generate random values every time they are called...

biggest <- function(){
  ...
}
biggest <- function(){
  x <- runif(1)
  y <- rnorm(1)
  case_when(...)
}
biggest <- function(){
  x <- runif(1)
  y <- rnorm(1)
  case_when(x > y ~ "Uniform",
           y > x ~ "Normal",
           ...)
}

Exercise 3

We'd like for biggest() to return a vector of two elements: The name of the distribution of the larger random variable and its value. If the two values are identical, we'd like for the function to return "Tie" and the value of one of the variables (since they are identical).

biggest <- function(){
  x <- runif(1)
  y <- rnorm(1)
  case_when(x > y ~ ...,
            y > x ~ ...,
            TRUE ~ ...)
}
biggest <- function(){
  x <- runif(1)
  y <- rnorm(1)
  case_when(x > y ~ c("Uniform", x),
           y > x ~ c("Normal", y),
           TRUE ~ c("Tie", ...))
}

Exercise 4

Let's create a new function named repeat_biggest() that does so by calling biggest() n number of times. * Use replicate() to call the function multiple times.

repeat_biggest <- function(n){
  replicate(...)
}
repeat_biggest <- function(n = 1){
  replicate(n, biggest())
}

The Crooked Casino

Recall the function we wrote in the chapter to model a crooked game of craps. crooked_craps() is expected to return a winning 7 or 11 only half of the time the sum is actually 7 or 11. It returns a 2 much more often than it should. Code:

crooked_dice <- function(n = 1){
  stopifnot(is.numeric(n))
  stopifnot(n >= 0)
  roll <- sum(sample(1:6, n, replace = TRUE))
  ifelse((roll == 7 | roll == 11) && runif(1) >= 0.5, 2, roll)
}

crooked_craps <- function(n = 1){
  stopifnot(is.numeric(n))
  map_dbl(rep(2, n), crooked_dice)
}

Let's put our visualization skills to the test.

Exercise 1

Create a tibble with column "games" and run crooked_craps() 100 times.


tibble(games = crooked_craps(...))

Exercise 2

Create a barplot with games on the x axis.

tibble(games = crooked_craps(100))

Exercise 3

Now, change the x axis scale to increment by integers from 2 to 12.

tibble(games = crooked_craps(100)) %>%
  ggplot(aes(x = games)) +
  geom_bar()
tibble(games = crooked_craps(100)) %>%
  ggplot(aes(x = games)) +
  geom_bar() %>%
  scale_x_continuous(breaks = seq(...))

Exercise 4

Use an appropriate map_* function and rep() to call crooked_craps(100) 10 times. Create a tibble named x with a column named games.


x <- tibble(games = map(rep(100, 10), ...))

Exercise 5

Inside each of these iterations of 100 rolls, we want to count the number of 3's and 11's. Technically, with a fair die one would expect as many 11's to appear as 3's. 11's can be obtained in two different ways: by adding 5 + 6 and 6 + 5. The same goes for 3's: by adding 1 + 2 and 2 + 1.

First, count the number of 3's. You can do so by using a the appropriate map_* function and an anonymous function defined as sum(. == 3).

x <- tibble(games = map(rep(100, 10), crooked_craps)) %>%
  mutate(counts = ...)
x <- tibble(games = map(rep(100, 10), crooked_craps)) %>%
  mutate(counts = map_int(games, ~(...)))

Exercise 6

Now, modify counts so that it is a list-column of 2 elements per list: The number of 3's, then the number of 11's.

x <- tibble(games = map(rep(100, 10), crooked_craps)) %>%
  mutate(counts = map_int(games, ~sum(. == 3)))
x <- tibble(games = map(rep(100, 10), crooked_craps)) %>%
  mutate(counts = map(games, ~list(sum(. == 3), ...)))

Submit

Some of these exercises were taken from the collection of RStudio Primers, a great resource for practicing your skills.

submission_ui
submission_server()


davidkane9/PPBDS.data documentation built on Nov. 18, 2020, 1:17 p.m.