library(tidyquintro)
library(learnr)
library(gradethis)

knitr::opts_chunk$set(echo = FALSE,
                 exercise.warn_invisible = FALSE)

# enable code checking
tutorial_options(exercise.checker = grade_learnr)

Add variable to the data

Create a column named bill_ld_ratio that holds the value of bill_length_mm divided by bill_depth_mm.

penguins |> 
  mutate(_ = _ / _) |> 
  select(species, island, contains("bill"))
penguins |> 
  mutate(bill_ld_ratio = bill_length_mm / bill_depth_mm) |> 
  select(species, island, contains("bill"))
grade_code(
  correct = random_praise(),
  incorrect = random_encouragement()
)
This exercise expects the data to be piped into the mutate function.
Make sure you have given the new column the correct name.

Add variable based on data logic

Sometimes we want to assign values based on other variables in the data set. For instance, we might want to classify all penguins with a body mass above 4.5 kg as "large" and the rest as "normal".

The ifelse function takes a logical expression, much like filter. The first value after the expression is assigned where the expression is TRUE, and the second where it is FALSE.
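
As a minimal sketch of how ifelse works row by row, the snippet below uses a small made-up data frame. The name toy and its values are purely illustrative and are not part of the penguins data. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data, not taken from penguins
toy <- tibble(
  id = c("a", "b", "c"),
  flipper_cm = c(18, 21, NA)
)

# ifelse(test, yes, no): rows where the test is TRUE get "long",
# rows where it is FALSE get "short", and a missing test stays NA
toy_labelled <- toy |>
  mutate(flipper_type = ifelse(flipper_cm > 20, "long", "short"))

toy_labelled
```

Note that the row with a missing measurement gets NA rather than either label, because the comparison itself is NA.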

Adapt the code below to evaluate whether body mass is above 4.5 kg, and assign rows to either "large" or "normal".

penguins |> 
  mutate(body_type = ifelse(body_mass_g _ 4500, "large", "normal")) |> 
  select(species, island, contains("body"))
penguins |> 
  mutate(body_type = ifelse(body_mass_g > 4500, "large", "normal")) |> 
  select(species, island, contains("body"))
grade_code(
  correct = random_praise(),
  incorrect = random_encouragement()
)
Have you used the correct sign for 'larger than'?

Add variable based on data logic 2

Often we want to do the same as above, but with more than two options. We can then use case_when from dplyr. This function is similar to ifelse, but you specify the value to assign for each condition. On the left of the tilde (~) is the logical expression, and on the right is the value to be assigned if that expression is TRUE. Conditions are checked in order, and each row gets the value of the first condition that is TRUE.
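
The sketch below illustrates how case_when checks its conditions from top to bottom. The temps data frame and its labels are made up for illustration and are not part of the tutorial data. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical temperatures, for illustration only
temps <- tibble(celsius = c(-5, 12, 30, NA))

temps_labelled <- temps |>
  mutate(
    feel = case_when(
      celsius < 0  ~ "freezing", # checked first
      celsius < 20 ~ "mild",     # only used if the condition above was not TRUE
      TRUE         ~ "warm"      # catch-all for every remaining row
    )
  )

temps_labelled
```

The value -5 also satisfies the second condition, but it is labelled "freezing" because that condition comes first. The TRUE catch-all also picks up the row with a missing value, so an explicit condition for NA placed before it may be preferable in real data.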

Adapt the code below so that penguins with a body mass below 3 kg are "petite".

penguins |> 
  mutate(
    body_type = case_when(
      body_mass_g _ 4500 ~ "large",
      body_mass_g _ 3000 ~ "petite",
      TRUE ~ "normal") # the rest
  ) |> 
  select(species, island, contains("body"))
penguins |> 
  mutate(
    body_type = case_when(
      body_mass_g > 4500 ~ "large",
      body_mass_g < 3000 ~ "petite",
      TRUE ~ "normal") # the rest
  ) |> 
  select(species, island, contains("body"))
grade_code(
  correct = random_praise(),
  incorrect = random_encouragement()
)
Have you used the correct signs for 'larger than' and 'smaller than'?

Adding variables based on grouped data

Sometimes it makes sense to calculate values based on some grouping variable. In this data set that could be, for instance, species, island or sex. In other cases it might be other variables, like subject (for longitudinal designs) or treatment group.

When data is grouped by one or more columns, calculations inside mutate can use summary measures for the group on each individual row. This is powerful when you want to express a score relative to its group, for example as a percentage of the group maximum, or to compute other relational measures (like time since baseline).
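
As a rough sketch of a grouped mutate, the snippet below compares each row with the mean of its own group. The scores data frame and its column names are hypothetical and chosen only for illustration. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical scores for two made-up groups
scores <- tibble(
  group = c("a", "a", "b", "b"),
  score = c(2, 4, 10, 30)
)

scores_centred <- scores |>
  group_by(group) |>
  mutate(
    group_mean = mean(score, na.rm = TRUE), # one value per group, repeated on each row
    diff_from_mean = score - group_mean     # each row compared with its own group
  ) |>
  ungroup() # drop the grouping so later steps work on the whole data again

scores_centred
```

Unlike a summary, the grouped mutate keeps all four rows. The group mean is simply repeated for every row in the same group.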

Adapt the code below so that you get each penguin's bill length as a percentage of its species' maximum bill length.

penguins |> 
  group_by(_) |> 
  mutate(
    bill_length_sp_max = max(__, na.rm = TRUE),
    bill_length_pc = (bill_length_mm/__)*100
  ) |> 
  select(species, island, contains("bill"))
penguins |> 
  group_by(species) |> 
  mutate(
    bill_length_sp_max = max(bill_length_mm, na.rm = TRUE),
    bill_length_pc = (bill_length_mm/bill_length_sp_max)*100
  ) |> 
  select(species, island, contains("bill"))
grade_code(
  correct = random_praise(),
  incorrect = random_encouragement()
)
Did you make sure the column names are correct?

Calculations based on groups

It is possible that the islands have some impact on the penguins' size. Perhaps one island has more food available or fewer predators, so the penguins become larger.

Adapt the code from the previous example so that it is grouped by island instead of species.

# Copy the code from the previous example, or type it out.
penguins |> 
  group_by(island) |> 
  mutate(
    bill_length_sp_max = max(bill_length_mm, na.rm = TRUE),
    bill_length_pc = (bill_length_mm/bill_length_sp_max)*100
  ) |> 
  select(species, island, contains("bill"))
grade_code(
  correct = random_praise(),
  incorrect = random_encouragement()
)
Did you make sure the column names are correct?


Athanasiamo/tidyquintro documentation built on Oct. 11, 2022, 7:15 p.m.