library(tidyquintro)
library(learnr)
library(gradethis)
knitr::opts_chunk$set(echo = FALSE, exercise.warn_invisible = FALSE)
# enable code checking
tutorial_options(exercise.checker = grade_learnr)
Summarising takes some practice to get right, so it's best to just give it a go!
First, try summarising a single column, bill_length_mm, by calculating its mean.
penguins |> summarise(_(_, na.rm = _))
penguins |> summarise(mean(bill_length_mm, na.rm = TRUE))
grade_code( correct = random_praise(), incorrect = random_encouragement() )
Did you remember to place the function first, then the column name inside the function?
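To see why the exercise asks for na.rm, here is a minimal sketch in base R. The vector bill_lengths is a made-up stand-in for a column with a missing value, not real penguins data.

```r
# Hypothetical toy vector standing in for a measurement column with one NA
bill_lengths <- c(39.1, 39.5, NA, 36.7)

# Without na.rm, a single missing value makes the whole mean NA
mean_with_na <- mean(bill_lengths)

# With na.rm = TRUE, missing values are dropped before averaging
mean_without_na <- mean(bill_lengths, na.rm = TRUE)

print(mean_with_na)
print(mean_without_na)
```

The same logic applies inside summarise: the function receives the whole column, so any NA in it propagates unless na.rm = TRUE is set.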
Often, we'd like to summarise several columns at once.
Get the mean for both bill_depth_mm and bill_length_mm by summarising each.
penguins |> summarise(bill_length_mm = mean(__, na.rm = _), bill_depth_mm = mean(__, na.rm = _))
penguins |> summarise(bill_length_mm = mean(bill_length_mm, na.rm = TRUE), bill_depth_mm = mean(bill_depth_mm, na.rm = TRUE))
grade_code( correct = random_praise(), incorrect = random_encouragement() )
Make sure the correct column names go to the correct summary!
Even more often, we'd like to summarise a collection of columns.
In the tidyverse we do this with the across function, summarising across multiple columns at once using tidy-selectors.
Get the mean of all the columns starting with "bill"
penguins |> summarise(across(__, .fns = mean, na.rm = TRUE))
penguins |> summarise(across(starts_with("bill"), .fns = mean, na.rm = TRUE))
grade_code( correct = random_praise(), incorrect = random_encouragement() )
Remember to use the tidy selectors like ends_with, contains, and starts_with.
The expectation here is to use the tidy-selector starts_with.
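The exercise above passes na.rm = TRUE through across to mean. In dplyr 1.1.0 and later, passing extra arguments this way is deprecated and gives a warning, so it may be worth knowing the alternative: wrapping the call in an anonymous function. The sketch below assumes dplyr is installed and uses a made-up data frame called toy in place of penguins. The graded solution above still expects the original form.

```r
library(dplyr)

# Hypothetical toy data standing in for penguins
toy <- tibble(
  bill_length_mm = c(39.1, NA, 40.3),
  bill_depth_mm  = c(18.7, 17.4, NA),
  body_mass_g    = c(3750, 3800, 3250)
)

# The anonymous function sets na.rm itself instead of relying on across's ...
bill_means <- toy |>
  summarise(across(starts_with("bill"), \(x) mean(x, na.rm = TRUE)))

print(bill_means)
```

Only the two columns whose names start with "bill" appear in the result; body_mass_g is left out because the selector does not match it.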
Often, we'd like more than one summary per column.
The across function accepts a named list of functions in .fns, producing one output column for each combination of column and function.
Get the descriptive statistics of all the columns starting with "bill" (mean, sd, min and max)
penguins |> summarise(across(__, .fns = list(mean = mean, _ = _, _ = _, _ = _), na.rm = TRUE) )
penguins |> summarise(across(starts_with("bill"), .fns = list(mean = mean, sd = sd, min = min, max = max), na.rm = TRUE) )
grade_code( correct = random_praise(), incorrect = random_encouragement() )
The expectation here is to name each output with exactly the same name as its function.
Be sure to use all lowercase letters here.
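The names in the list matter because across uses them to build the output column names. A minimal sketch of that naming, assuming dplyr is installed; the data frame toy and its flipper columns are invented for illustration, and only two functions are used to keep it short.

```r
library(dplyr)

# Hypothetical toy data; these column names are made up for illustration
toy <- tibble(
  flipper_left_mm  = c(181, 186, NA, 193),
  flipper_right_mm = c(180, NA, 195, 190)
)

# Each list name becomes the suffix of the output column: <column>_<name>
flipper_stats <- toy |>
  summarise(across(
    starts_with("flipper"),
    list(
      mean = \(x) mean(x, na.rm = TRUE),
      max  = \(x) max(x, na.rm = TRUE)
    )
  ))

print(names(flipper_stats))
```

With the default naming, every selected column gets one output per function, ordered by column first and function second.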
Tidyverse summaries become even more powerful when paired with grouped data. Once the data is grouped, each summary is calculated once per group rather than once for the whole data set, which lets us compare meaningful groups in the data.
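As a small illustration of what grouping changes, the sketch below summarises a made-up data frame (toy, a hypothetical stand-in for penguins) and returns one row per group. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data standing in for penguins
toy <- tibble(
  species     = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  body_mass_g = c(3700, 3900, 5000, NA)
)

# group_by marks the groups; summarise then returns one row per group
mass_by_species <- toy |>
  group_by(species) |>
  summarise(mean_mass = mean(body_mass_g, na.rm = TRUE))

print(mass_by_species)
```

Without the group_by step, the same summarise call would return a single row covering all four observations.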
Start slowly: group the data by species and get the mean of the bill_length_mm column.
penguins |> group_by(_) |> summarise(_(_, na.rm = _))
penguins |> group_by(species) |> summarise(mean(bill_length_mm, na.rm = TRUE))
grade_code( correct = random_praise(), incorrect = random_encouragement() )
Did you remember to place the function first, then the column name inside the function?
Maybe the islands play a larger role? Group the data by island instead, and summarise two columns, bill_length_mm and bill_depth_mm, by their means.
penguins |> group_by(_) |> summarise(bill_length_mm = mean(__, na.rm = _), bill_depth_mm = mean(__, na.rm = _))
penguins |> group_by(island) |> summarise(bill_length_mm = mean(bill_length_mm, na.rm = TRUE), bill_depth_mm = mean(bill_depth_mm, na.rm = TRUE))
grade_code( correct = random_praise(), incorrect = random_encouragement() )
Make sure the correct column names go to the correct summary!
Actually, I'm convinced that both species and island make meaningful groups here. Group the data by both, and grab the mean of all bill measurements.
penguins |> group_by(_) |> summarise(across(__, .fns = mean, na.rm = TRUE))
penguins |> group_by(species, island) |> summarise(across(starts_with("bill"), .fns = mean, na.rm = TRUE))
grade_code( correct = random_praise(), incorrect = random_encouragement() )
Remember to use the tidy selectors like ends_with, contains, and starts_with.
The expectation here is to use the tidy-selector starts_with.
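When grouping by two variables, summarise by default removes only the last grouping level, so the result is still grouped by the first variable and dplyr prints a message about it. The sketch below shows this on a made-up data frame (toy, hypothetical) and uses the .groups argument to drop the grouping entirely. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data standing in for penguins
toy <- tibble(
  species     = c("Adelie", "Adelie", "Adelie", "Gentoo"),
  island      = c("Biscoe", "Biscoe", "Dream", "Biscoe"),
  body_mass_g = c(3700, 3900, 3600, 5000)
)

# Default behaviour: the result stays grouped by species
still_grouped <- toy |>
  group_by(species, island) |>
  summarise(mean_mass = mean(body_mass_g))

# .groups = "drop" returns an ungrouped result
by_both <- toy |>
  group_by(species, island) |>
  summarise(mean_mass = mean(body_mass_g), .groups = "drop")

print(group_vars(still_grouped))
print(by_both)
```

Whether to keep or drop the remaining grouping depends on what comes next in the pipeline; dropping it is a reasonable default when the summary is the final step.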
Recall that the across function also accepts a named list of functions, so several summaries can be calculated for each column at once.
Get the descriptive statistics of all the columns starting with "bill" (mean, sd, min and max)
penguins |> summarise(across(__, .fns = list(mean = mean, _ = _, _ = _, _ = _), na.rm = TRUE) )
penguins |> summarise(across(starts_with("bill"), .fns = list(mean = mean, sd = sd, min = min, max = max), na.rm = TRUE) )
grade_code( correct = random_praise(), incorrect = random_encouragement() )
The expectation here is to name each output with exactly the same name as its function.
Be sure to use all lowercase letters here.
The best way to get a feeling for how things work is to play around with them. Adapt the code below and try different things: see what happens, and read any errors you get.
penguins |> group_by(_) |> summarise(across(__, .fns = list(), na.rm = TRUE) )