In Athanasiamo/tidyquintro: Quick Intro to Tidyverse

library(tidyquintro)
library(learnr)
library(gradethis)

knitr::opts_chunk$set(echo = FALSE,
                 exercise.warn_invisible = FALSE)

# enable code checking
tutorial_options(exercise.checker = grade_learnr)

Summarising the whole dataset

Summarising takes some practice to get right, so it's best to just give it a go!

Start by summarising a single column, bill_length_mm, by calculating its mean.

penguins |> 
  summarise(_(_, na.rm = _))

penguins |> 
  summarise(mean(bill_length_mm, na.rm = TRUE))

grade_code(
  correct = random_praise(),
  incorrect = random_encouragement()
)

Did you remember to place the function first, then the column name inside the function?
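To see why the na.rm argument matters, here is a minimal sketch on a made-up four-row table. The name toy and its values are invented for illustration and are not taken from penguins; the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data standing in for `penguins`; one value is missing
toy <- tibble(bill_length_mm = c(39, NA, 46, 50))

# Without na.rm, a single NA makes the whole mean NA
with_na <- toy |>
  summarise(mean(bill_length_mm))

# With na.rm = TRUE, missing values are dropped before averaging
without_na <- toy |>
  summarise(mean(bill_length_mm, na.rm = TRUE))

with_na
without_na
```

When the summary is not given a name, the output column is named after the expression itself, which is one reason to name summaries explicitly.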

Summarise two columns

Often, we'd like to summarise several columns at once. Get the mean for both bill_depth_mm and bill_length_mm by summarising each.

penguins |> 
  summarise(bill_length_mm = mean(__, na.rm = _),
            bill_depth_mm = mean(__, na.rm = _))

penguins |> 
  summarise(bill_length_mm = mean(bill_length_mm, na.rm = TRUE),
            bill_depth_mm = mean(bill_depth_mm, na.rm = TRUE))

grade_code(
  correct = random_praise(),
  incorrect = random_encouragement()
)

Make sure the correct column names go to the correct summary!
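The name on the left of each = is only a label for the output column, so it is easy to attach a label to the wrong summary. A small sketch, assuming dplyr is installed and using an invented table toy with output names (length_mean, depth_mean) chosen purely for illustration:

```r
library(dplyr)

# Hypothetical toy data standing in for `penguins`
toy <- tibble(
  bill_length_mm = c(39, NA, 46, 50),
  bill_depth_mm  = c(18, 17, 13, NA)
)

# Each argument to summarise() is `output_name = summary_expression`
bill_means <- toy |>
  summarise(
    length_mean = mean(bill_length_mm, na.rm = TRUE),
    depth_mean  = mean(bill_depth_mm, na.rm = TRUE)
  )

bill_means
```

Using distinct output names like these makes a swapped column easier to spot than reusing the input names.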

Summarise across many columns

Even more often, we'd like to summarise a collection of columns. In the tidyverse we do this with the across function, which applies the same summary to every column picked out by a tidy-selector. Get the mean of all the columns starting with "bill".

penguins |> 
  summarise(across(__, .fns = mean, na.rm = TRUE))

penguins |> 
  summarise(across(starts_with("bill"), .fns = mean, na.rm = TRUE))

grade_code(
  correct = random_praise(),
  incorrect = random_encouragement()
)

Remember to use the tidy selectors like ends_with, contains, and starts_with.

The expectation here is to use the tidy-selector starts_with.
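As a sketch of how a tidy-selector narrows the columns that across works on, the example below uses an invented table toy with one column that does not start with "bill". It assumes dplyr is installed. Note one deliberate difference from the exercise code: from dplyr 1.1.0 onwards, passing extra arguments such as na.rm through across is deprecated, so this sketch wraps mean in an anonymous function instead.

```r
library(dplyr)

# Hypothetical toy data standing in for `penguins`
toy <- tibble(
  bill_length_mm = c(39, NA, 46, 50),
  bill_depth_mm  = c(18, 17, 13, NA),
  body_mass_g    = c(3750, 3800, 5000, 5400)
)

# starts_with("bill") selects the two bill columns and skips body_mass_g;
# the anonymous function sets na.rm explicitly for each selected column
bill_means <- toy |>
  summarise(across(starts_with("bill"), \(x) mean(x, na.rm = TRUE)))

bill_means
```

With a single function, the output columns keep the names of the input columns.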

Summarise across many columns with several functions

penguins |> 
  summarise(across(__, 
                   .fns = list(mean = mean,
                               _ = _,
                               _ = _,
                               _ = _), 
                   na.rm = TRUE)
  )

penguins |> 
  summarise(across(starts_with("bill"), 
                   .fns = list(mean = mean,
                               sd = sd,
                               min = min,
                               max = max), 
                   na.rm = TRUE)
  )

grade_code(
  correct = random_praise(),
  incorrect = random_encouragement()
)

The expectation here is to name the output with the exact same name as the function.

Be sure to use all lowercase letters here.
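The names in the list matter because across uses them to build the output column names. A minimal sketch, assuming dplyr is installed and using an invented table toy; the functions are written as anonymous functions so that na.rm is set explicitly:

```r
library(dplyr)

# Hypothetical toy data standing in for `penguins`
toy <- tibble(
  bill_length_mm = c(39, NA, 46, 50),
  bill_depth_mm  = c(18, 17, 13, NA)
)

# A named list of functions: each name becomes a suffix on the column name
bill_stats <- toy |>
  summarise(across(
    starts_with("bill"),
    list(
      mean = \(x) mean(x, na.rm = TRUE),
      max  = \(x) max(x, na.rm = TRUE)
    )
  ))

names(bill_stats)
```

With a named list, the default output names follow the pattern column_function, for example bill_length_mm_mean.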

Summarising grouped data

Tidyverse summaries become even more powerful when paired with grouped data. Once the data is grouped, a single-value summary such as the mean is calculated separately for each group, so you can aggregate the data and compare summaries across meaningful groups.

Start out slow by grouping the data by species and getting the mean of the bill_length_mm column.

penguins |> 
  group_by(_) |> 
  summarise(_(_, na.rm = _))

penguins |> 
  group_by(species) |> 
  summarise(mean(bill_length_mm, na.rm = TRUE))

grade_code(
  correct = random_praise(),
  incorrect = random_encouragement()
)

Did you remember to place the function first, then the column name inside the function?
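To make the effect of grouping concrete, here is a small sketch on an invented table toy (not the real penguins data), assuming dplyr is installed. The same summarise call that returns one row for ungrouped data returns one row per species here.

```r
library(dplyr)

# Hypothetical toy data standing in for `penguins`
toy <- tibble(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  bill_length_mm = c(39, NA, 46, 50)
)

# group_by() marks the groups; summarise() then works within each group
by_species <- toy |>
  group_by(species) |>
  summarise(bill_length_mm = mean(bill_length_mm, na.rm = TRUE))

by_species
```

The grouping column is kept in the output, so each mean stays labelled with its species.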

Summarise two columns

Maybe the islands play a larger role? Group the data by island instead, and get the mean of both bill_length_mm and bill_depth_mm.

penguins |> 
  group_by(_) |> 
  summarise(bill_length_mm = mean(__, na.rm = _),
            bill_depth_mm = mean(__, na.rm = _))

penguins |> 
  group_by(island) |> 
  summarise(bill_length_mm = mean(bill_length_mm, na.rm = TRUE),
            bill_depth_mm = mean(bill_depth_mm, na.rm = TRUE))

grade_code(
  correct = random_praise(),
  incorrect = random_encouragement()
)

Make sure the correct column names go to the correct summary!

Summarise across many columns

Actually, I'm convinced that both species and island make meaningful groups here. Group the data by both, and grab the mean of all bill measurements.

penguins |> 
  group_by(_) |> 
  summarise(across(__, .fns = mean, na.rm = TRUE))

penguins |> 
  group_by(species, island) |> 
  summarise(across(starts_with("bill"), .fns = mean, na.rm = TRUE))

grade_code(
  correct = random_praise(),
  incorrect = random_encouragement()
)

Remember to use the tidy selectors like ends_with, contains, and starts_with.

The expectation here is to use the tidy-selector starts_with.
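When grouping by two variables, summarise by default removes only the last grouping level and prints a message about it, so the result is still grouped by the first variable. The sketch below, on an invented table toy and assuming dplyr is installed, uses the .groups argument to return an ungrouped result instead. The anonymous function is a stand-in for the mean with na.rm = TRUE used in the exercise.

```r
library(dplyr)

# Hypothetical toy data standing in for `penguins`
toy <- tibble(
  species        = c("Adelie", "Adelie", "Adelie", "Gentoo"),
  island         = c("Torgersen", "Biscoe", "Biscoe", "Biscoe"),
  bill_length_mm = c(39, 38, 40, 46),
  bill_depth_mm  = c(18, NA, 17, 13)
)

# One row per species/island combination that occurs in the data;
# .groups = "drop" returns a plain, ungrouped table
by_both <- toy |>
  group_by(species, island) |>
  summarise(
    across(starts_with("bill"), \(x) mean(x, na.rm = TRUE)),
    .groups = "drop"
  )

by_both
```

Only combinations present in the data get a row, so this toy table produces three rows rather than four.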

Summarise across many columns with several functions

penguins |> 
  summarise(across(__, 
                   .fns = list(mean = mean,
                               _ = _,
                               _ = _,
                               _ = _), 
                   na.rm = TRUE)
  )

penguins |> 
  summarise(across(starts_with("bill"), 
                   .fns = list(mean = mean,
                               sd = sd,
                               min = min,
                               max = max), 
                   na.rm = TRUE)
  )

grade_code(
  correct = random_praise(),
  incorrect = random_encouragement()
)

The expectation here is to name the output with the exact same name as the function.

Be sure to use all lowercase letters here.

Play around

The best way to get a feeling for how things work is to play around. Adapt the code below and try different things: see what happens, and read any errors you get.

penguins |> 
  group_by(_) |> 
  summarise(across(__, 
                     .fns = list(), 
                     na.rm = TRUE)
  )
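If you would like a starting point for experimenting, here is one possible variation, sketched on an invented table toy and assuming dplyr is installed. It swaps in a different tidy-selector (ends_with) and uses the .names argument of across to put the function name first in the output column names; the column flipper_length_mm and all values are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for `penguins`
toy <- tibble(
  island            = c("Biscoe", "Biscoe", "Dream", "Dream"),
  bill_length_mm    = c(38, 46, NA, 49),
  flipper_length_mm = c(181, 217, 193, NA)
)

# .names is a glue pattern: {.fn} is the function name, {.col} the column name
ranges <- toy |>
  group_by(island) |>
  summarise(across(
    ends_with("mm"),
    list(
      min = \(x) min(x, na.rm = TRUE),
      max = \(x) max(x, na.rm = TRUE)
    ),
    .names = "{.fn}_{.col}"
  ))

names(ranges)
```

Try changing the selector, the grouping column, or the functions in the list, and watch how the rows and column names of the result change.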

Athanasiamo/tidyquintro documentation built on Oct. 11, 2022, 7:15 p.m.
