library(tidyquintro)
library(learnr)
library(gradethis)

knitr::opts_chunk$set(echo = FALSE, exercise.warn_invisible = FALSE)

# enable code checking
tutorial_options(exercise.checker = grade_learnr)
Let us start with some exercises in filtering, i.e. subsetting rows.
Fill in the code below so that you subset the data by the species
column and only Gentoo penguins remain in your output.
filter(penguins, __ == "Gentoo")
filter(penguins, species == "Gentoo")
grade_code( correct = random_praise(), incorrect = random_encouragement() )
The column name is 'species'.
When testing whether two values are equal, remember to use '==' and not '='.
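To see what the condition inside filter() does, here is a minimal sketch. It assumes dplyr is installed and uses a small made-up data frame called `toy` (hypothetical values, not the real penguins data).

```r
library(dplyr)

# Hypothetical toy data, made up for illustration (not the real penguins data)
toy <- data.frame(
  species = c("Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  flipper_length_mm = c(178, 180, 215, 195)
)

# `==` compares each value and returns one TRUE/FALSE per row
is_gentoo <- toy$species == "Gentoo"

# filter() keeps only the rows where the condition is TRUE
gentoo <- filter(toy, species == "Gentoo")
```

Writing `species = "Gentoo"` instead would pass a named argument to filter() rather than a comparison, and recent dplyr versions stop with an error that suggests using '=='.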
When we are subsetting based on numerical columns, we can use comparison operators. Complete the code below so you are left with only the rows where the flipper length is larger than 180.
filter(penguins, flipper_length_mm _ 180)
filter(penguins, flipper_length_mm > 180)
grade_code( correct = random_praise(), incorrect = random_encouragement() )
Comparisons can be made with '==', '>', and '<'.
The code above does not include rows where the flipper length is exactly 180. To include them, indicate that the value can be larger than or equal to 180.
filter(penguins, flipper_length_mm >_ 180)
filter(penguins, flipper_length_mm >= 180)
grade_code( correct = random_praise(), incorrect = random_encouragement() )
Comparisons can also be made with '>=' (larger than or equal to) and '<=' (smaller than or equal to).
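The difference between '>' and '>=' only shows up for rows that sit exactly on the boundary. A small sketch, assuming dplyr is installed; `toy` and its `id` column are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data: row 2 sits exactly on the boundary value 180
toy <- data.frame(
  id = 1:4,
  flipper_length_mm = c(178, 180, 215, 195)
)

# Strictly larger than 180: the boundary row is dropped
strictly_larger <- filter(toy, flipper_length_mm > 180)

# Larger than or equal to 180: the boundary row is kept
larger_or_equal <- filter(toy, flipper_length_mm >= 180)
```

Here `strictly_larger` holds the rows with id 3 and 4, while `larger_or_equal` also holds the row with id 2.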
We can combine several conditions when filtering. When conditions are separated by a comma (','), a row is kept only if every condition is TRUE. Here, choose all data where the flipper length is larger than or equal to 180 and the species is "Gentoo".
filter(penguins, flipper_length_mm __ 180, ____ == "Gentoo")
filter(penguins, flipper_length_mm >= 180, species == "Gentoo")
grade_code( correct = random_praise(), incorrect = random_encouragement() )
If you are not succeeding, make sure each expression works individually.
Separate the different expressions with a comma.
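Separating conditions with a comma behaves like joining them with the logical AND operator '&'. The sketch below illustrates this on a made-up data frame `toy` (hypothetical values), assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data, made up for illustration
toy <- data.frame(
  species = c("Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  flipper_length_mm = c(178, 180, 215, 195)
)

# Conditions separated by a comma: every one must be TRUE for a row to be kept
with_comma <- filter(toy, flipper_length_mm >= 180, species == "Gentoo")

# The same filter written as one condition with `&`
with_and <- filter(toy, flipper_length_mm >= 180 & species == "Gentoo")

# Both approaches keep the same rows
same_result <- isTRUE(all.equal(with_comma, with_and))
```

The Chinstrap row passes the flipper condition but fails the species condition, so it is dropped in both versions.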
Subsetting columns is a great way to reduce large datasets to more manageable sizes.
Using the select() function from dplyr, select the first, second, fourth and sixth columns from the penguins dataset using their column numbers.
select(penguins, _, _, _, _)
select(penguins, 1, 2, 4, 6)
grade_code( correct = random_praise(), incorrect = random_encouragement() )
Each column number should be separated by a comma
Sometimes we want to subset whole ranges, and maybe a couple of extra columns. We can do this using the colon (':'). Complete the code below so you select columns 1 through 4, and also column 6.
select(penguins, _:_, _)
select(penguins, 1:4, 6)
grade_code( correct = random_praise(), incorrect = random_encouragement() )
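A quick way to check a positional selection is to look at the column names of the result. The sketch below assumes dplyr is installed and uses a made-up two-row data frame `toy` whose column names mirror the penguins data (the values are hypothetical):

```r
library(dplyr)

# Hypothetical toy data with six columns; values are made up
toy <- data.frame(
  species = c("Adelie", "Gentoo"),
  island = c("Torgersen", "Biscoe"),
  bill_length_mm = c(39, 46),
  bill_depth_mm = c(18, 14),
  flipper_length_mm = c(181, 215),
  body_mass_g = c(3700, 5000)
)

# Individual positions, separated by commas
picked <- names(select(toy, 1, 2, 4, 6))

# A range of positions with `:`, plus one extra column
ranged <- names(select(toy, 1:4, 6))
```

In this sketch `picked` skips bill_length_mm, while `ranged` includes every column from 1 to 4 before adding column 6.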
While using numbers for the columns can be convenient, in most cases you'll likely want to base your selection on the names of the columns. The syntax you learned above works exactly the same for column names. Take the same code as before, but this time instead of using the index numbers for the columns, use the column names.
Column 1 is species, column 4 is bill_depth_mm, and column 6 is body_mass_g.
select(penguins, _:_, _)
select(penguins, species:bill_depth_mm, body_mass_g)
grade_code( correct = random_praise(), incorrect = random_encouragement() )
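Selecting by name and selecting by position give the same result as long as the columns stay in the same order. A minimal sketch on a made-up data frame `toy` (hypothetical values), assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data; values are made up
toy <- data.frame(
  species = c("Adelie", "Gentoo"),
  island = c("Torgersen", "Biscoe"),
  bill_length_mm = c(39, 46),
  bill_depth_mm = c(18, 14),
  flipper_length_mm = c(181, 215),
  body_mass_g = c(3700, 5000)
)

# Range and extra column given by position
by_position <- select(toy, 1:4, 6)

# The same range and extra column given by name
by_name <- select(toy, species:bill_depth_mm, body_mass_g)

same_columns <- identical(names(by_position), names(by_name))
```

Names have the practical advantage that the selection keeps working if the columns are later reordered, whereas positions would then point at different columns.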
Other times, it might be handy to grab columns based on patterns in their names. If you are lucky, your dataset follows an overarching naming convention that makes this possible.
Complete the code below so that you are selecting species, island and all the columns starting with "bill".
select(penguins, _, _, starts_with("_"))
select(penguins, species, island, starts_with("bill"))
grade_code( correct = random_praise(), incorrect = random_encouragement() )
Now we lost flipper length! To make sure we keep flipper length, instead select the columns that end with "mm".
select(penguins, _, _, ends_with("_"))
select(penguins, species, island, ends_with("mm"))
grade_code( correct = random_praise(), incorrect = random_encouragement() )
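The two helpers match on different parts of the column name, which is why they return different sets of columns. A sketch on a made-up data frame `toy` (hypothetical values), assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data; values are made up
toy <- data.frame(
  species = c("Adelie", "Gentoo"),
  island = c("Torgersen", "Biscoe"),
  bill_length_mm = c(39, 46),
  bill_depth_mm = c(18, 14),
  flipper_length_mm = c(181, 215),
  body_mass_g = c(3700, 5000)
)

# Prefix match: only the two bill columns are added
bill_cols <- names(select(toy, species, island, starts_with("bill")))

# Suffix match: every measurement recorded in millimetres is added
mm_cols <- names(select(toy, species, island, ends_with("mm")))
```

Neither selection picks up body_mass_g, since its name does not start with "bill" and ends in "g" rather than "mm".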
We should get a better idea of how the columns in our data are coded. In particular, which columns in this data set are factors?
Complete the code to select only columns that are factors.
select(penguins, where(is._))
select(penguins, where(is.factor))
grade_code( correct = random_praise(), incorrect = random_encouragement() )
The function for checking whether a vector is a factor is `is.factor`.
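where() applies a function to each column and keeps the columns for which it returns TRUE. The sketch below shows this on a made-up data frame `toy` (hypothetical values), assuming dplyr 1.0.0 or later is installed:

```r
library(dplyr)

# Hypothetical toy data with two factor columns and two numeric ones
toy <- data.frame(
  species = factor(c("Adelie", "Gentoo")),
  island = factor(c("Torgersen", "Biscoe")),
  body_mass_g = c(3700, 5000),
  year = c(2007L, 2008L)
)

# One TRUE/FALSE per column: is this column a factor?
factor_flags <- sapply(toy, is.factor)

# where() keeps the columns for which the function returns TRUE
factor_cols <- names(select(toy, where(is.factor)))

# Any function that returns a single TRUE/FALSE per column works the same way
numeric_cols <- names(select(toy, where(is.numeric)))
```

Note that the function is passed without parentheses (`is.factor`, not `is.factor()`), because where() calls it on each column itself.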
quiz(
  question("What functions can you use to subset a data set by rows?",
    answer("dplyr's `filter()`", correct = TRUE),
    answer("dplyr's `select()`"),
    answer("`subset()`", correct = TRUE),
    allow_retry = TRUE
  ),
  question("What functions can you use to subset a data set by columns?",
    answer("dplyr's `filter()`"),
    answer("dplyr's `select()`", correct = TRUE),
    answer("`subset()`", correct = TRUE),
    allow_retry = TRUE
  ),
  question("If you want to select all columns in data 'df' that contain the string 'something', you can do that by",
    answer("`df[grepl('something', names(df))]`", correct = TRUE),
    answer("`select(df, starts_with('something'))`"),
    answer("`df[,'something']`"),
    answer("`select(df, contains('something'))`", correct = TRUE),
    allow_retry = TRUE
  ),
  question("If you want to subset rows so that you only have those below 18 years of age, you can do that by",
    answer("`df$age < 18`"),
    answer("`filter(df, age < 18)`", correct = TRUE),
    answer("`df[df$age < 18,]`", correct = TRUE),
    answer("`filter(df, age <= 18)`"),
    allow_retry = TRUE
  )
)