knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
library(learnr)
library(tidyverse)
library(gapminder)
tutorial_options(exercise.timelimit = 60, exercise.blanks = "___+", exercise.eval = TRUE)
Variables, data types in R
Functions (using, getting help, making your own)
Loading and saving data
Using RStudio, Projects, and Markdown notebooks
Working with data in vectors, lists, matrices, and tables
'Doing stuff to data'
Much of data analysis can be viewed as taking some datasets (the 'nouns') and applying a series of transformations (the 'verbs')
The nouns are typically 'tibbles' (but could also be matrices, vectors, or lists).
The verbs are functions
Today we'll learn about the key 'verbs' for wrangling tables
We often want to apply multiple functions ('verbs') to our data in a chain.
%>% is a special R operator for chaining functions together (provided by the magrittr package and loaded as part of the tidyverse)
# Very hard to read
bop(scoop(hop(foo_foo, through = forest), up = field_mice), on = head)

# Creating unnecessary 'temporary' variables
foo_foo_1 <- hop(foo_foo, through = forest)
foo_foo_2 <- scoop(foo_foo_1, up = field_mice)
foo_foo_3 <- bop(foo_foo_2, on = head)
Using pipes makes your code easy to read and understand as a series of verbs
foo_foo %>% hop(through = forest) %>% scoop(up = field_mice) %>% bop(on = head)
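The foo_foo functions above are illustrative and are not defined anywhere, so that chain cannot be run as written. As a minimal runnable sketch of the same idea, assuming only base R functions and the pipe from magrittr (the `values` vector is made up for illustration), the nested call and the piped chain below should give the same result:

```r
library(magrittr)  # provides %>%; also attached by library(tidyverse)

# Made-up input, chosen so the arithmetic is easy to follow
values <- c(1, 4, 9)

# Nested style: has to be read from the inside out
nested <- round(sum(sqrt(values)), 1)

# Piped style: read left to right as "take values, and then sqrt, and then sum, and then round"
piped <- values %>%
  sqrt() %>%
  sum() %>%
  round(1)

identical(nested, piped)  # TRUE
```

Both versions do the same work; the piped version simply lists the steps in the order they happen.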
Assigning the results of a chain to a new variable
result <- input_data %>% function_1 %>% function_2
This also works
input_data %>% function_1 %>% function_2 -> result
The pipe feeds the first argument of the next function
x <- c('a', 'b', 'c')
x %>% c('d') # same as c(x, 'd')
If you want the piped input to feed a different argument, you can use the . placeholder:
x %>% c('d', .) #same as c('d', x)
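A common case for the placeholder is a function whose data argument is not the first one. The sketch below assumes the built-in mtcars dataset and base R's lm(), neither of which appears elsewhere in these slides; it is only meant to show where the piped input ends up.

```r
library(magrittr)  # provides %>%

# lm() takes a formula first and the data second, so the piped
# data frame has to be redirected with `data = .`
fit <- mtcars %>% lm(mpg ~ wt, data = .)

# The same model fitted without the pipe, for comparison
fit_direct <- lm(mpg ~ wt, data = mtcars)

all.equal(coef(fit), coef(fit_direct))  # TRUE
```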
Code is MUCH easier to read (and modify)
Read %>% as 'and then'
This style of coding is less prone to errors, since there are no intermediate variables to mistype or reuse by accident
Using pipes is a choice! Use it when it's helpful
Note: RStudio keyboard shortcut for %>%: Cmd + Shift + M (Ctrl + Shift + M on Windows and Linux)
dplyr package
filter
select
arrange
mutate
summarise
group_by
library(tidyverse)
# install.packages('gapminder')
library(gapminder)
head(gapminder)
Use filter to select a subset of the rows from a tibble
Its arguments are the conditions ('filters') you'd like to apply
gapminder %>% filter(year == 2007)
Use == to pick rows with a variable equal to a specified value
Use , to check for multiple filters being true ('AND')
gapminder %>% filter(year == 2002, continent == "Asia") %>% sample_n(4)
Use | to check for any of multiple filters being true ('OR')
gapminder %>% filter(year == 2002 | continent == "Asia") %>% sample_n(4)
Use %in% to check if a value is contained in a specified set
gapminder %>% filter(country %in% c("Argentina", "Belgium", "Mexico"), year %in% c(1987, 1992))
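To make the difference between 'AND', 'OR', and %in% concrete, here is a small sketch on a four-row tibble. The `toy` table and its values are made up for illustration (they are not real gapminder data), so the row counts can be checked by eye.

```r
library(dplyr)

# Toy data, invented for this example
toy <- tibble(
  country   = c("A", "B", "C", "D"),
  continent = c("Asia", "Asia", "Europe", "Europe"),
  year      = c(2002, 2007, 2002, 2007)
)

# AND: both conditions must hold, so only country A is kept
and_rows <- toy %>% filter(year == 2002, continent == "Asia")

# OR: either condition may hold, so countries A, B and C are kept
or_rows <- toy %>% filter(year == 2002 | continent == "Asia")

# %in%: the value must be in the given set, so countries A and D are kept
in_rows <- toy %>% filter(country %in% c("A", "D"))

nrow(and_rows)  # 1
nrow(or_rows)   # 3
nrow(in_rows)   # 2
```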
Use select to pick a subset of columns by name
gapminder %>% select(country, year, lifeExp) %>% head(4)
Select columns with 'improper' names using back-ticks (NOT single quotes):
Tab completion of column names will add the back-ticks for you
df %>% select(`1999`, `badly named variable`)
Use rename to rename certain columns
gapminder %>% rename(lifeExpectancy = lifeExp, population = pop) %>% head(3)
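The argument order in rename is easy to get backwards: the new name goes on the left and the existing name on the right. A minimal sketch, assuming a made-up two-row `toy` tibble with gapminder-style column names:

```r
library(dplyr)

# Toy data, invented for this example
toy <- tibble(
  country = c("A", "B"),
  lifeExp = c(70.1, 65.3),
  pop     = c(1e6, 2e6)
)

renamed <- toy %>%
  select(country, lifeExp) %>%       # keep two columns, in this order
  rename(lifeExpectancy = lifeExp)   # new name = old name

names(renamed)  # "country" "lifeExpectancy"
```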
Use arrange to sort rows by one or more columns
gapminder %>% arrange(year) %>% head(4)
gapminder %>% arrange(year, lifeExp) %>% head(4)
gapminder %>% filter(year > 2000) %>% arrange(desc(country)) %>% head(4)
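The arrange calls above sort in ascending order by default, use later columns to break ties, and sort in descending order when a column is wrapped in desc(). A small sketch on made-up data (the `toy` tibble is an assumption for illustration) shows the resulting row order:

```r
library(dplyr)

# Toy data, invented for this example
toy <- tibble(
  country = c("A", "B", "C", "D"),
  year    = c(2007, 2002, 2007, 2002),
  lifeExp = c(60, 75, 50, 70)
)

# Sort by year ascending, breaking ties by lifeExp ascending
by_year <- toy %>% arrange(year, lifeExp)
by_year$country      # "D" "B" "C" "A"

# Sort by lifeExp from highest to lowest
by_exp_desc <- toy %>% arrange(desc(lifeExp))
by_exp_desc$country  # "B" "D" "A" "C"
```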
Use mutate to add new columns or overwrite existing ones
gapminder %>% mutate(just_one = 1) %>% head(4)
gapminder %>% mutate(gdp = pop * gdpPercap) %>% head(4)
gapminder %>% mutate(pop = pop/1e6) %>% head(4)
x <- 10
ifelse(x > 9, "x is greater than 9", "x is not greater than 9")
ifelse allows you to use mutate in a 'condition-dependent' way
gapminder %>% mutate(adjusted_gdp = ifelse(year < 1980, gdpPercap * 2, gdpPercap)) %>% sample_n(5)
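Because the gapminder example above is followed by sample_n, its output changes from run to run. The sketch below applies the same ifelse rule to a made-up four-row tibble (the `toy` values are invented for illustration), so the effect of the condition on each row is easy to verify: ifelse works element by element, picking the first value where the test is TRUE and the second where it is FALSE.

```r
library(dplyr)

# Toy data, invented for this example
toy <- tibble(
  year      = c(1972, 1977, 1982, 1987),
  gdpPercap = c(100, 200, 300, 400)
)

# Double gdpPercap for years before 1980, leave it unchanged otherwise
adjusted <- toy %>%
  mutate(adjusted_gdp = ifelse(year < 1980, gdpPercap * 2, gdpPercap))

adjusted$adjusted_gdp  # 200 400 300 400
```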
Use summarise (also spelled summarize) to collapse a table into summary statistics
gapminder %>% filter(year == 1997) %>% summarize(max_exp = max(lifeExp), sd_exp = sd(lifeExp))
Combining group_by with summarise allows you to summarise data for each possible value of a categorical variable
gapminder %>% filter(year == 1997) %>% group_by(continent) %>% summarize(max_exp = max(lifeExp), sd_exp = sd(lifeExp))
gapminder %>% group_by(continent, year) %>% summarize(num_rows = n(), max_exp = max(lifeExp), sd_exp = sd(lifeExp)) %>% head(4)
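To check what a grouped summary returns, it can help to run it on data small enough to work out by hand. The sketch below assumes a made-up five-row `toy` tibble (not real gapminder values) and produces one output row per group:

```r
library(dplyr)

# Toy data, invented for this example
toy <- tibble(
  continent = c("Asia", "Asia", "Europe", "Europe", "Europe"),
  lifeExp   = c(60, 70, 75, 80, 85)
)

by_continent <- toy %>%
  group_by(continent) %>%
  summarise(
    num_rows = n(),            # rows in each group: 2 for Asia, 3 for Europe
    max_exp  = max(lifeExp),   # 70 for Asia, 85 for Europe
    mean_exp = mean(lifeExp)   # 65 for Asia, 80 for Europe
  )

by_continent
```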
The n() function counts the number of rows in each group
Cheat sheet: google 'dplyr cheat sheet'