Let's start by loading the tidyverse
package
library(tidyverse)
library(gridExtra) library(RColorBrewer)
Clear everything to make sure there's nothing leftover in our environment
rm(list = ls())
source("R/draw_table.R")
Next, we load three datasets of universities, cities, and states.
universities <- read_csv("https://raw.githubusercontent.com/altaf-ali/tidydata_tutorial/master/data/universities.csv") cities <- read_csv("https://raw.githubusercontent.com/altaf-ali/tidydata_tutorial/master/data/cities.csv") states <- read_csv("https://raw.githubusercontent.com/altaf-ali/tidydata_tutorial/master/data/states.csv")
Let's see how we can merge the universities
dataset with the cities
dataset.
grid.arrange(draw_table(universities, "universities"), draw_table(cities, "cities"), ncol = 2)
universities %>% left_join(cities, by = "city")
universities %>% left_join(cities, by = "city") %>% draw_table()
universities %>% right_join(cities, by = "city")
universities %>% right_join(cities, by = "city") %>% draw_table()
universities %>% inner_join(cities, by = "city")
universities %>% inner_join(cities, by = "city") %>% draw_table()
universities %>% full_join(cities, by = "city")
universities %>% full_join(cities, by = "city") %>% draw_table()
In the previous example both our datasets included a column named city
. But what if the names of the columns in the two datasets were not the same? For example, let's take a look at the states
table:
states %>% draw_table("states")
What if we were to merge the cities
dataset with states
?
grid.arrange(draw_table(cities, "cities"), draw_table(states, "states"), ncol = 2)
One option would be to rename the columns so their names would match, but you don't really need to do that. You can simply tell the join functions the mapping between the different names.
cities %>% left_join(states, by = c("state" = "statename"))
In the above example, we're telling left_join()
to merge using the state
column from the cities
data frame and statename
column from the states
data frame.
left_join(cities, states, by = c("state" = "statename")) %>% draw_table()
Load the following datasets:
r
presidents <- read_csv("https://raw.githubusercontent.com/altaf-ali/tidydata_tutorial/master/data/presidents.csv")
presidents_home <- read_csv("https://raw.githubusercontent.com/altaf-ali/tidydata_tutorial/master/data/presidents_home.csv")
The datasets include names of U.S. presidents:
r
presidents %>%
draw_table("presidents")
r
presidents_home %>%
draw_table("presidents_home")
Merge the two datasets so that it ONLY includes observations that exist in BOTH the datasets. There should be no missing values or NA
in the merged table. The results should match the following:
r
inner_join(presidents, presidents_home, by = c("First" = "GivenName", "Middle", "Last" = "Surname")) %>%
draw_table()
Merge the two datasets so that it includes ALL the observations from both the datasets. Some TookOffice, LeftOffice
and HomeState
values will be NA
and that's ok. The results should match the following:
r
full_join(presidents, presidents_home, by = c("First" = "GivenName", "Middle", "Last" = "Surname")) %>%
draw_table()
Merge the two datasets so that ALL observations from the presidents
datasets are included. Some HomeState
values will be NA
and that's ok. The results should match the following:
r
left_join(presidents, presidents_home, by = c("First" = "GivenName", "Middle", "Last" = "Surname")) %>%
draw_table()
Merge the two datasets so that ALL observations from the presidents_home
datasets are included. Some TookOffice
and LeftOffice
values will be NA
and that's ok. The results should match the following:
r
right_join(presidents, presidents_home, by = c("First" = "GivenName", "Middle", "Last" = "Surname")) %>%
draw_table()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.