For this chapter, you will need the following R Packages:
library(dplyr)
In ArcGIS, joining two datasets on a common attribute is done with tools from the "Join" toolset. As with the selection tools described in chapter \@ref(chapter-selection), a join in ArcGIS is temporary ("floating") and does not automatically produce an output. Similarly, in R the result of a join only persists if the output is assigned to a new variable^[or piped into a new function].
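To illustrate the point about persistence, here is a minimal sketch. The data frames `stations` and `readings` and their values are hypothetical and are not part of this chapter's examples.

```r
library(dplyr)

# Hypothetical example data, invented for illustration only
stations <- data.frame(id = c(1, 2, 3), name = c("A", "B", "C"))
readings <- data.frame(id = c(1, 3), value = c(10.5, 7.2))

# Not assigned: the joined table is printed and then discarded
inner_join(stations, readings, by = "id")

# Assigned: the joined table is stored and can be reused later
stations_joined <- inner_join(stations, readings, by = "id")

# The input data frames themselves are never modified by a join
names(stations)
```

In both cases `stations` keeps its original two columns; only `stations_joined` carries the additional `value` column.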
R offers two main options for joining two datasets: the base R function `merge()` and the `*_join()` family of functions in the `dplyr` package. Since the latter closely mirrors how joins are done in SQL, we use it for the examples below.
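For orientation, the two approaches can be compared side by side. This is a small sketch with hypothetical data frames `left` and `right`; by default `merge()` keeps only matching rows, which corresponds to `inner_join()`.

```r
library(dplyr)

# Hypothetical example data, invented for illustration only
left  <- data.frame(key = c(1, 2, 3), x = c("a", "b", "c"))
right <- data.frame(key = c(2, 3, 4), y = c(TRUE, FALSE, TRUE))

# Base R: merge() keeps only matching rows unless all, all.x or all.y is set
base_result <- merge(left, right, by = "key")

# dplyr: the join type is part of the function name
dplyr_result <- inner_join(left, right, by = "key")

base_result
dplyr_result
```

The other join types are selected in `merge()` through its arguments: `all = TRUE` corresponds to `full_join()`, `all.x = TRUE` to `left_join()` and `all.y = TRUE` to `right_join()`. Note that `merge()` sorts the result by the key by default, whereas the `dplyr` functions do not.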
Before we begin with our examples, we need to clarify the differences between the various types of join operations.
knitr::include_graphics("images/joins.png")
The inner join is the most common type of join. It returns only those rows for which the matching condition is fulfilled in both data frames. Below we demonstrate it with an example.
df1 <- data.frame(TeamID = c(1, 4, 6, 11),
                  TeamName = c("new york knicks", "los angeles lakers", "milwaukee bucks", "boston celtics"),
                  Championships = c(2, 17, 1, 17))
df2 <- data.frame(TeamID = c(1, 2, 11, 8),
                  TeamName = c("new york knicks", "philadelphia 76ers", "boston celtics", "los angeles clippers"),
                  Championships = c(2, 3, 17, 0))
df1
df2
df1 %>% inner_join(df2)
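When no `by` argument is given, as in the call above, `dplyr` joins on all columns that the two data frames have in common (here `TeamID`, `TeamName` and `Championships`) and prints a message saying so. The following sketch repeats the chapter's data and joins on `TeamID` only, to show what changes when the key is stated explicitly.

```r
library(dplyr)

# Same data as in the chapter, repeated so that the block runs on its own
df1 <- data.frame(TeamID = c(1, 4, 6, 11),
                  TeamName = c("new york knicks", "los angeles lakers", "milwaukee bucks", "boston celtics"),
                  Championships = c(2, 17, 1, 17))
df2 <- data.frame(TeamID = c(1, 2, 11, 8),
                  TeamName = c("new york knicks", "philadelphia 76ers", "boston celtics", "los angeles clippers"),
                  Championships = c(2, 3, 17, 0))

# Join on TeamID only; the remaining shared columns are kept from both
# inputs and disambiguated with the default suffixes .x and .y
by_id <- df1 %>% inner_join(df2, by = "TeamID")

names(by_id)
by_id
```

Stating the key explicitly is usually the safer choice, since the result then does not depend on which other columns happen to share a name.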
The outer join, `full_join()` in `dplyr`, returns all rows from both data frames, whether or not they have a match in the other one. This is depicted in figure \@ref(fig:joins).
full_join(df1,df2)
The left join returns all records from the data frame on the left, together with the matched records from the one on the right.
left_join(df1,df2)
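Because the chapter's two data frames share all of their columns, the effect of unmatched rows is not visible in the output above. The sketch below uses hypothetical data frames `teams` and `scores` (the `Score` values are made up) to show that a left join fills the right-hand columns with `NA` where no match exists.

```r
library(dplyr)

# Hypothetical example data; the Score values are invented for illustration
teams  <- data.frame(TeamID = c(1, 4, 6),
                     TeamName = c("new york knicks", "los angeles lakers", "milwaukee bucks"))
scores <- data.frame(TeamID = c(1, 6, 8),
                     Score = c(10, 20, 30))

# All three rows of teams are kept; TeamID 4 has no partner in scores,
# so its Score is NA. TeamID 8 exists only on the right and is dropped.
teams_scores <- left_join(teams, scores, by = "TeamID")
teams_scores
```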
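The right join works analogously: it returns all records from the data frame on the right, together with the matched records from the one on the left.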
right_join(df1,df2)
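As a quick check on the four join types, the following sketch counts the rows each one returns for the chapter's data. The data frames are repeated so that the block runs on its own; the helper vector `counts` is introduced here for illustration only.

```r
library(dplyr)

# Same data as in the chapter, repeated so that the block runs on its own
df1 <- data.frame(TeamID = c(1, 4, 6, 11),
                  TeamName = c("new york knicks", "los angeles lakers", "milwaukee bucks", "boston celtics"),
                  Championships = c(2, 17, 1, 17))
df2 <- data.frame(TeamID = c(1, 2, 11, 8),
                  TeamName = c("new york knicks", "philadelphia 76ers", "boston celtics", "los angeles clippers"),
                  Championships = c(2, 3, 17, 0))

# Number of rows returned by each join type (joining on all shared columns)
counts <- c(inner = nrow(inner_join(df1, df2)),
            full  = nrow(full_join(df1, df2)),
            left  = nrow(left_join(df1, df2)),
            right = nrow(right_join(df1, df2)))
counts

# A right join contains the same teams as a left join with the inputs swapped
swapped <- left_join(df2, df1)
setequal(right_join(df1, df2)$TeamID, swapped$TeamID)
```

Two teams appear in both data frames, so the inner join has 2 rows, the left and right joins have 4 rows each (one per row of the preserved data frame), and the full join has 4 + 4 - 2 = 6 rows.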