knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) withr::local_options(joyn.verbose = FALSE) #withr::local_options(joyn.match_type = )
library(joyn)
π In joyn
, there are two major sets of tools to join data tables:
The primary function joyn()
Dplyr-like join functions: left_join()
, right_join()
, full_join()
, inner_join()
This vignette will explore the main function joyn()
. You can read about dplyr-joins in the "dplyr-joins" article instead.
library(joyn) library(data.table) x1 <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), t = c(1L, 2L, 1L, 2L, NA_integer_), x = 11:15) y1 <- data.table(id = c(1,2, 4), y = c(11L, 15L, 16)) x2 <- data.table(id = c(1, 4, 2, 3, NA), t = c(1L, 2L, 1L, 2L, NA_integer_), x = c(16, 12, NA, NA, 15)) y2 <- data.table(id = c(1, 2, 5, 6, 3), yd = c(1, 2, 5, 6, 3), y = c(11L, 15L, 20L, 13L, 10L), x = c(16:20)) x3 <- data.table(id = c("c","b", "d", "d"), v = 8:11, foo = c(4,2, 7, 3)) y3 <- data.table(id = c("c","b", "c", "a"), y = c(11L, 15L, 18L, 20L)) x4 <- data.table(id1 = c(1, 1, 2, 3, 3), id2 = c(1, 1, 2, 3, 4), t = c(1L, 2L, 1L, 2L, NA_integer_), x = c(16, 12, NA, NA, 15)) y4 <- data.table(id = c(1, 2, 5, 6, 3), id2 = c(1, 1, 2, 3, 4), y = c(11L, 15L, 20L, 13L, 10L), x = c(16:20)) x5 <- data.table(id = c(1, 4, 2, 3, NA), t = c(1L, 2L, 1L, 2L, NA), country = c(16, 12, 3, NA, 15)) y5 <- data.table(id = c(1, 2, 2, 6, 3), gdp = c(11L, 15L, 20L, 13L, 10L), country = 16:20)
Let's suppose that you want to join the two tables x1
and y1
.
# Calling joyn() to join x1 and y1 joyn(x = x1, y = y1, match_type = "m:1" ) #Note RT: remove this argument once fixing the default value
The output table is the result of a full join -which is what joyn
always executes by the default. This means that the returning table will retains both matching and non matching rows from both x1
and y1
. Notice that the resulting table also contains an additional variable called .joyn
, which is the reporting variable. (Read below β¬οΈ)
A particular feature of joyn
is that it includes the reportvar
in the returning table, which informs you about the status of the join. You can modify both the name and the format of the reporting variable as follows:
Name: by default reportvar = ".joyn"
, but you can modify it with reportvar = "myname"
specifying the name you want to assign
Format: by default reporttype = "character"
, but you can also set it to numeric using reporttype = "numeric"
You can see the difference between the two types in the table below[^1]:
[^1]: For a better understanding of the meaning column check the "Advanced functionalities" article
knitr::kable( tibble::tribble( ~numeric, ~character, ~meaning, 1, "x", "Obs only available in x table", 2, "y", "Obs only available in y table", 3, "x & y", "Matching obs available in both tables", 4, "NA updated", "NAs in x updated with actual values in variables with same names in y", 5, "value updated", "Actual values and NAs in x updated with actual values in variables with same names in y", 6, "not updated", "Actual values and NAs in x are NOT updated with actual values in y" ), format = "simple", align = "lcl")
When performing a join, you might want to specify which variable(s) joyn
should join by.
While by default joyn
will consider the variable(s) in common between x
and y
as key(s) for the join, our suggestion is to make the keys explicit - i.e., specifying it/them in the by
argument
# Join with one variable in common joyn(x = x1, y = y1, by = "id", match_type = "m:1")
If you don't want to join by all variables in common between x
and y
, you can alternately use equivalency as an element of by
vector. This specification allows you to join on different variables between x
and y.
joyn(x = x4, y = y4, by = c("id1 = id", "id2"), match_type = "m:m")
Also, notice that joyn
will sort
the resulting table by key variables in by
. This is because sort = TRUE
by default.
π‘Match type refers to the relationship that exists between the observations of the joining tables. The possibility to perform joins based on the match type is one of the value added of using joyn
.
Following Stata's convention, we can have four different match types:
1:1 (one to one): the default[^2], the variables specified in by
variables uniquely identify single observations in both table β> each observation in left table has a unique match in the right table and viceversa
1:m (one to many): only left table is uniquely identified by by
variables β> each observation in by
var of the left table can have multiple matches in by
var of the right table
m:1 (many to one): only right table is uniquely identified by by
var -> each observation in left table can have only one match in the right table but observations in the right table might have multiple matches in the left table
m:m (many to many): variables in by
does not uniquely identify the observations in either table β> both tables can have multiple matches for each observation
[^2]: Notice that in the previous version default match type was "m:m" instead.
We recommend you always specify the match type when joining tables to ensure the output is correct.
# Many to one match type joyn(x = x1, y = y1, by = "id", match_type = "m:1") # Many to many match type joyn(x = x3, y = y3, by = "id", match_type = "m:m") # One to one match type - the default joyn(x = x2, y = y2, by = "id", match_type = "1:1") # Same join as: joyn(x = x2, y = y2, by = "id") # One to many match type joyn(x = x5, y = y5, by = "id", match_type = "1:m")
However, if are unsure/wrong about the relationships between the observations in your tables, joyn
will let you know that something is not right. Suppose you think your data is uniquely identified by variable id
, while it is not. By setting match_type = "1:1"
you will get and error, informing you that the match type is not as expected.
# Merging correctly but getting error because something is not right in the data joyn(x3, y3, by = "id", match_type = "1:1") # Merging wrongly but getting NO errors because you did not use match_type joyn(x3, y3, by = "id")
If instead you don't care about match types or you don't think it is necessary to use them for your particular needs, you might be fine without joyn
.
Join type determines which observations will be kept after the join. joyn()
allows you to choose which type of join to execute via the keep
argument.
This argument is called keep
rather than join_type
to avoid confusion with the argument match_type
, and in order to reflect that what you are specifying in the end is which observations you want to keep. This argument plays the role of allowing joyn()
to mimic the behavior of dplyr
's functions left_join
, right_join
, inner_join
, and full_join
, the default.
keep
can be of four types:
keep = "full"
: the default, which keeps all the observations in x and y, regardless of whether they match or not.
```r
joyn(x = x1, y = y1, match_type = "m:m")
```
keep = "left"
or keep = "master"
: keeps all observations in x
, both matching and non, and only those observations in y
that match in x
```r
joyn(x = x1, y = y1, keep = "left", match_type = "m:m")
```
keep = "right"
or keep = "using"
keeps all observations in y
, both matching and non, and only those observations in x
that match in y
```r
joyn(x = x1, y = y1, keep = "right", match_type = "m:m")
```
keep = "inner"
keeps only those observations that match in both tables.
```r
joyn(x1, y1, keep = "inner", match_type = "m:m")
```
Recall that joyn
is intended to be informative about the status and quality of the merging.
π JOYn report
By default, joyn
returns the JOYn report , i.e., a summary table of the merging. This includes the reporting variable, the number of rows that come from x
, the number of rows that come from y
and those that are common to both x
and y
. This info is also shown in percentage form in the percent column.
joyn(x = x3, y = y3, by = "id", match_type = "m:m", verbose = TRUE)
π Displaying messages
One of the value added of joyn
is that it produces a number of messages that are intended to inform you about the status of the join. The display of such messages is controlled by the argument verbose
, which allows you to show (verbose = TRUE
) or silent (verbose = FALSE
) any messages.
To further explore messages in joyn
, please refer to the "Messages" article.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.