safe_join: Join two data frames safely


View source: R/inexact_join.R

Description

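This function is a wrapper for the standard dplyr join functions and the pmdplyr inexact_join functions. Before running the join, it checks that the joining variables uniquely identify the data set(s) you expect them to.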

Usage

safe_join(x, y, expect = NULL, join = NULL, ...)

Arguments

x, y

The left and right data sets to join.

expect

The match you expect to perform: "1:m" (or "x"), "m:1" (or "y"), or "1:1" (or c("x", "y") or "xy"). You can state this either as the kind of match (one-to-many, many-to-one, or one-to-one) or as the data set(s) you expect to be uniquely identified by the joining variables ("x", "y", or c("x", "y")/"xy"). Alternatively, set expect = "no m:m" if you don't care which join you're doing as long as it isn't many-to-many.

join

A join or inexact_join function to run if safe_join determines your join is safe. By default (join = NULL), safe_join simply returns TRUE instead of running the join.

...

Other arguments to be passed to the function specified in join. If performing an inexact_join, pass the var and jvar arguments as quoted variable names.

Details

When performing a join, we generally expect that one or both of the joined data sets is uniquely identified by the set of joining variables.

If this is not true, the results of the join will often not be what you expect. Unfortunately, the join functions do not warn you that you may have just done something strange.
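As a rough illustration of the problem, the sketch below joins two made-up data frames, neither of which is uniquely identified by i. It uses base R merge() rather than a dplyr join so that it runs without any packages; the data and object names are assumptions for illustration only.

```r
# Neither data set is uniquely identified by i (illustrative data)
x <- data.frame(i = c(1, 1, 2), a = 1:3)
y <- data.frame(i = c(1, 1, 2), b = 4:6)

# merge() pairs every matching row of x with every matching row of y
# and gives no warning that the match is many-to-many
m2m <- merge(x, y, by = "i")

nrow(x)   # 3
nrow(y)   # 3
nrow(m2m) # 5: four pairings for i == 1 plus one for i == 2
```

The result has more rows than either input, which is rarely what was intended.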

This issue is especially likely to arise with panel data, where you may have multiple different data sets at different observation levels.

safe_join forces you to specify which of your data sets you think are uniquely identified by the joining variables. If you are wrong, it will return an error. If you are right, it will pass you on to your preferred join function, given in join. If join is not specified, it will just return TRUE.
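To make "uniquely identified" concrete, here is a minimal by-hand version of that kind of check in base R. The helper is_uniquely_identified is hypothetical and is not part of pmdplyr; it is only a sketch of the idea, not how safe_join is implemented.

```r
# Hypothetical helper (not part of pmdplyr): TRUE if the columns named in
# `by` uniquely identify the rows of `df`
is_uniquely_identified <- function(df, by) {
  anyDuplicated(df[, by, drop = FALSE]) == 0
}

# Panel data identified by i and t together, and individual-level data
left <- data.frame(i = c(1, 1, 2, 2), t = c(1, 2, 1, 2), a = 1:4)
right <- data.frame(i = c(1, 2), b = 1:2)

left_unique_i <- is_uniquely_identified(left, "i")            # FALSE
left_unique_it <- is_uniquely_identified(left, c("i", "t"))   # TRUE
right_unique_i <- is_uniquely_identified(right, "i")          # TRUE
```

Joining left and right on i is therefore a many-to-one match: only right is uniquely identified by i.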

Examples

library(dplyr)   # provides left_join
library(pmdplyr) # provides safe_join

# left is panel data and i does not uniquely identify observations
left <- data.frame(
  i = c(1, 1, 2, 2),
  t = c(1, 2, 1, 2),
  a = 1:4
)
# right is individual-level data uniquely identified by i
right <- data.frame(
  i = c(1, 2),
  b = 1:2
)

# I think that I can do a one-to-one merge on i,
# forgetting that left is identified by i and t together.
# So, this produces an error
## Not run: 
safe_join(left, right, expect = "1:1", join = left_join)

## End(Not run)

# If I realize I'm doing a many-to-one merge, my expectation is correct,
# so safe_join will perform the join for us
safe_join(left, right, expect = "m:1", join = left_join)

pmdplyr documentation built on July 2, 2020, 4:08 a.m.