safe_joins: Safe joins

Description

Wrappers around dplyr's joining functions that run a variety of checks on the fly and, depending on the result, emit a message, warn, or abort.

Usage

safe_left_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  na_matches = c("na", "never"),
  match_fun = NULL,
  check = "~blC",
  conflict = NULL
)

safe_right_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  na_matches = c("na", "never"),
  match_fun = NULL,
  check = "~blC",
  conflict = NULL
)

safe_inner_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  na_matches = c("na", "never"),
  match_fun = NULL,
  check = "~blC",
  conflict = NULL
)

safe_full_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  na_matches = c("na", "never"),
  match_fun = NULL,
  check = "~blC",
  conflict = NULL
)

safe_semi_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  na_matches = c("na", "never"),
  match_fun = NULL,
  check = "~blC",
  conflict = NULL
)

safe_anti_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  na_matches = c("na", "never"),
  match_fun = NULL,
  check = "~blC",
  conflict = NULL
)

safe_nest_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  keep = FALSE,
  name = NULL,
  match_fun = NULL,
  check = "~blC",
  conflict = NULL
)

Arguments

x, y

tbls to join

by

A character vector of variables to join by.

If NULL, the default, *_join() will perform a natural join, using all variables in common across x and y. A message lists the variables so that you can check they're correct; suppress the message by supplying by explicitly.

To join by different variables on x and y, use a named vector. For example, by = c("a" = "b") will match x$a to y$b.

To join by multiple variables, use a vector with length > 1. For example, by = c("a", "b") will match x$a to y$a and x$b to y$b. Use a named vector to match different variables in x and y. For example, by = c("a" = "b", "c" = "d") will match x$a to y$b and x$c to y$d.

To perform a cross-join, generating all combinations of x and y, use by = character().
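The rules for by can be sketched in base R as follows. resolve_by() is a hypothetical helper written for this illustration; it is not part of dplyr or safejoin, and it only shows which columns end up paired on each side.

```r
# Hypothetical helper (not part of dplyr or safejoin): resolve `by`
# into the column names used on each side of the join.
resolve_by <- function(by, x, y) {
  if (is.null(by)) {
    # Natural join: every column name shared by x and y
    common <- intersect(names(x), names(y))
    return(list(x = common, y = common))
  }
  by_x <- names(by)
  if (is.null(by_x)) by_x <- rep("", length(by))
  # Unnamed elements use the same column name on both sides
  by_x[by_x == ""] <- by[by_x == ""]
  list(x = by_x, y = unname(by))
}

x <- data.frame(a = 1:2, c = 3:4, k = 5:6)
y <- data.frame(b = 1:2, d = 3:4, k = 5:6)

natural <- resolve_by(NULL, x, y)                     # joins on "k"
renamed <- resolve_by(c("a" = "b", "c" = "d"), x, y)  # x$a with y$b, x$c with y$d
cross   <- resolve_by(character(), x, y)              # no join columns at all
```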

copy

If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.

suffix

If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.

na_matches

Should NA and NaN values match one another?

The default, "na", treats two NA or NaN values as equal, like %in%, match(), merge().

Use "never" to always treat two NA or NaN values as different, like joins for database sources, similarly to merge(incomparables = FALSE).
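The difference between the two settings can be illustrated with base R alone. The last two lines are only a sketch of what "never" implies, under the assumption that a missing key is simply left unmatched; they are not how dplyr implements it.

```r
key_x <- c(1, NA)
key_y <- c(NA, 2)

# "na" behaviour: match() and %in% treat a missing value as equal
# to another missing value
idx_na <- match(key_x, key_y)   # position in key_y of each key_x value
in_na  <- key_x %in% key_y

# Sketch of "never": a missing key on the x side is left unmatched
idx_never <- idx_na
idx_never[is.na(key_x)] <- NA
```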

match_fun

A vectorized function that takes two columns and returns TRUE or FALSE for each pair of values, indicating whether they match. Can also be a list of functions, one for each pair of columns specified in by (if the list is named, the names refer to columns in x). If only one function is given, it is used on all column pairs.
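As an illustration of the kind of function match_fun expects, the sketch below defines a hypothetical matcher, near(), and applies it to every pair of key values. The all-pairs comparison is an assumption made for this example; how safejoin evaluates match_fun internally is not described here.

```r
# Hypothetical matcher: two numeric values match when they differ by at most 0.5
near <- function(a, b) abs(a - b) <= 0.5

x_key <- c(1.0, 2.0, 5.0)
y_key <- c(1.4, 5.2)

# Compare every value of x_key with every value of y_key
pairs   <- expand.grid(i = seq_along(x_key), j = seq_along(y_key))
matched <- pairs[near(x_key[pairs$i], y_key[pairs$j]), ]
matched   # rows of x (i) paired with rows of y (j)
```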

check

A string specifying which checks to run and how to react; see Details.

conflict

How to handle column conflicts. If NULL, both conflicting columns are suffixed, as in dplyr. If a function of two parameters or a formula, it is applied to the two conflicting columns. If the string "patch", matched values from y overwrite existing values in x, while the other values are kept.
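The three options can be sketched in base R for a left join with unique keys. This is an illustration of the described behaviour, not safejoin's implementation; prefer_y() is a hypothetical conflict function chosen for the example.

```r
x <- data.frame(id = 1:3, val = c("a", "b", "c"), stringsAsFactors = FALSE)
y <- data.frame(id = 2:3, val = c("B", "C"), stringsAsFactors = FALSE)

idx     <- match(x$id, y$id)   # row of y matching each row of x, NA if none
matched <- !is.na(idx)

# conflict = NULL: keep both columns, suffixed
suffixed <- data.frame(id = x$id, val.x = x$val, val.y = y$val[idx],
                       stringsAsFactors = FALSE)

# conflict = a function of two parameters, here "prefer y, fall back to x"
prefer_y <- function(a, b) ifelse(is.na(b), a, b)
combined <- data.frame(id = x$id, val = prefer_y(x$val, y$val[idx]),
                       stringsAsFactors = FALSE)

# conflict = "patch": matched values from y overwrite x, the rest are kept
patched <- x
patched$val[matched] <- y$val[idx[matched]]
```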

keep

Should the join keys from both x and y be preserved in the output? Only applies to nest_join(), left_join(), right_join(), and full_join().

name

The name of the list column nesting joins create. If NULL the name of y is used.

Details

check is a combination of characters which will trigger different checks:

b

as in by, check that by was given explicitly. dplyr's default behavior is to emit a message when it is not.

c

as in column conflict, check whether, among non-join columns, some column names are found in both x and y. Default behavior in dplyr's joining functions is to suffix them silently.

u

as in unique, check if no set of values of join columns is duplicated in x

v

the letter after u, check if no set of values of join columns is duplicated in y

m

as in match, check if all sets of values of join columns in x will be matched in y

n

the letter after m, check if all sets of values of join columns in y will be matched in x

e

as in expanded, check that all combinations of values of join columns are present in x

f

the letter after e, check that all combinations of values of join columns are present in y

l

as in levels, check that join columns are consistent in terms of factor levels

t

as in type, check that join columns have the same class and type
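Several of these checks can be approximated in a few lines of base R. The helpers below are hypothetical and reflect one reading of the descriptions above (in particular, "expanded" is taken to mean all combinations of the observed values); they are not safejoin's own code.

```r
x <- data.frame(id = c(1, 1, 2, 2), grp = c("a", "b", "a", "b"),
                stringsAsFactors = FALSE)
y <- data.frame(id = c(1, 1, 2), grp = c("a", "a", "b"),
                stringsAsFactors = FALSE)
by <- c("id", "grp")

# u / v: is any combination of join-column values duplicated?
has_dupes <- function(df, by) anyDuplicated(df[by]) > 0

# m / n: does every key combination on one side appear on the other?
# Keys are pasted with a separator unlikely to occur in the data (sketch only).
key <- function(df, by) do.call(paste, c(df[by], sep = "\r"))
all_matched <- function(a, b, by) all(key(a, by) %in% key(b, by))

# e / f: are all combinations of the observed join-column values present?
is_expanded <- function(df, by) {
  n_possible <- prod(vapply(df[by], function(col) length(unique(col)), integer(1)))
  nrow(unique(df[by])) == n_possible
}

# t: do the join columns have the same class and type on both sides?
same_type <- function(a, b, by) {
  all(mapply(function(p, q) identical(class(p), class(q)) && typeof(p) == typeof(q),
             a[by], b[by]))
}

has_dupes(x, by); has_dupes(y, by)             # u, v
all_matched(x, y, by); all_matched(y, x, by)   # m, n
is_expanded(x, by); is_expanded(y, by)         # e, f
same_type(x, y, by)                            # t
```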

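An uppercase letter triggers an abort, a lowercase letter triggers a warning, and a lowercase letter prefixed with ~ triggers a message. Other characters are ignored. Under these rules, the default "~blC" emits a message if by is not given explicitly, warns about inconsistent factor levels, and aborts on a column conflict.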
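To make the encoding concrete, here is a minimal sketch of how such a string could be decoded. parse_check() is a hypothetical function written for this page, not safejoin's parser, and its handling of edge cases (for example ~ followed by an uppercase letter) is an assumption.

```r
# Hypothetical parser (not safejoin's code): map each letter in `check`
# to the action described above.
parse_check <- function(check) {
  chars <- strsplit(check, "")[[1]]
  known <- c("b", "c", "u", "v", "m", "n", "e", "f", "l", "t")
  actions <- character()
  i <- 1
  while (i <= length(chars)) {
    ch <- chars[i]
    if (ch == "~" && i < length(chars) && chars[i + 1] %in% known) {
      # "~" followed by a lowercase check letter: message
      actions[chars[i + 1]] <- "message"
      i <- i + 2
      next
    }
    if (ch %in% known) {
      actions[ch] <- "warn"              # lowercase letter: warning
    } else if (tolower(ch) %in% known) {
      actions[tolower(ch)] <- "abort"    # uppercase letter: abort
    }
    # any other character is ignored
    i <- i + 1
  }
  actions
}

default_actions <- parse_check("~blC")
default_actions
```

With the default string, the result names b, l and c, in that order, with the actions "message", "warn" and "abort".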


moodymudskipper/safejoin documentation built on Sept. 2, 2020, 3:08 a.m.