join_qc | R Documentation |
full_join_qc
, inner_join
, left_join
, right_join
,
anti_join
, and semi_join
return by default near identical
objects as their dplyr equivalents.
full_join_qc(x, y, by = NULL, copy = FALSE, .merge = NULL, .extra = NULL)
inner_join_qc(
x,
y,
by = NULL,
copy = FALSE,
suffix = c(".x", ".y"),
.merge = NULL,
.extra = NULL,
...
)
left_join_qc(
x,
y,
by = NULL,
copy = FALSE,
suffix = c(".x", ".y"),
.merge = NULL,
.extra = NULL,
...
)
right_join_qc(
x,
y,
by = NULL,
copy = FALSE,
suffix = c(".x", ".y"),
.merge = NULL,
.extra = NULL,
...
)
anti_join_qc(x, y, by = NULL, copy = FALSE, ...)
semi_join_qc(x, y, by = NULL, copy = FALSE, ...)
x, y |
A pair of data frames, data frame extensions (e.g. a tibble), or lazy data frames (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
by |
A join specification created with If To join on different variables between To join by multiple variables, use a
For simple equality joins, you can alternatively specify a character vector
of variable names to join by. For example, To perform a cross-join, generating all combinations of |
copy |
If |
.merge |
a character value used to name a new character variable, which
tracks the source of each row of the new, joined data. If |
.extra |
a character value used to name a new character variable, which
identifies any row of the new joined data that represents a combination of
the |
suffix |
If there are non-joined duplicate variables in |
... |
Other parameters passed onto methods. |
Each _qc version of the join functions is identical to its dplyr equivalent
except that it automatically prints the number of rows that were matched, the
number of rows that were not matched, and the number of additional rows
compared to the initial data frame(s) - for example when there is more than
one match on the by
identifier(s). There are also options to create
new variables identifying and classifying rows based on how/if they matched.
All joins except anti_join
and semi_join
are classified as
one to one, one to many, many to one, or many to many. These definitions
describe the extent to which there are duplicated rows of unique
combinations of the by
variable(s). In one to one merges, there is
only one unique row of identifiers in each data set. Extra rows are never
created in one to one joins. One to many and many to one joins occur when
one of the data sets has a duplicated id row. One to many implies the right
data has the duplicated id; many to one implies the left data has the
duplicated id. Additional rows may be created in these types of joins. Many
to many joins imply that both data sets have duplicated rows on the id
variable(s). Additional rows may be created in this types of join. The
join description is printed when performing any join except
anti_join
and semi_join
.
For full_join_qc
, left_join_qc
, and right_join_qc
,
there is an added option of creating a new variable that indicates whether
the row in the joined data was from the "left_only"
,
"right_only"
or "matched"
. This variable can be helpful when
diagnosing why the join did or did not match as desired. Whatever character
value that is supplied to .merge
becomes the name of this new
variable.
left_join_qc
, and right_join_qc
also have the option of
creating a new variable indicating whether the row in the joined data is
an additional row with the given combination of by
. For example, if
there were only 2 rows with an ID equal to "A" in the original left data
but 3 rows with this ID in the right data set, then the left joined data
will have more rows with this ID than the original left. .extra
is
a chracter value that when supplied becomes the name of this new variable
flagging a row that has additional rows than the original left
or right data frame.
Groups in the data frames are ignored for the purpose of joining, but the
result preserves the grouping of x
.
join
data_A <-
data.frame(
id = 1:10,
var_A = 11:20
)
data_B <-
data.frame(
id = c(5, 5, 5, 5, 6, 7, 7, 9, 10, 11),
id_A = c(1:10),
var_B = 21:30
)
# Full join with new .merge variable
full_join_qc(data_A, data_B, .merge = "merge_ab")
# Left join with new .extra variable
left_join_qc(data_A, data_B, .extra = "extra_ab")
# Right join with both new variables
right_join_qc(data_A, data_B, .merge = "merge_ab", .extra = "extra_ab")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.