join | R Documentation |
Join two data frame like objects x
and y
on
columns. Inspired by polars and by default uses a vectorized hash join algorithm (workhorse function fmatch
).
join(x, y,
on = NULL,
how = "left",
suffix = NULL,
validate = "m:m",
multiple = FALSE,
sort = FALSE,
keep.col.order = TRUE,
drop.dup.cols = FALSE,
verbose = .op[["verbose"]],
column = NULL,
attr = NULL,
...
)
x |
a data frame-like object. The result will inherit the attributes of this object. |
y |
a data frame-like object to join with |
on |
character. vector of columns to join on. |
how |
character. Join type: |
suffix |
character(1 or 2). Suffix to add to duplicate column names. |
validate |
character. (Optional) check if join is of specified type. One of |
multiple |
logical. Handling of rows in |
sort |
logical. |
keep.col.order |
logical. Keep order of columns in |
drop.dup.cols |
instead of renaming duplicate columns in |
verbose |
integer. Prints information about the join. One of 0 (off), 1 (default, see Details) or 2 (additionally prints the classes of the |
column |
(optional) name for an extra column to generate in the output indicating which dataset a record came from. |
attr |
(optional) name for attribute providing information about the join performed (including the output of |
... |
further arguments to |
If verbose > 0
, join
prints a compact summary of the join operation using cat
. If the names of x
and y
can be extracted (if as.character(substitute(x))
yields a single string) they will be displayed (otherwise 'x' and 'y' are used) followed by the respective join keys in brackets. This is followed by a summary of the records used from each table. If multiple = FALSE
, only the first matches from y
are used and counted here (or the first matches of x
if how = "right"
). Note that if how = "full"
any further matches are simply appended to the results table, thus it may make more sense to use multiple = TRUE
with the full join when suspecting multiple matches.
If multiple = TRUE
, join
performs a full cartesian product matching every key in x
to every matching key in y
. This can considerably increase the size of the resulting table. No memory checks are performed (your system will simply run out of memory; usually this should not terminate R).
In both cases, join
will also determine the average order of the join as the number of records used from each table divided by the number of unique matches and display it between the two tables at up to 2 digits. For example "<4:1.5>"
means that on average 4 records from x
match 1.5 records from y
, implying on average 4*1.5 = 6
records generated per unique match. If multiple = FALSE
"1st"
will be displayed for the using table (y
unless how = "right"
), indicating that there could be multiple matches but only the first is retained. Note that an order of '1' on either table must not imply that the key is unique as this value is generated from round(v, 2)
. To be sure about a keys uniqueness employ the validate
argument.
A data frame-like object of the same type and attributes as x
. "row.names"
of x
are only preserved in left-join operations.
fmatch
, Data Frame Manipulation, Fast Grouping and Ordering, Collapse Overview
df1 <- data.frame(
id1 = c(1, 1, 2, 3),
id2 = c("a", "b", "b", "c"),
name = c("John", "Jane", "Bob", "Carl"),
age = c(35, 28, 42, 50)
)
df2 <- data.frame(
id1 = c(1, 2, 3, 3),
id2 = c("a", "b", "c", "e"),
salary = c(60000, 55000, 70000, 80000),
dept = c("IT", "Marketing", "Sales", "IT")
)
# Different types of joins
for(i in c("l","i","r","f","s","a"))
join(df1, df2, how = i) |> print()
# With multiple matches
for(i in c("l","i","r","f","s","a"))
join(df1, df2, on = "id2", how = i, multiple = TRUE) |> print()
# Adding join column: useful esp. for full join
join(df1, df2, how = "f", column = TRUE)
# Custom column + rearranging
join(df1, df2, how = "f", column = list("join", c("x", "y", "x_y")), keep = FALSE)
# Attaching match attribute
str(join(df1, df2, attr = TRUE))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.