| dtjoin | R Documentation |
DT[i]-style interface
to data.tableWrite (and optionally run) data.table code for a join
using a generalisation of DT[i] syntax with extended arguments and
enhanced behaviour. Accepts any data.frame-like inputs (not only
data.tables), permits left, right, inner, and full joins, prevents
unwanted matches on NA and NaN by default, does not garble join
columns in non-equality joins, allows mult on both sides of the join,
creates an optional join indicator column, allows specifying which columns to
select from each input, and provides convenience options to control column
order and prefixing.
If run, the join returns a data.frame, data.table, tibble,
sf, or sf-tibble according to context. The generated
data.table code can be printed to the console instead of (or as well
as) being executed. This feature extends to mock joins, where no
inputs are provided, and template code is produced.
dtjoin is the workhorse function for fjoin_inner,
fjoin_left, fjoin_right, and
fjoin_full, which are wrappers providing a more conventional
interface for join operations. These functions are recommended over
dtjoin for most users and cases.
dtjoin(
.DT = NULL,
.i = NULL,
on,
match.na = FALSE,
mult = "all",
mult.DT = "all",
nomatch = NA,
nomatch.DT = NULL,
indicate = FALSE,
select = NULL,
select.DT = NULL,
select.i = NULL,
both = FALSE,
on.first = FALSE,
i.home = FALSE,
i.first = i.home,
prefix = if (i.home) "x." else "i.",
i.class = i.home,
do = !(is.null(.DT) && is.null(.i)),
show = !do,
verbose = FALSE,
...
)
.DT, .i |
|
on |
A character vector of join predicates, e.g. |
match.na |
If |
mult |
(as in |
mult.DT |
Like |
nomatch |
(as in |
nomatch.DT |
Like |
indicate |
Whether to add a column |
select, select.DT, select.i |
Character vectors of columns to be selected
from either input if present ( |
both |
Whether to include equality join columns from the "foreign"
table separately in the output, instead of combining them with those from
the "home" table. Default |
on.first |
Whether to place the join columns from both inputs first in
the join result. Default |
i.home |
Whether to treat |
i.first |
Whether to place |
prefix |
A prefix to attach to column names in the "foreign" table that
are the same as a column name in the "home" table. The default is
|
i.class |
Whether the |
do |
Whether to execute the join. Default is |
show |
Whether to print the code for the join to the console. Default is
the opposite of |
verbose |
(passed to |
... |
Further arguments (for internal use). |
Each input can be any object with class data.frame, or a plain
list of same-length vectors.
The output class depends on .DT by default (but .i with
i.class = TRUE) and is as follows:
a data.table if the input is a pure data.table
a tibble if it is a tibble (and a grouped tibble if it has class
grouped_df)
an sf if it is an sf with its active geometry selected
in the join
a plain data.frame in all other cases
The following attributes are carried through and refreshed: data.table
key, tibble groups, sf agr (and bbox etc. of all
individual sfc-class columns regardless of output class). See below
for specifics. Other classes and attributes are not carried through.
onon is a required argument. For a natural join (a join by equality on
all same-named column pairs), you must specify on = NA; you can't just
omit on as in other packages. This is to prevent a natural join being
specified by mistake, which may then go unnoticed.
select, select.DT, and select.iUsed on its own, select keeps the join columns plus the specified
non-join columns from both inputs if present.
If select.DT is provided (and similarly for select.i) then:
if select is also specified, non-join columns of .DT
named in either select or select.DT are included
if select is not specified, only non-join columns named in
select.DT are included from .DT. Thus e.g.
select.DT = "" excludes all of .DT's non-join columns.
Non-existent column names are ignored without warning.
When select is specified but select.DT and select.i are
not, the output consists of all join columns followed by the selected
non-join columns from either input in the order given in select.
In all other cases:
columns from .DT come before columns from .i by default
(but vice versa if i.first is TRUE)
within each group of columns, non-join columns are in the order
given by select.DT/select.i, or in their original data order
if no selection is provided
if on.first is TRUE, join columns from both inputs are
moved to the front of the overall output.
mult and mult.DTIf both of these arguments are not the default "all", mult is
applied first (typically by passing directly to [.data.table) and
mult.DT is applied subsequently to eliminate all but the first or last
occurrence of each row of .DT from the inner part of the join,
producing a 1:1 result. This order of operations can affect the identity of
the rows in the inner join.
The option of displaying the join code with show = TRUE or by passing
null inputs is aimed at data.table users wanting to use the package as
a cookbook of recipes for adaptation. If .DT and .i are both
NULL, template code is displayed based on join column names implied by
on, plus sample non-join column names. select arguments are
ignored in this case.
The code displayed is for the join operation after casting the inputs as
data.tables if necessary, and before casting the result as a tibble
and/or sf if applicable. Note that fjoin departs from the usual
j = list() idiom in order to avoid a deep copy of the output made by
as.data.table.list. (Likewise, internally it takes only shallow copies
of columns when casting inputs or outputs to different classes.)
groupsIf the relevant input is a grouped tibble (class grouped_df), the
output is grouped by the grouping columns that are selected in the result.
keysIf .i is a keyed data.table and the output is also a
data.table, it inherits .i's key provided
nomatch.DT is NULL (i.e. the non-matching rows of .DT
are not included in the result). This differs from a data.table
DT[i] join, in which the output inherits the key of DT
provided it remains sorted on those columns. If not all of the key columns
are selected in the result, the leading subset is used.
sfc-class columnsJoins between two sf objects are supported. The relation-to-geometry
attribute agr is inherited from the input supplying the active
geometry. All sfc-class columns in the output are refreshed after
joining (using sf::st_sfc() with recompute_bbox = TRUE); this
is true regardless of whether or not the inputs and output are sfs.
A data.frame, data.table, (grouped) tibble, sf,
or sf-tibble, or else NULL if do is FALSE. See
Details.
See the package-level documentation fjoin for related
functions.
# An illustration showing:
# - two calls to fjoin_left() (commented out), differing in the `order` argument
# - the resulting calls to dtjoin(), plus `show = TRUE`
# - the generated data.table code and output
# data frames
set.seed(1)
df_x <- data.frame(id_x = 1:3, col_x = paste0("x", 1:3), val = runif(3))
df_y <- data.frame(id_y = rep(4:2, each = 2), col_y = paste0("y", 1:6), val = runif(6))
# ---------------------------------------------------------------------------
# (1) fjoin_left(df_x, df_y, on = "id_x == id_y", mult.x = "first")
dtjoin(
df_y,
df_x,
on = "id_y == id_x",
mult = "first",
i.home = TRUE,
prefix = "R.",
show = TRUE
)
# (2) fjoin_left(df_x, df_y, on = "id_x == id_y", mult.x = "first", order = "right")
dtjoin(
df_x,
df_y,
on = "id_x == id_y",
mult.DT = "first",
nomatch = NULL,
nomatch.DT = NA,
prefix = "R.",
show = TRUE
)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.