dt: Wrappers of 'data.table' functions
In j-martineau/uj: Utilities by JAM

dt	R Documentation

Description

`dt_merge` performs a fast merge of two data.tables. The data.table method behaves similarly to the data.frame method, except that row order is specified and, by default, the columns to merge on are chosen:

first, based on the shared key columns;
if there are none, based on the key columns of the first argument x;
if there are none, based on the columns common to the two data.tables.

Use the by, by.x and by.y arguments explicitly to override this default.
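
The default selection of merge columns can be sketched as follows. This is a minimal illustration that calls `data.table::merge` directly, on the assumption (from Details) that `dt_merge` forwards `by` unchanged; the tables `x` and `y` and their columns are made up for the example.

```r
library(data.table)

x <- data.table(id = 1:3, grp = c("a", "b", "c"), vx = 1:3)
y <- data.table(id = 2:4, grp = c("b", "z", "d"), vy = 4:6)

# Neither table is keyed, so merge() falls back to the common columns (id, grp).
m_common <- merge(x, y)

# Key only x on id: with no shared key, the key of x becomes the merge column.
xk <- copy(x)
setkey(xk, id)
m_keyx <- merge(xk, y)

# An explicit `by` overrides the defaults.
m_by <- merge(x, y, by = "grp")

nrow(m_common)  # 1: only (id = 2, grp = "b") matches on both columns
nrow(m_keyx)    # 2: ids 2 and 3 match; grp is split into grp.x and grp.y
names(m_by)     # the non-by id columns receive the .x / .y suffixes
```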

Usage

as_dt(x, keep.rownames = FALSE, ...)

is_dt(x)

ie_dt(x)

dt_sub(x, row, col)

dt_cols(x, col)

dt_merge(
  x,
  y,
  by = NULL,
  by.x = NULL,
  by.y = NULL,
  all = FALSE,
  all.x = all,
  all.y = all,
  sort = TRUE,
  suffixes = c(".x", ".y"),
  no.dups = TRUE,
  allow.cartesian = base::getOption("datatable.allow.cartesian"),
  say = TRUE
)

dt_rows(x, r)

dt_wide(
  data,
  formula,
  fun.aggregate = NULL,
  sep = "_",
  ...,
  margins = NULL,
  subset = NULL,
  fill = NULL,
  drop = TRUE,
  value.var = data.table:::guess(data),
  verbose = base::getOption("datatable.verbose"),
  say = TRUE
)

Arguments

`x`	Any object for `as_dt`, `is_dt`, and `ie_dt`. A `data.table` for all others.
`...`	Not used at this time.
`row`, `col`	Complete indexer vectors or complete character vectors identifying rows and columns of `x`, respectively.
`y`	A data.table to be merged with `x`.
`by`	A vector of shared column names in `x` and `y` to merge on. This defaults to the shared key columns between the two tables. If `y` has no key columns, this defaults to the key of `x`.
`by.x`, `by.y`	Vectors of column names in `x` and `y` to merge on.
`all`	logical; `all = TRUE` is shorthand to save setting both `all.x = TRUE` and `all.y = TRUE`.
`all.x`	logical; if `TRUE`, rows from `x` which have no matching row in `y` are included. These rows will have `NA`s in the columns that are usually filled with values from `y`. The default is `FALSE` so that only rows with data from both `x` and `y` are included in the output.
`all.y`	logical; analogous to `all.x` above.
`sort`	logical. If `TRUE` (default), the rows of the merged `data.table` are sorted by setting the key to the `by / by.x` columns. If `FALSE`, unlike base R's `merge` for which row order is unspecified, the row order in `x` is retained (including retaining the position of missing entries when `all.x=TRUE`), followed by `y` rows that don't match `x` (when `all.y=TRUE`) retaining the order those appear in `y`.
`suffixes`	A `character(2)` specifying the suffixes to be used for making non-`by` column names unique. The suffix behaviour works in a similar fashion as the `merge.data.frame` method does.
`no.dups`	logical indicating that `suffixes` are also appended to non-`by.y` column names in `y` when they have the same column name as any `by.x`.
`allow.cartesian`	See `allow.cartesian` in `[.data.table`.
`say`	Logical scalar indicating whether to update user on progress.
`data`	A `data.table`.
`formula`	A formula of the form LHS ~ RHS to cast; see the Details of `data.table::dcast`.
`fun.aggregate`	Should the data be aggregated before casting? If the formula doesn't identify a single observation for each cell, then aggregation defaults to `length` with a warning of class 'dt_missing_fun_aggregate_warning'. To use multiple aggregation functions, pass a `list`; see the Examples of `data.table::dcast`.
`sep`	Character vector of length 1, indicating the separating character in variable names generated during casting. Default is `_` for backwards compatibility.
`margins`	Not implemented yet. Should take variable names to compute margins on. A value of `TRUE` would compute all margins.
`subset`	Specified if casting should be done on a subset of the data. Ex: `subset = .(col1 <= 5)` or `subset = .(variable != "January")`.
`fill`	Value with which to fill missing cells. If `fill=NULL` and missing cells are present, then `fun.aggregate` is used on a 0-length vector to obtain a fill value.
`drop`	`FALSE` will cast by including all missing combinations. `c(FALSE, TRUE)` will only include all missing combinations of formula LHS; `c(TRUE, FALSE)` will only include all missing combinations of formula RHS. See the Examples of `data.table::dcast`.
`value.var`	Name of the column whose values will be filled to cast. If none is provided, the function `guess()` tries to identify this column automatically. Cast multiple `value.var` columns simultaneously by passing their names as a `character` vector. See the Examples of `data.table::dcast`.
`verbose`	Not used yet. May be dropped in the future or used to provide informative messages through the console.
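
The casting arguments can be illustrated with a small sketch. It calls `data.table::dcast` directly, assuming that `dt_wide` passes `formula`, `value.var`, `fun.aggregate`, and `fill` through unchanged, as Details suggests; the table `long` is invented for the example.

```r
library(data.table)

long <- data.table(
  id    = c(1L, 1L, 1L, 2L),
  month = c("jan", "jan", "feb", "jan"),
  value = c(10, 20, 5, 7)
)

# id 1 has two "jan" rows, so the formula does not identify a single
# observation per cell and an aggregation function is needed.
# id 2 has no "feb" row, so that cell is filled with `fill`.
wide <- dcast(long, id ~ month, value.var = "value",
              fun.aggregate = sum, fill = 0)

wide
# id 1: feb = 5, jan = 30; id 2: feb = 0, jan = 7
```

Without `fun.aggregate`, the same call would aggregate with `length` and emit the warning described above.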

Details

`dt_merge`	Thinly wraps `merge`.
`dt_wide`	Thinly wraps `dcast`.
`dt_rows`	Selects rows.
`dt_cols`	Selects columns without `x[ , ..var]`.
`dt_sub`	Selects a subtable without `x[row.var, ..col.var]`.
`as_dt`	Thinly wraps `as.data.table`.
`is_dt`	Thinly wraps `is.data.table`.
`ie_dt`	Converts to `data.table`, if needed.
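
For reference, the plain `data.table` idioms that these wrappers stand in for look roughly like this. The sketch uses only `data.table` functions; it is not the implementation in uj, and the variable names `keep` and `wanted` are hypothetical.

```r
library(data.table)

# Conversion and type check, as wrapped by as_dt and is_dt.
x <- as.data.table(data.frame(a = 1:4, b = letters[1:4], c = c(2.5, 3.5, 4.5, 5.5)))
is.data.table(x)

keep   <- c("a", "c")   # columns to select, held in a variable
wanted <- c(1L, 3L)     # rows to select, held in a variable

cols <- x[, ..keep]         # column selection: the `..` prefix looks up `keep` in the calling scope
rows <- x[wanted]           # row selection
sub  <- x[wanted, ..keep]   # subtable: rows and columns together
```

The `..` prefix is what `dt_cols` and `dt_sub` are described as sparing the caller from writing.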

Value

A data.table, except for `is_dt`, which wraps `is.data.table` and therefore returns a logical scalar.

Examples

library(data.table)  # these examples call data.table's merge() directly

(dt1 <- data.table(A = letters[1:10], X = 1:10, key = "A"))
(dt2 <- data.table(A = letters[5:14], Y = 1:10, key = "A"))
merge(dt1, dt2)
merge(dt1, dt2, all = TRUE)

(dt1 <- data.table(A = letters[rep(1:3, 2)], X = 1:6, key = "A"))
(dt2 <- data.table(A = letters[rep(2:4, 2)], Y = 6:1, key = "A"))
merge(dt1, dt2, allow.cartesian=TRUE)

(dt1 <- data.table(A = c(rep(1L, 5), 2L), B = letters[rep(1:3, 2)], X = 1:6, key = c("A", "B")))
(dt2 <- data.table(A = c(rep(1L, 5), 2L), B = letters[rep(2:4, 2)], Y = 6:1, key = c("A", "B")))
merge(dt1, dt2)
merge(dt1, dt2, by="B", allow.cartesian=TRUE)

# test it more:
d1 <- data.table(a=rep(1:2,each=3), b=1:6, key=c("a", "b"))
d2 <- data.table(a=0:1, bb=10:11, key="a")
d3 <- data.table(a=0:1, key="a")
d4 <- data.table(a=0:1, b=0:1, key=c("a", "b"))

merge(d1, d2)
merge(d2, d1)
merge(d1, d2, all=TRUE)
merge(d2, d1, all=TRUE)

merge(d3, d1)
merge(d1, d3)
merge(d1, d3, all=TRUE)
merge(d3, d1, all=TRUE)

merge(d1, d4)
merge(d1, d4, by="a", suffixes=c(".d1", ".d4"))
merge(d4, d1)
merge(d1, d4, all=TRUE)
merge(d4, d1, all=TRUE)

# setkey is automatic by default
set.seed(1L)
d1 <- data.table(a=sample(rep(1:3,each=2)), z=1:6)
d2 <- data.table(a=2:0, z=10:12)
merge(d1, d2, by="a")
merge(d1, d2, by="a", all=TRUE)

# using by.x and by.y
setnames(d2, "a", "b")
merge(d1, d2, by.x="a", by.y="b")
merge(d1, d2, by.x="a", by.y="b", all=TRUE)
merge(d2, d1, by.x="b", by.y="a")

# using the incomparables argument
d1 <- data.table(a=c(1,2,NA,NA,3,1), z=1:6)
d2 <- data.table(a=c(1,2,NA), z=10:12)
merge(d1, d2, by="a")
merge(d1, d2, by="a", incomparables=NA)
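
The effect of `sort` and `all.x` on row order, described under Arguments, can be checked with one more small case. As above, this is a sketch using `data.table::merge` directly, with made-up tables.

```r
library(data.table)

x <- data.table(id = c(3L, 1L, 2L), vx = c("c", "a", "b"))
y <- data.table(id = c(2L, 3L), vy = c(20, 30))

# sort = TRUE (default): the result is keyed and ordered by the merge column.
sorted <- merge(x, y, by = "id", all.x = TRUE)

# sort = FALSE: the row order of x is retained, including the unmatched row.
unsorted <- merge(x, y, by = "id", all.x = TRUE, sort = FALSE)

sorted$id     # 1 2 3
unsorted$id   # 3 1 2
unsorted$vy   # 30 NA 20 (id 1 has no match in y)
```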

j-martineau/uj documentation built on Sept. 14, 2024, 4:40 a.m.