merge: Merge Two Data Tables
In data.table: Extension of data.frame

Description Usage Arguments Details Value See Also Examples

Relatively quick merge of two data.tables based on common key columns (by default).

This merge method for data.table is meant to act very similarly to the merge method for data.frame, with the major exception being that the default columns used to merge two data.table inputs are the shared key columns rather than the shared columns with the same names.

For a more data.table-centric (and faster) way of merging two data.tables, see [.data.table; e.g., x[y, ...]. In recent versions, however, merge() is much closer to the speed of x[y, ...]. See FAQ 1.12 for a detailed comparison of merge and x[y, ...].

Note that merge is a generic function in base R. It dispatches to either the merge.data.frame method or merge.data.table method depending on the class of its first argument. Typing ?merge at the prompt should present a choice of two links: the help pages for each of these merge methods. You don't need to use the full name of the method although you may if you wish; i.e., merge(DT1,DT2) is idiomatic R but you can bypass method dispatch by going direct if you wish: merge.data.table(DT1,DT2).

## S3 method for class 'data.table'
merge(x, y, by = NULL, all = FALSE, all.x = all, all.y = all, suffixes = c(".x", ".y"), 
allow.cartesian=getOption("datatable.allow.cartesian"),  # default FALSE
...)

`x, y`	`data table`s. `y` is coerced to a `data.table` if it isn't one already.
`by`	A vector of shared column names in `x` and `y` to merge on. This defaults to the shared key columns between the two tables. If `y` has no key columns, this defaults to the key of `x`.
`all`	logical; `all = TRUE` is shorthand to save setting both `all.x = TRUE` and `all.y = TRUE`.
`all.x`	logical; if `TRUE`, then extra rows will be added to the output, one for each row in `x` that has no matching row in `y`. These rows will have 'NA's in those columns that are usually filled with values from `y`. The default is `FALSE`, so that only rows with data from both `x` and `y` are included in the output.
`all.y`	logical; analogous to `all.x` above.
`suffixes`	A `character(2)` specifying the suffixes to be used for making non-`by` column names unique. The suffix behavior works in a similar fashion as the `merge.data.frame` method does.
`allow.cartesian`	See `allow.cartesian` in `[.data.table`.
`...`	Not used at this time.

Note that if the specified columns in by is not the key (or head of the key) of x or y, then a copy is first rekeyed prior to performing the merge. This might make this function perform slower than you are expecting. When secondary keys are implemented in future we expect performance in this case to improve.

A new data.table based on the merged data tables, sorted by the columns set (or inferred for) the by argument.

data.table, [.data.table, merge.data.frame

    (dt1 <- data.table(A = letters[1:10], X = 1:10, key = "A"))
    (dt2 <- data.table(A = letters[5:14], Y = 1:10, key = "A"))
    merge(dt1, dt2)
    merge(dt1, dt2, all = TRUE)

    (dt1 <- data.table(A = letters[rep(1:3, 2)], X = 1:6, key = "A"))
    (dt2 <- data.table(A = letters[rep(2:4, 2)], Y = 6:1, key = "A"))
    merge(dt1, dt2, allow.cartesian=TRUE)

    (dt1 <- data.table(A = c(rep(1L, 5), 2L), B = letters[rep(1:3, 2)], X = 1:6, key = "A,B"))
    (dt2 <- data.table(A = c(rep(1L, 5), 2L), B = letters[rep(2:4, 2)], Y = 6:1, key = "A,B"))
    merge(dt1, dt2)
    merge(dt1, dt2, by="B", allow.cartesian=TRUE)

    # test it more:
    d1 <- data.table(a=rep(1:2,each=3), b=1:6, key="a,b")
    d2 <- data.table(a=0:1, bb=10:11, key="a")
    d3 <- data.table(a=0:1, key="a")
    d4 <- data.table(a=0:1, b=0:1, key="a,b")

    merge(d1, d2)
    merge(d2, d1)
    merge(d1, d2, all=TRUE)
    merge(d2, d1, all=TRUE)

    merge(d3, d1)
    merge(d1, d3)
    merge(d1, d3, all=TRUE)
    merge(d3, d1, all=TRUE)

    merge(d1, d4)
    merge(d1, d4, by="a", suffixes=c(".d1", ".d4"))
    merge(d4, d1)
    merge(d1, d4, all=TRUE)
    merge(d4, d1, all=TRUE)