rbindlist: Makes one data.table from a list of many

Description

Same as do.call("rbind", l) on data.frames, but much faster. See DETAILS for more.

Usage

rbindlist(l, use.names=fill, fill=FALSE)
# rbind(..., use.names=TRUE, fill=FALSE)

Arguments

l

A list containing data.table, data.frame or list objects. At least one of the inputs should have column names set. For rbind, ... takes the same kinds of objects, passed as separate arguments.

use.names

If TRUE, items are bound by matching column names. The default is FALSE for rbindlist (for backwards compatibility) and TRUE for rbind (for consistency with base). Duplicate column names are bound in order of occurrence, as in base.

fill

If TRUE, fills missing columns with NAs. The default is FALSE. When fill is TRUE, use.names must also be TRUE.

Details

Each item of l can be a data.table, data.frame or list, including NULL (skipped) or an empty object (0 rows). rbindlist is most useful when there is a variable number of (potentially many) objects to stack, such as those returned by lapply(fileNames, fread). rbind, however, is most useful for stacking two or three objects that you know in advance. ... should contain at least one data.table for rbind(...) to call the fast method and return a data.table, whereas rbindlist(l) always returns a data.table, even when stacking a plain list with a data.frame, for example.
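As a minimal sketch of the mixed-input case described above, the following stacks a data.table, a data.frame, a plain named list and a NULL item. The object names (DT, DF, LST) and their contents are hypothetical and chosen only for illustration.

```r
library(data.table)

# Hypothetical inputs: a data.table, a data.frame and a plain named list
DT  <- data.table(A = 1:2, B = c("a", "b"))
DF  <- data.frame(A = 3L, B = "c", stringsAsFactors = FALSE)
LST <- list(A = 4L, B = "d")

# The NULL item is skipped; the remaining items are stacked in order
res <- rbindlist(list(DT, NULL, DF, LST))

is.data.table(res)  # the result is a data.table regardless of the item types
nrow(res)           # one row per input row; NULL contributes none
```

The same call shape applies to a list built programmatically, such as the result of lapply over file names.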

In versions <= v1.9.2, each item passed to rbindlist had to have the same number of columns as the first non-empty item. rbind.data.table gained a fill argument in v1.9.2 to fill missing columns with NA, which allowed rbind(...) to bind items with unequal numbers of columns.

In versions > v1.9.2, this functionality was extended to rbindlist (and written entirely in C for speed). rbindlist has a use.names argument, which is FALSE by default for backwards compatibility (this is also the faster path, since column names do not have to be checked). It also has a fill argument and can bind items with unequal numbers of columns when fill is TRUE.

With these changes, the only difference between rbind(...) and rbindlist(l) is their default argument use.names.
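A short sketch of that difference, using two hypothetical tables whose columns are the same but in a different order. use.names is passed explicitly to rbindlist here so the example does not depend on the default of the installed version.

```r
library(data.table)

DT1 <- data.table(A = 1:2, B = c("a", "b"))
DT2 <- data.table(B = "c", A = 3L)  # same columns, different order

# rbind matches columns by name by default (use.names = TRUE)
by_name_rbind <- rbind(DT1, DT2)

# rbindlist gives the same result once use.names = TRUE is requested
by_name_rbindlist <- rbindlist(list(DT1, DT2), use.names = TRUE)

isTRUE(all.equal(by_name_rbind, by_name_rbindlist))
```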

If column i does not have the same type in every input item (e.g., a data.table is bound with a list, or a column is a factor in one item and character in others), the column is coerced to the highest type (SEXPTYPE).
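The coercion can be observed with a few single-column tables. This is a sketch with hypothetical inputs (ints, dbls, chars); it covers only the integer, double and character cases.

```r
library(data.table)

ints  <- data.table(x = 1:2)          # integer column
dbls  <- data.table(x = c(2.5, 3.5))  # double column
chars <- data.table(x = "a")          # character column

# integer stacked with double: the column becomes double
int_dbl <- rbindlist(list(ints, dbls))
typeof(int_dbl$x)   # "double"

# integer stacked with character: the column becomes character
int_chr <- rbindlist(list(ints, chars))
typeof(int_chr$x)   # "character"
```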

Note that any additional attributes that exist on individual items of the input list are not preserved in the result.
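For instance, a custom attribute set on one input table is not carried over. The attribute name "source" and its value are hypothetical and used only to make the effect visible.

```r
library(data.table)

DT1 <- data.table(A = 1:2)
DT2 <- data.table(A = 3:4)

# Attach a hypothetical custom attribute to the first item
setattr(DT1, "source", "file1.csv")

res <- rbindlist(list(DT1, DT2))

attr(DT1, "source")  # still present on the input
attr(res, "source")  # NULL: not carried over to the result
```

If such metadata matters, one option is to store it in a regular column before binding.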

Value

An unkeyed data.table containing a concatenation of all the items passed in.
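A brief sketch of what "unkeyed" means in practice: even when every input is keyed, the result has no key and has to be keyed again if one is needed. The tables and the column name id are hypothetical.

```r
library(data.table)

DT1 <- data.table(id = 1:2, v = c("a", "b"), key = "id")
DT2 <- data.table(id = 3:4, v = c("c", "d"), key = "id")

res <- rbindlist(list(DT1, DT2))

key(DT1)  # "id"
key(res)  # NULL: the result is unkeyed

# Re-key the result explicitly if a key is needed
setkey(res, id)
key(res)  # "id"
```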

See Also

data.table

Examples

    library(data.table)

    # default case
    DT1 = data.table(A=1:3,B=letters[1:3])
    DT2 = data.table(A=4:5,B=letters[4:5])
    l = list(DT1,DT2)
    rbindlist(l)
    
    # bind correctly by names
    DT1 = data.table(A=1:3,B=letters[1:3])
    DT2 = data.table(B=letters[4:5],A=4:5)
    l = list(DT1,DT2)
    rbindlist(l, use.names=TRUE)

    # fill missing columns, and match by col names
    DT1 = data.table(A=1:3,B=letters[1:3])
    DT2 = data.table(B=letters[4:5],C=factor(1:2))
    l = list(DT1,DT2)
    rbindlist(l, use.names=TRUE, fill=TRUE)

Example output

   A B
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: 5 e

   A B
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: 5 e

    A B  C
1:  1 a NA
2:  2 b NA
3:  3 c NA
4: NA d  1
5: NA e  2

data.table documentation built on May 2, 2019, 4:57 p.m.