joinbyv: Batch join multiple tables

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/joinbyv.R

Description

Perform batch join of multiple tables to one master table.

Usage

1
joinbyv(master, join, by, col.subset, row.subset, nomatch, allow.cartesian)

Arguments

master

data.table, optionally also a single data.table nested in the list.

join

list of data.tables which to join to master data.table.

by

list of character vectors. Default: lapply(join, key). Required when not all of join data.tables has key.

col.subset

list of character vectors. Default: lapply(join, names).

row.subset

list of expressions to be passed to corresponding join data.table i argument. Default: as.list(rep(TRUE,length(join))). To subset result master data.table, use row.subset together with corresponding nomatch argument equal to 0 (inner join). By default when providing row.subset list element the corresponding nomatch argument will be changed to 0 to perform inner join, otherwise it will be getOption("datatable.nomatch"). If you really want to do outer join to already filtered join data.table you need to override corresponding nomatch argument for NA. Cross table expressions and not supported inside joinbyv.

nomatch

list of integer scalars NA or 0 elements corresponding join data.tables. Default: lapply(row.subset, function(x) if(is.language(x)) 0 else getOption("datatable.nomatch")). Indicates outer join for NA and inner join for 0. When data.table extends allowed argument in the getOption("datatable.nomatch") then it should accept not only integer scalar but any value supported by data.table as nomatch argument.

allow.cartesian

list of logical scalar elements corresponding to join data.tables to define which of the joins are allowed to do cartesian product. Default: as.list(rep(getOption("datatable.allow.cartesian"),length(join))).

Details

Any NULL inside the lists provided to by, col.subset, row.subset, nomatch, allow.cartesian will be replaced by the default value for particular NULL element. Therefore it is possible to pass partially filled lists, length of each must match to length of join, example col.subset=list(NULL,c("join2_col1","join2_col2"),NULL,c("join4_col1")). Function do not allow cross tables row.subset filtering expressions. User should apply such filter after joinbyv, example: joinbyv(master, join)[join1_colA > join2_colA * 2]. Arguments nomatch, allow.cartesian by default will be setup according to data.table options. An exception case described in row.subset argument will by default override the nomatch argument to value 0 to perform inner join. Possibly the performance might be improved after implementing data.table FR #691 and #692.

Value

data.table, denormalized master table joined to defined join data.tables. Column order according to col.subset, no key.

Author(s)

Jan Gorecki

See Also

build_hierarchy, dw.populate

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
suppressPackageStartupMessages(library(dwtools))

X <- dw.populate() # scenario="star"
lapply(X, head)

DT <- joinbyv(master = X$SALES,
              join = list(time = X$TIME, product = X$PRODUCT, currency = X$CURRENCY),
              by = list("time_code","prod_code","curr_code"))
str(DT)

# joinbyv including timing
DT <- timing(
  joinbyv(master = X$SALES,
          join = list(time = X$TIME, product = X$PRODUCT, currency = X$CURRENCY),
          by = list("time_code","prod_code","curr_code"))
)
get.timing()

jangorecki/dwtools documentation built on May 18, 2019, 12:24 p.m.