drJoin: Join Data Sources by Key


Description

Outer join of two or more distributed data object (DDO) sources by key

Usage

drJoin(..., output = NULL, overwrite = FALSE, postTransFn = NULL,
  params = NULL, packages = NULL, control = NULL)

Arguments

output

a "kvConnection" object indicating where the output data should reside (see localDiskConn, hdfsConn). If NULL (default), output will be an in-memory "ddo" object.

overwrite

logical; should an existing output location be overwritten? Alternatively, specify overwrite = "backup" to move the existing output to _bak.

postTransFn

an optional function to be applied to each final key-value pair after joining

params

a named list of objects external to the input data that are needed in the distributed computation (most are detected automatically, so this rarely needs to be specified)

packages

a vector of R package names that contain functions used in postTransFn (most are detected automatically, so this rarely needs to be specified)

control

parameters specifying how the backend should handle things (most likely parameters to rhwatch in RHIPE); see rhipeControl and localDiskControl

...

Input data sources: two or more named DDO objects to be joined, separated by commas (see Examples for syntax). Each input object should inherit from the 'ddo' class. All input sources are assumed to be of the same type (all HDFS, all localDisk, or all in-memory).

Value

a 'ddo' object stored in the output connection, where the values are named lists with names according to the names given to the input data objects, and values are the corresponding data. The 'ddo' object contains the union of all the keys contained in the input 'ddo' objects specified in ....
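
The shape of the joined result can be illustrated with a small base R sketch. It is not the datadr implementation: plain named lists stand in for 'ddo' objects (names are keys), outer_join_by_key is a hypothetical helper, and the sketch assumes that a source lacking a given key is simply left out of that key's value list, which the text above does not specify.

```r
# Hypothetical helper: outer join of named lists by key (illustration only).
# Each argument is a named list whose names act as keys.
outer_join_by_key <- function(...) {
  sources <- list(...)
  # union of all keys across the inputs, in order of first appearance
  all_keys <- unique(unlist(lapply(sources, names)))
  joined <- lapply(all_keys, function(k) {
    # keep only the sources that contain key k (assumption: others are omitted)
    present <- Filter(function(src) k %in% names(src), sources)
    # value for key k: a list named by input source
    lapply(present, function(src) src[[k]])
  })
  names(joined) <- all_keys
  joined
}

a <- list(x = 1, y = 2)
b <- list(y = 20, z = 30)
res <- outer_join_by_key(a = a, b = b)
str(res)
```

Key y appears in both inputs, so its value is a list with elements a and b; keys x and z each appear in only one input and carry a single named element.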

Author(s)

Ryan Hafen

See Also

drFilter, drLapply

Examples

library(datadr)

# divide iris into one subset per species
bySpecies <- divide(iris, by = "Species")
# get independent lists of just Sepal.Width (SW) and Sepal.Length (SL)
sw <- drLapply(bySpecies, function(x) x$Sepal.Width)
sl <- drLapply(bySpecies, function(x) x$Sepal.Length)
drJoin(Sepal.Width = sw, Sepal.Length = sl, postTransFn = as.data.frame)
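
To see what postTransFn = as.data.frame does in the example above, the following base R sketch builds by hand the kind of joined value expected for one key and converts it. It assumes the function receives the joined value for a single key as a named list; joined_value is a hypothetical stand-in, not output from drJoin.

```r
# Hypothetical joined value for the "setosa" key: a named list with one
# element per input source, mirroring the names passed to drJoin.
setosa <- iris$Species == "setosa"
joined_value <- list(
  Sepal.Width  = iris$Sepal.Width[setosa],
  Sepal.Length = iris$Sepal.Length[setosa]
)

# as.data.frame turns the named list of equal-length vectors into
# a data frame with one column per input source
df <- as.data.frame(joined_value)
head(df)
```

Because both vectors come from the same subset and have equal length, the conversion yields one row per observation and one column per input source.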

datadr documentation built on May 1, 2019, 8:06 p.m.