join: Join two data sources together


Description

Join two data sources together

Usage

## S3 method for class 'RxFileData'
left_join(x, y, by = NULL, copy = FALSE,
  suffix = c(".x", ".y"), .outFile = tbl_xdf(x), .rxArgs, ...)

## S3 method for class 'RxFileData'
right_join(x, y, by = NULL, copy = FALSE,
  suffix = c(".x", ".y"), .outFile = tbl_xdf(x), .rxArgs, ...)

## S3 method for class 'RxFileData'
inner_join(x, y, by = NULL, copy = FALSE,
  suffix = c(".x", ".y"), .outFile = tbl_xdf(x), .rxArgs, ...)

## S3 method for class 'RxFileData'
full_join(x, y, by = NULL, copy = FALSE,
  suffix = c(".x", ".y"), .outFile = tbl_xdf(x), .rxArgs, ...)

## S3 method for class 'RxFileData'
semi_join(x, y, by = NULL, copy = FALSE,
  .outFile = tbl_xdf(x), .rxArgs, ...)

## S3 method for class 'RxFileData'
anti_join(x, y, by = NULL, copy = FALSE,
  .outFile = tbl_xdf(x), .rxArgs, ...)

## S3 method for class 'RxDataSource'
left_join(x, ...)

## S3 method for class 'RxDataSource'
right_join(x, ...)

## S3 method for class 'RxDataSource'
full_join(x, ...)

## S3 method for class 'RxDataSource'
inner_join(x, ...)

## S3 method for class 'RxDataSource'
anti_join(x, ...)

## S3 method for class 'RxDataSource'
semi_join(x, ...)

Arguments

x, y

Data sources to join.

by

Character vector of variables to join by. See join for details.

copy

If the data sources are not stored in the same filesystem, whether to copy y to x's location.

.outFile

Output format for the returned data. If not supplied, create an xdf tbl; if NULL, return a data frame; if a character string naming a file, save an Xdf file at that location.

.rxArgs

A list of RevoScaleR arguments. See rxArgs for details.

...

Not currently used.

Details

These functions merge two datasets together, using rxMerge.

For best performance, avoid merging on factor variables or on variables with mismatched types, especially in Spark. rxMerge is strict about its inputs, so in these cases dplyrXdf may have to transform the data before merging to ensure that the merge succeeds.
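As a rough illustration of the advice above, the key columns can be brought to a common non-factor type before the join. This is only a sketch on plain data frames using dplyr's bundled band_members and band_instruments data; the factor conversion is introduced here purely for demonstration, and whether doing this ahead of as_xdf avoids the extra transformation in a given setup is an assumption.

```r
library(dplyr)

# Illustrative setup: make the key a factor on one side only,
# so the two inputs have mismatched key types.
members <- band_members %>% mutate(name = factor(name))
instruments <- band_instruments

# Align the key types up front: convert the factor key back to character.
members_chr <- members %>% mutate(name = as.character(name))

# Both keys are now character, so no coercion is needed at join time.
stopifnot(identical(class(members_chr$name), class(instruments$name)))

joined <- left_join(members_chr, instruments, by = "name")
```

The same mutate step could be applied to the data before converting it with as_xdf, so that the Xdf inputs already agree on the key type.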

Currently, merging in Spark has a few limitations. Only Xdf (in HDFS) and Spark data sources (RxHiveData, RxOrcData and RxParquetData) can be merged, and only the "standard" join operations are supported: left_join, right_join, inner_join and full_join. Moreover, Xdf files in HDFS can only be merged in the Spark compute context (not in the Hadoop or local compute contexts).

Value

An object representing the joined data. This depends on the .outFile argument: if missing, it will be an xdf tbl object; if NULL, a data frame; and if a filename, an Xdf data source referencing a file saved to that location.
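The three return forms described above might be requested as follows. This is a hedged sketch: it only runs the joins when dplyrXdf (and therefore RevoScaleR) is installed, and the file name "joined.xdf" is a hypothetical placeholder rather than anything prescribed by the package.

```r
library(dplyr)

# Guarded so the snippet is harmless where dplyrXdf is not installed.
if (requireNamespace("dplyrXdf", quietly = TRUE)) {
  library(dplyrXdf)

  bmembx <- as_xdf(band_members, overwrite = TRUE)
  binstx <- as_xdf(band_instruments, overwrite = TRUE)

  # .outFile not supplied: the result is an xdf tbl.
  res_tbl <- left_join(bmembx, binstx)

  # .outFile = NULL: the result is returned as a data frame.
  res_df <- left_join(bmembx, binstx, .outFile = NULL)

  # .outFile as a file name (hypothetical path): the result is an
  # Xdf data source referencing the file saved at that location.
  res_xdf <- left_join(bmembx, binstx, .outFile = "joined.xdf")
}
```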

See Also

join in package dplyr, rxMerge

Examples

bmembx <- as_xdf(band_members, overwrite=TRUE)
binstx <- as_xdf(band_instruments, overwrite=TRUE)

left_join(bmembx, binstx)
right_join(bmembx, binstx)
inner_join(bmembx, binstx)
full_join(bmembx, binstx)
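The examples above cover the four mutating joins. For the two filtering joins listed under Usage, semi_join and anti_join, the following sketch shows the expected semantics on plain data frames with an explicit by argument. It uses dplyr's own data frame methods for illustration; per the Usage section the RxFileData methods take the same x, y and by arguments, but they are not exercised here.

```r
library(dplyr)

# semi_join keeps the rows of x that have a match in y,
# without adding any columns from y.
in_both <- semi_join(band_members, band_instruments, by = "name")

# anti_join keeps the rows of x that have no match in y.
only_members <- anti_join(band_members, band_instruments, by = "name")

print(in_both)
print(only_members)
```

Supplying by explicitly also avoids relying on the default of joining on all columns the two inputs have in common.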

RevolutionAnalytics/dplyrXdf documentation built on June 3, 2019, 9:08 p.m.