tbl_xdf: Generate tbl_xdf data source object


Description

Generate a tbl_xdf data source object.

Usage

tbl_xdf(xdf = NULL, file = NULL, createCompositeSet = NULL,
  fileSystem = rxGetFileSystem(xdf), ...)

Arguments

xdf

An RxXdfData data source on which to base the tbl_xdf. If supplied, the parameters of the returned object, such as the filesystem and composite flag, will be taken from it.

file

The filename to use for the tbl_xdf, that is, the output file to which the data will be written. By default, a random filename is generated.

createCompositeSet

Whether to create a composite Xdf file (see "Note on composite Xdf" below).

fileSystem

The filesystem in which to save the Xdf file.

...

Further arguments passed to RxXdfData.

Details

dplyrXdf uses the tbl_xdf class as part of its file management tasks. A tbl_xdf object specifies the file to which a dplyrXdf verb will save its output, and from which the next verb in a pipeline will read its input.

Like an RxXdfData object, a tbl_xdf object is a pointer to a file on disk that stores the actual data. A tbl_xdf also includes information on whether the file was generated as part of a pipeline; if so, subsequent verbs will know to delete the file when they return. This way, only the final output of a pipeline is retained.

In general, you should never need to create a tbl_xdf object manually.

Since a tbl_xdf is an RxXdfData object, all RevoScaleR functions that work with Xdf files should also work with tbl_xdf objects. For example, you can pass the output from a dplyrXdf pipeline straight to a RevoScaleR or MicrosoftML modelling function like rxLinMod or rxNeuralNet. If you encounter code that only works with base RxXdfData objects (e.g. if it uses checks like if(class(obj) == "RxXdfData") {...}), you can strip off the tbl information with as_xdf(obj). See the examples below.
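To see why an exact class comparison rejects a derived object, the following minimal sketch uses two hypothetical S4 classes, BaseSource and DerivedSource, as stand-ins for RxXdfData and tbl_xdf. They are illustrative assumptions, not the real RevoScaleR or dplyrXdf class definitions, and the sketch runs in base R without either package.

```r
library(methods)

# Hypothetical stand-in classes for illustration only; these are NOT the
# actual RxXdfData / tbl_xdf definitions.
setClass("BaseSource", representation(file = "character"))
setClass("DerivedSource", contains = "BaseSource")

obj <- new("DerivedSource", file = "data.xdf")

# An exact comparison looks only at the object's own class name,
# so the derived object does not match the base class
exact_match <- any(class(obj) == "BaseSource")

# is() follows S4 inheritance, so the derived object is accepted
inherits_match <- is(obj, "BaseSource")
```

Here exact_match is FALSE while inherits_match is TRUE. Code you control can use an is()-style test to accept both kinds of object; for code you cannot change, converting with as_xdf(obj) remains the documented route.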

Value

An object of S4 class tbl_xdf, which inherits from RxXdfData.

Note on composite Xdf

There are actually two kinds of Xdf files: standard and composite. A composite Xdf file is a directory containing multiple data and metadata files, which the RevoScaleR functions treat as a single dataset. While Xdf files in the native filesystem can be in either format, those in HDFS must be composite.

See Also

RxXdfData, as_xdf

Examples

tbl_xdf()

# create an Xdf data source, and base a tbl_xdf object on it
xdf <- RxXdfData("file", createCompositeSet=TRUE)
tbl_xdf(xdf)

## Not run: 
# create a tbl_xdf in HDFS
tbl_xdf(fileSystem=RxHdfsFileSystem())

## End(Not run)

# example of code that requires a base RxXdfData object
my_model <- function(data, formula)
{
    if(class(data) != "RxXdfData")
        stop("must supply Xdf data source") 
    rxLinMod(formula, data=data)
}
mtx <- as_xdf(mtcars, overwrite=TRUE)
tbl <- select(mtx, mpg, wt, disp)
## Not run: 
# this will fail
my_model(tbl, mpg ~ wt + disp)

## End(Not run)
# use as_xdf() to convert back to RxXdfData
my_model(as_xdf(tbl), mpg ~ wt + disp)

RevolutionAnalytics/dplyrXdf documentation built on June 3, 2019, 9:08 p.m.