as_xdf: Detect and coerce to Xdf data source objects


Description

Functions to detect and coerce to Xdf data source objects.

Usage

as_composite_xdf(...)

as_standard_xdf(...)

as_xdf(.data, ...)

## S3 method for class 'RxXdfData'
as_xdf(.data, file = NULL,
  composite = is_composite_xdf(.data), overwrite = FALSE, ...)

## S3 method for class 'RxDataSource'
as_xdf(.data, file = NULL,
  composite = in_hdfs(.data), overwrite = FALSE, ...)

## Default S3 method:
as_xdf(.data, file = NULL, composite = FALSE,
  overwrite = FALSE, ...)

is_xdf(x)

is_composite_xdf(x)

is_standard_xdf(x)

Arguments

...

Other (named) arguments to pass to rxDataStep.

.data

An R object that can be coerced to an Xdf data source. This includes another existing Xdf data source; see details below.

file

The name for the Xdf data file, optionally with path. If not supplied, this is taken from .data.

composite

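Whether to create a composite Xdf file. The default depends on the input: for an existing Xdf data source it matches the type of .data; for other RevoScaleR data sources it is TRUE if .data is stored in HDFS and FALSE otherwise; for any other R object it is FALSE.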

overwrite

Whether to overwrite any existing file.

x

An R object.

Details

The as_xdf function takes the object given by .data and imports its data into an Xdf file, returning a data source pointing to that file. The file can be either a standard or a composite Xdf, as given by the composite argument. A composite Xdf is actually a directory containing data and metadata files; it can be manipulated by the RevoScaleR functions as if it were a single dataset.

The as_standard_xdf and as_composite_xdf functions are shorthand for as_xdf(*, composite=FALSE) and as_xdf(*, composite=TRUE) respectively; they always create a standard or a composite Xdf, regardless of the input. You can use them to switch an existing Xdf data source from one type of Xdf to the other. Note that Xdf files in HDFS must always be composite.

Passing a tbl_xdf object to an as function will strip off the tbl information, returning a raw Xdf data source. This can be useful for resetting the beginning of a pipeline.

The file argument gives the name of the Xdf file to create. If not specified, this is taken from the input data source where possible (for Xdf and file data sources, including text). Otherwise, it is taken from the name of the input R object. If no directory is specified, the file is created in the current working directory (if in the native filesystem) or in the user's home directory (in HDFS).

You can use the as functions with any RevoScaleR data source, or otherwise with any R object that can be turned into a data frame. The resulting Xdf file will be created in the same filesystem as the input data source. If the input does not have a filesystem, for example if it is an in-database table or a data frame, the file is created in the native filesystem.

The is_xdf function returns TRUE if x is an Xdf data source object; that is, if it inherits from the RxXdfData class. This includes both raw Xdf data sources and tbl_xdf objects as created by dplyrXdf. The is_composite_xdf function returns TRUE if x is a composite Xdf data source, and is_standard_xdf returns TRUE if x is an Xdf but not a composite Xdf.

Detecting whether an object is a composite Xdf can be tricky, so is_composite_xdf works through two steps. If x has a non-NULL createCompositeSet slot, then that value is returned. Otherwise, it checks whether the file slot refers to an existing directory whose name does not have an extension (that is, "foo" qualifies as a valid filename for a composite Xdf, but "foo.xdf" does not). This is necessary because of the semantics of rxDataStep.
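
The two-step check described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual implementation: looks_composite is a hypothetical helper, and it takes the file path and the createCompositeSet value as plain arguments rather than reading them from the slots of a real RxXdfData object.

```r
# Hypothetical helper (not part of dplyrXdf): mimics the documented check
# using plain arguments in place of the slots of an Xdf data source.
looks_composite <- function(file, createCompositeSet = NULL) {
    # Step 1: an explicitly set createCompositeSet value decides the answer
    if (!is.null(createCompositeSet))
        return(isTRUE(createCompositeSet))
    # Step 2: otherwise require an existing directory with no file extension
    dir.exists(file) && tools::file_ext(file) == ""
}

tmp <- tempfile("foo")   # a path with no extension
dir.create(tmp)

looks_composite(tmp)                       # existing directory, no extension
looks_composite(paste0(tmp, ".xdf"))       # has an extension, not a directory
looks_composite(paste0(tmp, ".xdf"),
                createCompositeSet = TRUE) # explicit value takes precedence
```

The sketch shows why the fallback is ambiguous: without an explicit createCompositeSet value, the answer depends on what currently exists on disk.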

To remove any ambiguity, it's recommended that you always explicitly specify the createCompositeSet argument when creating an Xdf data source object (objects created by dplyrXdf will always do this).
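
As a brief illustration of this recommendation, the following sketch creates data source objects with createCompositeSet set explicitly. It assumes RevoScaleR and dplyrXdf are installed; the file names are placeholders, and the files do not need to exist for the data source objects to be created.

```r
library(RevoScaleR)
library(dplyrXdf)

# Placeholder file names, chosen for illustration only
std <- RxXdfData("mtcars.xdf", createCompositeSet = FALSE)
cmp <- RxXdfData("mtcars", createCompositeSet = TRUE)

# With the slot set, is_composite_xdf does not need to inspect the filesystem
is_composite_xdf(std)
is_composite_xdf(cmp)
```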

Value

For the as functions, an Xdf data source object pointing to the created file. For the is functions, a logical value.

See Also

as, is, inherits, persist, rxDataStep, rxImport

Examples

# convert data frame to Xdf
mtx <- as_xdf(mtcars, overwrite=TRUE)
is_xdf(mtcars)
is_xdf(mtx)
is_composite_xdf(mtx)

# as_xdf() on an Xdf file without any args is a no-op
as_xdf(mtx)

## some common uses for as_xdf:

# convert a standard Xdf file to composite
mtc <- as_composite_xdf(mtx, overwrite=TRUE)
is_composite_xdf(mtc)

# convert a tbl_xdf to Xdf (could also use persist())
tbl <- mtx %>% mutate(mpg2=2 * mpg)
as_xdf(tbl, file="mtcars_mutate.xdf", overwrite=TRUE)

# import selected columns of a text file to Xdf
write.csv(mtcars, "mtcars.csv", row.names=FALSE)
mtt <- RxTextData("mtcars.csv")
mtx <- as_xdf(mtt, overwrite=TRUE, varsToKeep=c("mpg", "cyl"))

# import a database table to Xdf
## Not run: 
table_db <- RxSqlServerData(table="mytable", server="sqlserver", databaseName="dbname", ...)
table_xdf <- as_xdf(table_db)

## End(Not run)

RevolutionAnalytics/dplyrXdf documentation built on June 3, 2019, 9:08 p.m.