subset: Subset a data source by rows and/or columns

Description

Subset a data source by rows and/or columns

Usage

## S3 method for class 'RxFileData'
subset(.data, subset = NULL, select = NULL,
  .outFile = tbl_xdf(.data), .rxArgs)

## S3 method for class 'RxDataSource'
subset(.data, subset, select, ...)

Arguments

.data

A data source object, or tbl wrapping the same.

subset

Logical expression indicating rows to keep.

select

Columns to select. See select in dplyr for the ways in which you can keep or drop columns.

.outFile

Output format for the returned data. If not supplied, create an xdf tbl; if NULL, return a data frame; if a character string naming a file, save an Xdf file at that location.

.rxArgs

A list of RevoScaleR arguments. See rxArgs for details.

...

Other arguments passed to lower-level functions.

Details

This is a method for the subset generic from base R. It combines the effects of the filter and select verbs, allowing you to subset a RevoScaleR data source (typically an Xdf file) by rows and columns simultaneously. For an Xdf file, doing both in a single step significantly reduces the amount of I/O compared to doing the row and column subsetting in separate steps.

If the select argument is missing, subset returns all the columns in the data. This differs from the select verb, which returns no columns if no arguments are provided.
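
The semantics described above can be sketched with the base R subset generic on an ordinary data frame. This is only an in-memory stand-in, used here as an assumption for illustration: it shows the row-and-column behaviour and the effect of omitting select, but none of the Xdf I/O savings.

```r
# Plain data frame used as a stand-in for an Xdf data source (assumption:
# the row/column semantics are the same; no RevoScaleR is involved here).
df <- mtcars

# One call: keep rows where mpg > 20 and only the mpg and cyl columns.
one_step <- subset(df, mpg > 20, c(mpg, cyl))

# Two steps: filter rows first, then pick columns.
rows_only <- subset(df, mpg > 20)
two_step <- rows_only[, c("mpg", "cyl")]

# Both routes produce the same data.
stopifnot(isTRUE(all.equal(one_step, two_step)))

# With select omitted, every column of the input is kept.
stopifnot(identical(names(rows_only), names(df)))

print(dim(one_step))
print(dim(rows_only))
```

On an Xdf file the one-call form is the one that matters, since the two-step form would read and write the data twice.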

Value

An object representing the subsetted data. This depends on the .outFile argument: if missing, it will be an xdf tbl object; if NULL, a data frame; and if a filename, an Xdf data source referencing a file saved to that location.
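
As a rough illustration of the three cases, the sketch below calls the method once per .outFile setting. It assumes dplyrXdf and RevoScaleR are installed and skips the calls otherwise; the variable names and the temporary output path are made up for the example.

```r
# Guard: dplyrXdf depends on RevoScaleR, so skip cleanly if it cannot load.
has_dplyrxdf <- requireNamespace("dplyrXdf", quietly = TRUE)

if (has_dplyrxdf) {
  library(dplyrXdf)
  mtx <- as_xdf(mtcars, overwrite = TRUE)

  # .outFile not supplied: result is an xdf tbl
  as_tbl <- subset(mtx, mpg > 20, c(mpg, cyl))

  # .outFile = NULL: result is returned as a data frame
  as_df <- subset(mtx, mpg > 20, c(mpg, cyl), .outFile = NULL)

  # .outFile = a file name: result is an Xdf data source at that location
  out_path <- tempfile(fileext = ".xdf")  # hypothetical output location
  as_file <- subset(mtx, mpg > 20, c(mpg, cyl), .outFile = out_path)

  # Inspect what each call returned
  print(class(as_tbl))
  print(class(as_df))
  print(class(as_file))
} else {
  message("dplyrXdf/RevoScaleR not available; skipping the .outFile sketch")
}
```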

See Also

subset in base R, filter, select, rxDataStep

Examples

mtx <- as_xdf(mtcars, overwrite=TRUE)
tbl <- subset(mtx, mpg > 20, c(mpg, cyl))
dim(tbl)

# transform and filter simultaneously with .rxArgs
tbl2 <- subset(mtx, mpg > 20, c(mpg, cyl), .rxArgs=list(transforms=list(mpg2=2 * mpg)))
dim(tbl2)
names(tbl2)

# save to a persistent Xdf file
subset(mtx, mpg > 20, c(mpg, cyl), .outFile="mtcars_subset.xdf")

RevolutionAnalytics/dplyrXdf documentation built on June 3, 2019, 9:08 p.m.