factorise: Convert columns in an Xdf file to factor

Description

Convert columns in an Xdf file to factor

Usage

factorise(.data, ...)

factorize(.data, ...)

## S3 method for class 'RxXdfData'
factorise(.data, ..., .outFile = tbl_xdf(.data), .rxArgs)

## S3 method for class 'RxFileData'
factorise(.data, ..., .outFile = tbl_xdf(.data), .rxArgs)

all_character(vars = .varTypes)

all_integer(vars = .varTypes)

all_numeric(vars = .varTypes)

## S3 method for class 'data.frame'
factorise(.data, ...)

## S3 method for class 'RxDataSource'
factorise(.data, ...)

Arguments

.data

A data source.

...

Variables to convert to factors.

.outFile

Output format for the returned data. If not supplied, create an xdf tbl; if NULL, return a data frame; if a character string naming a file, save an Xdf file at that location.

.rxArgs

A list of RevoScaleR arguments. See rxArgs for details.

Details

The selector functions listed in select also work with factorise. In addition, you can use the type-based selectors all_character(), all_integer() and all_numeric() shown under Usage.

If no variables are specified, all character variables will be converted to factors.

You can specify the levels for a variable by passing them as the value of the corresponding argument. For example, factorise(*, x = c("a","b","c")) will turn the variable x into a factor with three levels a, b and c. Any values that don't match the set of levels will be turned into NA. In particular, when re-levelling a variable that is already a factor, you should include its existing levels; otherwise the values belonging to any omitted level will become NA.
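
The level-matching rule can be illustrated with base R's factor(), which is what the data frame method calls. This is only a sketch of the matching behaviour on an in-memory vector, not of the Xdf code path; the vector x and its values are made up for illustration.

```r
# Hypothetical character data; "d" is not among the requested levels.
x <- c("a", "b", "d", "c", "a")

# Supplying explicit levels: anything outside the set becomes NA.
f <- factor(x, levels = c("a", "b", "c"))
levels(f)
is.na(f)

# Re-levelling an existing factor: omit a current level ("c" here)
# and the values that had that level become NA too.
g <- factor(f, levels = c("a", "b"))
sum(is.na(g))
```

The second call shows why the existing levels should be listed when the variable is already a factor: dropping one silently discards the values that carried it.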

For performance reasons, factors created by factorise are not sorted; instead, the ordering of their levels will be determined by the order in which they are encountered in the data.
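
To make the difference between the two orderings concrete, here is a minimal base R sketch. It does not use factorise itself; it only contrasts sorted levels (the default for factor()) with levels taken in order of first appearance, using a made-up vector x.

```r
# Hypothetical data in which "c" is encountered before "a" and "b".
x <- c("c", "a", "c", "b")

# Base R's default: levels are sorted.
sorted <- factor(x)
levels(sorted)

# Levels in order of first appearance, the kind of ordering described above.
by_appearance <- factor(x, levels = unique(x))
levels(by_appearance)

# In base R, passing explicit levels fixes the order regardless of the data.
fixed <- factor(x, levels = c("a", "b", "c"))
levels(fixed)
```

If a particular level order matters downstream (for example, for reference levels in a model), supplying the levels explicitly is the safer choice.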

The method for RxXdfData objects is a shell around rxFactors, which is the standard RevoScaleR function for factor manipulation. For RxFileData objects, the method calls rxImport with an appropriately constructed colInfo argument.

The data frame method simply calls factor to convert the specified columns into factors.
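
As a rough sketch of what "calls factor on the specified columns" amounts to, the following base R code does the same thing by hand. The helper factorise_cols is hypothetical and is not part of dplyrXdf; the actual method may differ in how it selects columns and handles arguments.

```r
# Hypothetical helper, not part of dplyrXdf: convert the named columns
# of a data frame to factors by calling factor() on each one.
factorise_cols <- function(df, cols) {
  df[cols] <- lapply(df[cols], factor)
  df
}

out <- factorise_cols(mtcars, c("am", "vs"))

# am and vs are now factors; untouched columns such as mpg stay numeric.
sapply(out[c("am", "vs", "mpg")], class)
```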

Value

An object representing the returned data. This depends on the .outFile argument: if missing, it will be an xdf tbl object; if NULL, a data frame; and if a filename, an Xdf data source referencing a file saved to that location.

See Also

rxFactors, rxImport, factor

chol, qr, svd for the other meaning of factorise

Examples

mtx <- as_xdf(mtcars, overwrite=TRUE)
tbl1 <- factorise(mtx, am, vs)
tbl_types(tbl1)

tbl2 <- factorise(mtx, all_numeric())
tbl_types(tbl2)

# selector functions used by select(), rename() also work
tbl3 <- factorise(mtx, starts_with("m"))
tbl_types(tbl3)

# save to a persistent Xdf file
factorise(mtx, am, vs, .outFile="mtcars_factor.xdf")

# factorise() also works with data frames
tbl4 <- factorise(mtcars, cyl)
tbl_types(tbl4)

RevolutionAnalytics/dplyrXdf documentation built on June 3, 2019, 9:08 p.m.