mutate: Add or modify variables

Description

Use mutate to add new variables and preserve existing ones; use transmute to keep only new and modified variables.

Usage

## S3 method for class 'RxFileData'
mutate(.data, ..., .outFile = tbl_xdf(.data), .rxArgs)

## S3 method for class 'grouped_tbl_xdf'
mutate(.data, ..., .outFile = tbl_xdf(.data),
  .rxArgs)

## S3 method for class 'RxFileData'
transmute(.data, ..., .outFile = tbl_xdf(.data), .rxArgs)

## S3 method for class 'grouped_tbl_xdf'
transmute(.data, ..., .outFile = tbl_xdf(.data),
  .rxArgs)

## S3 method for class 'RxDataSource'
mutate(.data, ...)

## S3 method for class 'RxDataSource'
transmute(.data, ...)

Arguments

.data

A tbl for an Xdf data source; or a raw Xdf data source.

...

Variables to add or modify.

.outFile

Output format for the returned data. If not supplied, create an xdf tbl; if NULL, return a data frame; if a character string naming a file, save an Xdf file at that location.

.rxArgs

A list of RevoScaleR arguments. See rxArgs for details.

Details

These functions call rxDataStep to do the variable transformations. For simple transformations, namely those that might be done using rxDataStep's transforms argument, you can simply pass these as named arguments in the main mutate or transmute call. More complex transformations can be passed in a .rxArgs argument, which should be a named list containing one or more of the transformFunc, transformVars, transformObjects, transformPackages and transformEnvir parameters.

Note that if you supply a transformFunc, its returned variables will override any transformations in the main call to mutate or transmute. In particular, the results of any such inline transformations will be lost unless you also include them in the output of the transformFunc. This mirrors the existing behaviour of the variable transformation functionality in RevoScaleR. Using inline transformations and a transformFunc in the same call is not recommended, as the results may be confusing.

To modify a grouped Xdf tbl, these functions split the data into one file per group and call rxDataStep on each file. This keeps the code scalable to large datasets, but it may be slow if there are many groups. Consider whether you really need to group before transforming, or use do instead.

Grouped transforming on HDFS data is supported in the local compute context (on the edge node), but not in the Hadoop or Spark compute contexts.

Value

An object representing the transformed data. This depends on the .outFile argument: if missing, it will be an xdf tbl object; if NULL, a data frame; and if a filename, an Xdf data source referencing a file saved to that location.
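The three return types described above can be compared side by side. This is a minimal sketch that only runs when dplyrXdf (and therefore RevoScaleR) is installed; the variable names res_tbl, res_df and res_xdf are illustrative, and the calls are the same ones used in the Examples section.

```r
# Guarded so the snippet is a no-op where dplyrXdf/RevoScaleR is unavailable.
if (requireNamespace("dplyrXdf", quietly = TRUE)) {
    library(dplyr)
    library(dplyrXdf)

    mtx <- as_xdf(mtcars, overwrite = TRUE)

    # .outFile missing: result is an xdf tbl
    res_tbl <- mutate(mtx, mpg2 = 2 * mpg)

    # .outFile = NULL: result is an in-memory data frame
    res_df <- mutate(mtx, mpg2 = 2 * mpg, .outFile = NULL)

    # .outFile = a filename: result is an Xdf data source pointing at that file
    res_xdf <- mutate(mtx, mpg2 = 2 * mpg, .outFile = "mtcars_mutate.xdf")

    # Inspect what came back in each case
    print(class(res_tbl))
    print(class(res_df))
    print(class(res_xdf))
}
```

Returning a data frame is convenient for small results; for data that does not fit in memory, the xdf tbl or a named Xdf file is likely the better choice.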

See Also

mutate and transmute in package dplyr; rxDataStep and rxTransform for variable transformations in RevoScaleR

Examples

mtx <- as_xdf(mtcars, overwrite=TRUE)
tbl <- mutate(mtx, mpg2=2 * mpg)
head(tbl)

tbl2 <- transmute(mtx, mpg2=2 * mpg)
head(tbl2)

# transform and select columns simultaneously with .rxArgs
tbl3 <- mutate(mtx, mpg2=2 * mpg, .rxArgs=list(varsToKeep=c("mpg", "cyl")))
head(tbl3)
nrow(tbl3)

# save to a persistent Xdf file
mutate(mtx, mpg2=2 * mpg, .outFile="mtcars_mutate.xdf")

# using a transformFunc
tbl4 <- mutate(mtx, .rxArgs=list(transformFunc=function(varlist) {
   varlist$mpg2 <- 2 * varlist$mpg
   varlist
}))
head(tbl4)

# a non-trivial example: using a transformFunc to calculate a moving average
## Not run: 
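tbl <- mutate(xdf, .rxArgs=list(transformFunc=
    function(varList)
    {
        # rxDataStep first calls the function on a small test chunk; pass it through
        if(.rxIsTestChunk)
            return(varList)
        n <- .rxNumRows
        # prepend the values carried over from the previous chunk
        x <- c(.keepx, varList[[1]])
        ma <- rollmean(x, .k, fill=NA, align="right")
        # drop the leading entries that belong to the carried-over values
        n_ma <- length(ma)
        if(n_ma > n)
            ma <- ma[-(1:(n_ma - n))]
        # save the tail of this chunk for the next call
        .keepx <<- varList[[1]][(n - .k + 1):n]
        varList$x_ma <- ma
        varList
    },
    transformObjects=list(.keepx=numeric(), .k=5),  # k = window width
    transformVars="x",                              # x = variable to get moving average for
    transformPackages="zoo"))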

## End(Not run)
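
The carry-over idea in the moving-average example above can be tried without RevoScaleR. The following is a base R sketch, not the package's implementation: roll_mean_right and chunked_moving_average are hypothetical helpers, the chunk loop stands in for rxDataStep feeding chunks to a transformFunc, and only the last k - 1 values are carried over (the example above keeps k).

```r
# Right-aligned moving average of width k (NA until k values are available).
roll_mean_right <- function(x, k) {
    n <- length(x)
    out <- rep(NA_real_, n)
    if (n >= k) {
        cs <- c(0, cumsum(x))
        out[k:n] <- (cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]) / k
    }
    out
}

# Hypothetical driver: walks over x in fixed-size chunks and carries the
# tail of the data seen so far into the next chunk.
chunked_moving_average <- function(x, k, chunk_size) {
    keep <- numeric()   # values carried over from earlier chunks
    out <- numeric()
    for (s in seq(1, length(x), by = chunk_size)) {
        chunk <- x[s:min(s + chunk_size - 1, length(x))]
        padded <- c(keep, chunk)
        ma <- roll_mean_right(padded, k)
        # keep only the entries that correspond to rows of the current chunk
        out <- c(out, utils::tail(ma, length(chunk)))
        keep <- utils::tail(padded, k - 1)
    }
    out
}

x <- as.numeric((1:12)^2)
ma_chunked <- chunked_moving_average(x, k = 3, chunk_size = 5)
ma_full <- roll_mean_right(x, k = 3)

# The chunked result should match the single-pass result
print(isTRUE(all.equal(ma_chunked, ma_full)))
```

Without the carried-over values, the first k - 1 rows of every chunk would be NA, which is why the transformFunc has to keep state between calls.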

RevolutionAnalytics/dplyrXdf documentation built on June 3, 2019, 9:08 p.m.