mutate: Add or modify variables
In RevolutionAnalytics/dplyrXdf: Tools for working with Microsoft R Server Xdf files and the dplyr package

Description

Use mutate to add new variables and preserve existing ones; use transmute to keep only new and modified variables.

Usage

## S3 method for class 'RxFileData'
mutate(.data, ..., .outFile = tbl_xdf(.data), .rxArgs)

## S3 method for class 'grouped_tbl_xdf'
mutate(.data, ..., .outFile = tbl_xdf(.data), .rxArgs)

## S3 method for class 'RxFileData'
transmute(.data, ..., .outFile = tbl_xdf(.data), .rxArgs)

## S3 method for class 'grouped_tbl_xdf'
transmute(.data, ..., .outFile = tbl_xdf(.data), .rxArgs)

## S3 method for class 'RxDataSource'
mutate(.data, ...)

## S3 method for class 'RxDataSource'
transmute(.data, ...)

Arguments

`.data`	A tbl for an Xdf data source, or a raw Xdf data source.
`...`	Variables to add or modify, given as named arguments.
`.outFile`	Output format for the returned data. If not supplied, create an xdf tbl; if `NULL`, return a data frame; if a character string naming a file, save an Xdf file at that location.
`.rxArgs`	A list of RevoScaleR arguments. See `rxArgs` for details.

Details

These functions call rxDataStep to do the variable transformations. Simple transformations, namely those you could express with rxDataStep's transforms argument, can be passed as named arguments in the main mutate or transmute call. More complex transformations can be supplied via the .rxArgs argument, which should be a named list containing one or more of the transformFunc, transformVars, transformObjects, transformPackages and transformEnvir parameters.
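A transformFunc receives a named list holding the variables for one chunk of rows and must return a named list. The base-R sketch below mimics that contract on an in-memory data frame so the shape of the function is easy to see. The helper `run_chunked()` and its `chunk_size` argument are hypothetical illustrations, not part of dplyrXdf or RevoScaleR; in practice rxDataStep does the chunking on the Xdf file.

```r
# Hypothetical helper (not a RevoScaleR function): applies a transformFunc
# one chunk at a time, the way rxDataStep does with an Xdf file.
run_chunked <- function(df, transformFunc, chunk_size = 10) {
  # Rows 1..chunk_size go to chunk 1, the next chunk_size rows to chunk 2, etc.
  chunk_id <- ceiling(seq_len(nrow(df)) / chunk_size)
  chunks <- split(df, chunk_id)
  out <- lapply(chunks, function(chunk) {
    varlist <- as.list(chunk)              # named list of columns for this chunk
    as.data.frame(transformFunc(varlist))  # transformFunc returns a named list
  })
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

# Same transformation as the transformFunc example further down this page
double_mpg <- function(varlist) {
  varlist$mpg2 <- 2 * varlist$mpg  # add a new variable
  varlist                          # return existing and new variables
}

res <- run_chunked(mtcars, double_mpg, chunk_size = 10)
head(res[, c("mpg", "mpg2")])
```

Because each call only sees one chunk, anything that depends on neighbouring rows (such as a moving average) needs state carried between calls via transformObjects.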

Note that if you supply a transformFunc, its returned variables override any transformations in the main call to mutate or transmute. In particular, the results of any inline transformations will be lost unless you also include them in the output of the transformFunc. This mirrors the existing behaviour of variable transformations in RevoScaleR. Using inline transformations and a transformFunc at the same time is not recommended, as the results may be confusing.

To modify a grouped Xdf tbl, these functions split the data into one file per group and call rxDataStep on each file. This keeps the code scalable to large datasets, but it may be slow if there are many groups. Consider whether you really need to group before transforming, or use do instead.
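The per-group strategy is a split-apply-combine pattern. The base-R sketch below shows the idea in memory with `split()`, `lapply()` and `rbind()`; it is only an analogy, since dplyrXdf writes each group to its own Xdf file rather than holding pieces in memory. The helper `mutate_by_group()` is hypothetical.

```r
# Hypothetical helper: one piece per group (dplyrXdf uses one file per group),
# transform each piece independently, then recombine.
mutate_by_group <- function(df, group_var, f) {
  pieces <- split(df, df[[group_var]])
  out <- lapply(pieces, f)
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

# Centre mpg within each cylinder group
res <- mutate_by_group(mtcars, "cyl", function(d) {
  d$mpg_c <- d$mpg - mean(d$mpg)
  d
})
tapply(res$mpg_c, res$cyl, mean)  # each group mean is (numerically) zero
```

The cost grows with the number of pieces, which is why many small groups make the file-per-group approach slow. In this in-memory sketch the output rows come back ordered by group.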

Grouped transformations on HDFS data are supported in the local compute context (on the edge node), but not in the Hadoop or Spark compute contexts.

Value

An object representing the transformed data. This depends on the .outFile argument: if missing, it will be an xdf tbl object; if NULL, a data frame; and if a filename, an Xdf data source referencing a file saved to that location.
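The three-way rule for .outFile can be summarised as a small dispatch on the argument. The function below is a hypothetical sketch of that rule only, written for illustration; it is not dplyrXdf source code and it does not create any files.

```r
# Hypothetical sketch of the .outFile rule; not taken from dplyrXdf.
describe_output <- function(.outFile) {
  if (missing(.outFile)) {
    "xdf tbl"                                    # not supplied
  } else if (is.null(.outFile)) {
    "data frame"                                 # NULL
  } else if (is.character(.outFile) && length(.outFile) == 1) {
    paste("Xdf data source saved at", .outFile)  # a file name
  } else {
    stop(".outFile must be missing, NULL, or a single file name")
  }
}

describe_output()
describe_output(NULL)
describe_output("mtcars_mutate.xdf")
```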

See Also

mutate and transmute in package dplyr, rxDataStep, rxTransform for variable transformations in RevoScaleR

Examples

mtx <- as_xdf(mtcars, overwrite=TRUE)
tbl <- mutate(mtx, mpg2=2 * mpg)
head(tbl)

tbl2 <- transmute(mtx, mpg2=2 * mpg)
head(tbl2)

# transform and select columns simultaneously with .rxArgs
tbl3 <- mutate(mtx, mpg2=2 * mpg, .rxArgs=list(varsToKeep=c("mpg", "cyl")))
head(tbl3)
nrow(tbl3)

# save to a persistent Xdf file
mutate(mtx, mpg2=2 * mpg, .outFile="mtcars_mutate.xdf")

# using a transformFunc
tbl4 <- mutate(mtx, .rxArgs=list(transformFunc=function(varlist) {
   varlist$mpg2 <- 2 * varlist$mpg
   varlist
}))
head(tbl4)

# a non-trivial example: using a transformFunc to calculate a moving average
## Not run: 
tbl <- mutate(xdf, .rxArgs=list(transformFunc=
    function(varList)
    {
        if(.rxIsTestChunk)
            return(varList)
        n <- .rxNumRows
        x <- c(.keepx, varList[[1]])
        ma <- rollmean(x, .k, fill=NA, align="right")
        n_ma <- length(ma)
        if(n_ma > n)
            ma <- ma[-(1:(n_ma - n))]
        .keepx <<- varList[[1]][(n - .k + 1):n]
        varList$x_ma <- ma
        varList
    },
    transformObjects=list(.keepx=numeric(), .k=5),  # k = window width
    transformVars="x",                              # x = variable to get moving average for
    transformPackages="zoo"))

## End(Not run)
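The moving-average transformFunc carries `.keepx` from one chunk to the next because each call only sees its own rows; without it, the first k - 1 values of every chunk would be NA. The base-R sketch below reproduces that carry-over on an in-memory vector and checks it against a single pass over the whole vector. It swaps `zoo::rollmean()` for `stats::filter()` with `sides = 1` (a trailing mean) so it needs no extra packages, and it keeps only the last k - 1 values, which is all a width-k window needs. `chunked_ma()` and `chunk_size` are hypothetical.

```r
# Trailing moving average of width k over a whole vector (first k - 1 values are NA)
trailing_ma <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Hypothetical chunked version: carries the tail of each chunk forward,
# as the transformFunc above does with .keepx
chunked_ma <- function(x, k, chunk_size) {
  keep <- numeric()  # plays the role of .keepx
  out <- numeric()
  for (s in seq(1, length(x), by = chunk_size)) {
    chunk <- x[s:min(s + chunk_size - 1, length(x))]
    ma <- trailing_ma(c(keep, chunk), k)
    out <- c(out, tail(ma, length(chunk)))  # drop values for carried-over rows
    keep <- tail(c(keep, chunk), k - 1)     # history needed by the next chunk
  }
  out
}

x <- as.numeric(1:23)
all.equal(chunked_ma(x, k = 5, chunk_size = 7), trailing_ma(x, k = 5))
```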

RevolutionAnalytics/dplyrXdf documentation built on June 3, 2019, 9:08 p.m.
