Use mutate to add new variables and preserve existing ones; use transmute to keep only new and modified variables.
## S3 method for class 'RxFileData'
mutate(.data, ..., .outFile = tbl_xdf(.data), .rxArgs)
## S3 method for class 'grouped_tbl_xdf'
mutate(.data, ..., .outFile = tbl_xdf(.data),
.rxArgs)
## S3 method for class 'RxFileData'
transmute(.data, ..., .outFile = tbl_xdf(.data), .rxArgs)
## S3 method for class 'grouped_tbl_xdf'
transmute(.data, ..., .outFile = tbl_xdf(.data),
.rxArgs)
## S3 method for class 'RxDataSource'
mutate(.data, ...)
## S3 method for class 'RxDataSource'
transmute(.data, ...)
.data: A tbl for an Xdf data source, or a raw Xdf data source.
...: Variables to add or modify.
.outFile: Output format for the returned data. If not supplied, create an xdf tbl; if NULL, return a data frame; if a filename, save an Xdf file at that location.
.rxArgs: A list of RevoScaleR arguments. See
These functions call rxDataStep to do the variable transformations. For simple transformations, namely those that might be done using rxDataStep's transforms argument, you can simply pass these as named arguments in the main mutate or transmute call. More complex transformations can be passed in a .rxArgs argument, which should be a named list containing one or more of the transformFunc, transformVars, transformObjects, transformPackages and transformEnvir parameters.
Note that if you supply a transformFunc, its returned variables will override any transformations in the main call to mutate or transmute. In particular, the results of any such inline transformations will be lost unless you also include them in the output of the transformFunc. This mirrors the existing behaviour of the variable transformation functionality in RevoScaleR. Using both inline transformations and a transformFunc at the same time is not recommended, as the results may be confusing.
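One way to follow this advice is to put every new variable inside the transformFunc and leave the main call free of inline transformations. The sketch below is illustrative only: the function name double_and_ratio and the wt_per_cyl variable are made up for this example, and the mutate call is attempted only if dplyrXdf is installed. Because a transformFunc is an ordinary R function that takes and returns a named list of column vectors, it can be checked on a plain list first.

```r
# Hypothetical transformFunc: all new variables are created in one place,
# so nothing depends on inline transformations being preserved.
double_and_ratio <- function(varlist) {
    varlist$mpg2 <- 2 * varlist$mpg                 # would otherwise be written inline
    varlist$wt_per_cyl <- varlist$wt / varlist$cyl  # made-up second variable
    varlist
}

# Check the function on an ordinary list of columns before using it on Xdf data.
chunk <- as.list(mtcars[1:5, c("mpg", "cyl", "wt")])
result <- double_and_ratio(chunk)
names(result)

# Assumes dplyrXdf (and therefore RevoScaleR) is available; skipped otherwise.
if (requireNamespace("dplyrXdf", quietly = TRUE)) {
    library(dplyrXdf)
    mtx <- as_xdf(mtcars, overwrite = TRUE)
    tbl <- mutate(mtx, .rxArgs = list(transformFunc = double_and_ratio))
    head(tbl)
}
```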
To modify a grouped Xdf tbl, these functions split the data into one file per group, and call rxDataStep on each file. This ensures that the code remains scalable to large dataset sizes. Note, however, that this may be slow if you have a large number of groups. Consider whether you really need to group before transforming, or use do instead.
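As a rough guide to whether grouping is needed, the following base R sketch compares a row-wise transformation with a group-dependent one. It is an in-memory analogy using split and unsplit on mtcars, not a description of how dplyrXdf works on disk; the variable names are chosen for illustration.

```r
# Row-wise transformation: each output row depends only on its own input row,
# so computing it per group gives the same values as computing it in one pass.
ungrouped <- 2 * mtcars$mpg
grouped <- unsplit(lapply(split(mtcars$mpg, mtcars$cyl), function(x) 2 * x),
                   mtcars$cyl)
same_rowwise <- isTRUE(all.equal(ungrouped, grouped))

# Group-dependent transformation: centring on the group mean differs from
# centring on the overall mean, so here the grouping does matter.
centred_all <- mtcars$mpg - mean(mtcars$mpg)
centred_by_cyl <- unsplit(lapply(split(mtcars$mpg, mtcars$cyl),
                                 function(x) x - mean(x)),
                          mtcars$cyl)
same_centred <- isTRUE(all.equal(centred_all, centred_by_cyl))

c(rowwise = same_rowwise, centred = same_centred)
```

If the transformation is of the first kind, the per-group file splitting buys nothing and the ungrouped call avoids the overhead.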
Grouped transforming on HDFS data is supported in the local compute context (on the edge node), but not in the Hadoop or Spark compute contexts.
An object representing the transformed data. This depends on the .outFile argument: if missing, it will be an xdf tbl object; if NULL, a data frame; and if a filename, an Xdf data source referencing a file saved to that location.
mutate and transmute in package dplyr; rxDataStep and rxTransform for variable transformations in RevoScaleR
mtx <- as_xdf(mtcars, overwrite=TRUE)
tbl <- mutate(mtx, mpg2=2 * mpg)
head(tbl)
tbl2 <- transmute(mtx, mpg2=2 * mpg)
head(tbl2)
# transform and select columns simultaneously with .rxArgs
tbl3 <- mutate(mtx, mpg2=2 * mpg, .rxArgs=list(varsToKeep=c("mpg", "cyl")))
head(tbl3)
nrow(tbl3)
# save to a persistent Xdf file
mutate(mtx, mpg2=2 * mpg, .outFile="mtcars_mutate.xdf")
# using a transformFunc
tbl4 <- mutate(mtx, .rxArgs=list(transformFunc=function(varlist) {
    varlist$mpg2 <- 2 * varlist$mpg
    varlist
}))
head(tbl4)
# a non-trivial example: using a transformFunc to calculate a moving average
## Not run:
tbl <- mutate(xdf, .rxArgs=list(transformFunc=
    function(varList)
    {
        if(.rxIsTestChunk)
            return(varList)
        n <- .rxNumRows
        x <- c(.keepx, varList[[1]])
        ma <- rollmean(x, .k, fill=NA, align="right")
        n_ma <- length(ma)
        if(n_ma > n)
            ma <- ma[-(1:(n_ma - n))]
        .keepx <<- varList[[1]][(n - .k + 1):n]
        varList$x_ma <- ma
        varList
    },
    transformObjects=list(.keepx=numeric(), .k=5),  # k = window width
    transformVars="x",  # x = variable to get moving average for
    transformPackages="zoo"))
## End(Not run)
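The moving-average example relies on carrying the last few values of each chunk forward so that window results at chunk boundaries are correct. The base R sketch below imitates that carry-over on an ordinary vector so the idea can be checked without RevoScaleR. It is a simplified stand-in under stated assumptions: roll_mean_right and chunked_ma are hypothetical helpers, stats::filter replaces zoo's rollmean, and each chunk is assumed to hold at least k values.

```r
# Right-aligned moving average of width k; the first k - 1 values are NA.
roll_mean_right <- function(x, k) {
    as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Process x in fixed-size chunks, carrying the last k values forward.
chunked_ma <- function(x, k, chunk_size) {
    keep <- numeric()                      # plays the role of .keepx
    out <- numeric()
    for (s in seq(1, length(x), by = chunk_size)) {
        chunk <- x[s:min(s + chunk_size - 1, length(x))]
        n <- length(chunk)
        padded <- c(keep, chunk)
        ma <- roll_mean_right(padded, k)
        # Drop the results that belong to the carried-over values.
        if (length(ma) > n)
            ma <- ma[-seq_len(length(ma) - n)]
        keep <- tail(padded, k)            # values to prepend to the next chunk
        out <- c(out, ma)
    }
    out
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
full <- roll_mean_right(x, 3)
chunked <- chunked_ma(x, k = 3, chunk_size = 4)
all.equal(full, chunked)
```

The chunked result matches the single-pass result because every window that straddles a chunk boundary still sees the values it needs from the previous chunk.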