summarise: Summarise multiple values to a single value
In RevolutionAnalytics/dplyrXdf: Tools for working with Microsoft R Server Xdf files and the dplyr package


Summarise multiple values to a single value

## S3 method for class 'RxFileData'
summarise(.data, ..., .outFile = tbl_xdf(.data), .rxArgs,
  .method = NULL)

## S3 method for class 'RxDataSource'
summarise(.data, ...)

`.data`	A tbl for an Xdf data source; or a raw Xdf data source.
`...`	Name-value pairs of summary functions like `min()`, `mean()`, `max()` etc.
`.outFile`	Output format for the returned data. If not supplied, create an xdf tbl; if `NULL`, return a data frame; if a character string naming a file, save an Xdf file at that location.
`.rxArgs`	A list of RevoScaleR arguments. See `rxArgs` for details.
`.method`	The summarisation method to use, a number from 1 to 5. If `NULL`, a default method is chosen. See Details.

Five methods are available for the summarisation. To choose one, specify a .method argument in the call to summarise, with a number from 1 to 5:

1. Use rxCube, cbind data frames together: only n(), mean(), sum() supported, grouped data only (fast)
2. Use rxSummary, cbind data frames together: stats in rxSummary supported (fast)
3. As method 2, but build classification levels by pasting the grouping variable(s) together (moderately fast)
4. Split into multiple Xdfs by group, run dplyr::summarise on each, rbind xdfs together: arbitrary stats supported (slow)
5. Split into multiple Xdfs by group, run rxSummary on each, rbind xdfs together: stats in rxSummary supported (slowest, most scalable)
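
The idea behind method 3 can be illustrated with a small base R sketch. It uses the in-memory mtcars data frame rather than an Xdf file, and it is not dplyrXdf's actual implementation; the variable names are chosen for illustration only. It compares the product of the factor level counts of the grouping variables with the number of levels obtained by pasting the grouping variables together.

```r
# Illustration only: a plain data frame stands in for an Xdf data source
grp <- mtcars[c("cyl", "gear")]

# Number of levels for each grouping variable, and their product
n_levels <- vapply(grp, function(x) nlevels(factor(x)), integer(1))
level_product <- prod(n_levels)

# Pasting the grouping variables together gives a single factor whose
# levels are only the combinations that actually occur in the data
pasted <- factor(paste(grp$cyl, grp$gear, sep = "_"))
pasted_levels <- nlevels(pasted)

print(level_product)
print(pasted_levels)
```

Because the pasted factor only contains combinations that are present in the data, its level count can be much smaller than the product of the individual level counts.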

The default method is 1 if the data is grouped and the requested summary statistics are supported by rxCube; otherwise 2 if the requested statistics are supported by rxSummary; otherwise 4. Method 3 is supplied for the case where the product of factor levels for the grouping variables exceeds 2^32 - 1, a known limitation of rxCube and rxSummary.
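
The default rule above can be sketched as a small base R function. This is a hypothetical helper, not part of dplyrXdf: the name choose_default_method, its arguments, and in particular the list of statistics assumed to be supported by rxSummary are assumptions made for illustration. The rxCube list is taken from the description of method 1.

```r
# Hypothetical helper mirroring the default rule described in the text.
# `summary_stats` is an ASSUMED list of rxSummary-supported statistics;
# check the RevoScaleR documentation for the real set.
choose_default_method <- function(grouped, stats,
                                  cube_stats = c("n", "mean", "sum"),
                                  summary_stats = c("n", "mean", "sum",
                                                    "sd", "var", "min", "max")) {
  # Method 1: grouped data and every statistic is supported by rxCube
  if (grouped && all(stats %in% cube_stats)) return(1L)
  # Method 2: every statistic is supported by rxSummary
  if (all(stats %in% summary_stats)) return(2L)
  # Otherwise fall back to method 4 (arbitrary statistics)
  4L
}

m_grouped_mean   <- choose_default_method(TRUE, c("mean", "n"))
m_ungrouped_mean <- choose_default_method(FALSE, "mean")
m_grouped_median <- choose_default_method(TRUE, "median")
```

Under these assumptions, a grouped mean maps to method 1, an ungrouped mean to method 2, and a statistic outside both lists (here median) to method 4.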

Supplying custom functions to summarise is supported, but they must be named functions (and will automatically cause .method=4 to be selected). Anonymous functions will cause an error.

Due to limitations in RevoScaleR support for HDFS, you should take note of the following:

The result of the summarise will be streamed to the client (either the edge node or a remote client) before being written back to HDFS.
If summarising over character grouping variables, it may be faster to specify .method=4 or 5. This is because the usual summarise functions, rxSummary and rxCube, require factor or numeric groups, and converting character to factor can be slow for HDFS data.

An object representing the summary. This depends on the .outFile argument: if missing, it will be an xdf tbl object; if NULL, a data frame; and if a filename, an Xdf data source referencing a file saved to that location.

summarise in package dplyr, rxCube, rxSummary

mtx <- as_xdf(mtcars, overwrite=TRUE)

tbl <- summarise(mtx, m=mean(mpg))
as.data.frame(tbl)

tbl2 <- group_by(mtx, cyl) %>% summarise(m=mean(mpg))
as.data.frame(tbl2)

# filter and summarise simultaneously with .rxArgs
tbl3 <- summarise(mtx, m=mean(mpg), .rxArgs=list(rowSelection=cyl > 4))
as.data.frame(tbl3)

# compute a weighted mean
tbl4 <- summarise(mtx, m=mean(mpg), .rxArgs=list(pweights="wt"))
as.data.frame(tbl4)

# save to a persistent Xdf file
summarise(mtx, m=mean(mpg), .outFile="mtcars_summary.xdf")

RevolutionAnalytics/dplyrXdf documentation built on June 3, 2019, 9:08 p.m.
