daply: Split data frame, apply function, and return results in an...


daply R Documentation

Split data frame, apply function, and return results in an array.

Description

For each subset of a data frame, apply a function, then combine the results into an array. daply with a function that operates column-wise is similar to aggregate. To apply a function to each row, use aaply with .margins set to 1.
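As a rough illustration of the two comparisons above, the sketch below computes per-group column means with daply and with base R's aggregate, then applies a function to each row of a matrix with aaply. The built-in mtcars data set and all variable names are assumptions chosen only for illustration.

```r
library(plyr)

# Column-wise means of two columns for each value of cyl.
# Each piece yields a named vector of length 2, so the combined
# result is a matrix with one row per group.
means_array <- daply(mtcars, .(cyl), function(df) colMeans(df[, c("mpg", "hp")]))

# The analogous base R call returns a data frame instead of a matrix
means_df <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)

# Row-wise application: aaply over the first margin of a matrix
m <- matrix(1:6, nrow = 2)
row_sums <- aaply(m, 1, sum)
```

The main practical difference in this sketch is the container: daply combines the results into an array, while aggregate keeps the grouping variable as a column of a data frame.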

Usage

daply(
  .data,
  .variables,
  .fun = NULL,
  ...,
  .progress = "none",
  .inform = FALSE,
  .drop_i = TRUE,
  .drop_o = TRUE,
  .parallel = FALSE,
  .paropts = NULL
)

Arguments

.data

data frame to be processed

.variables

variables to split the data frame by, given as quoted variables, a formula, or a character vector
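A minimal sketch of the three accepted forms, assuming the built-in mtcars data set as stand-in input; each call should split on the same variable and therefore give the same result.

```r
library(plyr)

# Quoted variables, using plyr's .() helper
by_quoted <- daply(mtcars, .(cyl), nrow)

# One-sided formula
by_formula <- daply(mtcars, ~ cyl, nrow)

# Character vector of column names
by_name <- daply(mtcars, "cyl", nrow)
```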

.fun

function to apply to each piece

...

other arguments passed on to .fun
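For instance, an argument such as na.rm can be forwarded through ... to the applied function. The data frame and the helper group_mean below are hypothetical and exist only to illustrate the forwarding.

```r
library(plyr)

# Small made-up data set with one missing value in group "a"
df <- data.frame(g = c("a", "a", "b", "b"), x = c(1, NA, 3, 5))

# Hypothetical helper: mean of column x, forwarding extra arguments to mean()
group_mean <- function(piece, ...) mean(piece$x, ...)

# Without na.rm, the mean for group "a" is NA
with_na <- daply(df, .(g), group_mean)

# na.rm = TRUE travels through ... into group_mean and on to mean()
without_na <- daply(df, .(g), group_mean, na.rm = TRUE)
```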

.progress

name of the progress bar to use; see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging
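The sketch below shows how .progress and .inform might be used together while debugging. The data frame and the function checked_sqrt are hypothetical; the exact wording of the error message is not shown because it depends on the plyr version.

```r
library(plyr)

# Show a text progress bar on the console while pieces are processed
counts <- daply(mtcars, .(cyl), nrow, .progress = "text")

# Hypothetical function that fails on any piece containing a negative value
df <- data.frame(g = c("a", "b"), x = c(1, -1))
checked_sqrt <- function(piece) {
  if (any(piece$x < 0)) stop("negative input")
  sqrt(piece$x)
}

# With .inform = TRUE, the error raised by daply is expected to identify
# the piece that failed; tryCatch captures the message for inspection
msg <- tryCatch(
  daply(df, .(g), checked_sqrt, .inform = TRUE),
  error = function(e) conditionMessage(e)
)
```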

.drop_i

should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default)?
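To make the difference concrete, the following sketch uses the built-in mtcars data set (an assumption for illustration), in which no car has both 8 cylinders and 4 gears. With the default, the function is never called for that combination; with .drop_i = FALSE it is called on an empty piece.

```r
library(plyr)

# Default: the unobserved combination cyl = 8, gear = 4 is dropped,
# so its cell in the resulting array is left missing
counts_dropped <- daply(mtcars, .(cyl, gear), nrow)

# .drop_i = FALSE: the combination is preserved and nrow() is applied
# to a zero-row piece
counts_kept <- daply(mtcars, .(cyl, gear), nrow, .drop_i = FALSE)
```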

.drop_o

should extra dimensions of length 1 in the output be dropped, simplifying the output? Defaults to TRUE.
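A small sketch of the effect, assuming a subset of the built-in mtcars data set in which one splitting variable takes a single value, so that its dimension has length 1.

```r
library(plyr)

# Only 4-cylinder cars, so the cyl dimension has length 1
one_cyl <- subset(mtcars, cyl == 4)

# Default: length-1 dimensions are dropped, leaving a vector indexed by gear
dropped <- daply(one_cyl, .(cyl, gear), nrow)

# .drop_o = FALSE: the result keeps its dim attribute,
# including the dimensions of length 1
kept <- daply(one_cyl, .(cyl, gear), nrow, .drop_o = FALSE)
```

Keeping the dimensions can be useful when downstream code always indexes the result by every splitting variable, regardless of how many values each one takes.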

.parallel

if TRUE, apply the function in parallel, using the parallel backend provided by foreach

.paropts

a list of additional options passed into the foreach function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the .export and .packages arguments to supply them so that all cluster nodes have the correct environment set up for computing.
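As a hedged sketch of a parallel call: the example assumes the doParallel package as the foreach backend, which is only one possible choice, and skips the computation if that package is not installed. The worker count and the .packages value are illustrative.

```r
library(plyr)

par_counts <- NULL

# doParallel is an assumed backend; skip the example if it is unavailable
if (requireNamespace("doParallel", quietly = TRUE)) {
  doParallel::registerDoParallel(cores = 2)

  par_counts <- daply(
    mtcars, .(cyl), nrow,
    .parallel = TRUE,
    # Options passed to foreach: load plyr on every worker (illustrative)
    .paropts = list(.packages = "plyr")
  )

  doParallel::stopImplicitCluster()
}
```

If the applied function relied on objects from the calling environment, the same list could also carry a .export entry naming them.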

Value

if results are atomic with the same type and dimensionality, a vector, matrix or array; otherwise, a list-array (a list with dimensions)
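The two cases can be sketched as follows, again assuming the built-in mtcars data set purely for illustration: scalar counts combine into an atomic vector, while fitted model objects are kept as elements of a list.

```r
library(plyr)

# Atomic results of the same type and length: combined into a vector
counts <- daply(mtcars, .(cyl), nrow)

# Non-atomic results (here, fitted linear models): kept in a list-array,
# with one element per group
models <- daply(mtcars, .(cyl), function(df) lm(mpg ~ wt, data = df))

# Elements are extracted with list indexing
first_model <- models[[1]]
```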

Input

This function splits data frames by variables.

Output

If there are no results, then this function will return a vector of length 0 (vector()).

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

See Also

Other array output: aaply(), laply(), maply()

Other data frame input: d_ply(), ddply(), dlply()

Examples

library(plyr)

# Number of rows for each year
daply(baseball, .(year), nrow)

# Several different ways of summarising by variables that should not be
# included in the summary

daply(baseball[, c(2, 6:9)], .(year), colwise(mean))
daply(baseball[, 6:9], .(baseball$year), colwise(mean))
daply(baseball, .(year), function(df) colwise(mean)(df[, 6:9]))

hadley/plyr documentation built on Nov. 6, 2024, 5:54 p.m.