sparse-by: Apply a Function to a Data Frame split by levels of indices

sparseby    R Documentation

Apply a Function to a Data Frame split by levels of indices

Description

sparseby is a modified version of by, the data frame counterpart of tapply. It always returns a new data frame (or matrix) rather than a multi-way array.

Usage

sparseby(data, INDICES = list(), FUN, ..., GROUPNAMES = TRUE)

Arguments

data

an R object, normally a data frame, possibly a matrix.

INDICES

a variable or list of variables indicating the subgroups of data.

FUN

a function to be applied to data frame subsets of data.

...

further arguments to FUN.

GROUPNAMES

a logical value indicating whether the group names should be bound to the result.

Details

A data frame or matrix is split by row into data frames or matrices respectively subsetted by the values of one or more factors, and function FUN is applied to each subset in turn.
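The split-then-apply idea described above can be sketched in base R alone. This is only an illustration of the concept using split, lapply and rbind; it is not the implementation used by sparseby, and the variable names (pieces, per_group, result) are chosen for this example.

```r
# Illustrative base R sketch of "split by row, apply FUN to each subset".
# This is NOT sparseby's implementation, only the general idea.
x <- data.frame(index = c(rep(1, 4), rep(2, 3)), value = 1:7)

# Split the rows of x into one data frame per level of the index.
pieces <- split(x, x$index)

# Apply a function to each subset in turn.
per_group <- lapply(pieces, function(subset) c(mean = mean(subset$value)))

# Bind the per-group results into a single matrix, one row per group.
result <- do.call(rbind, per_group)
result
```

Each row of result corresponds to one group, which is the shape of output that sparseby aims for, as opposed to the multi-way array returned by tapply.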

sparseby is much faster and more memory efficient than by or tapply in the situation where the combinations of INDICES present in the data form a sparse subset of all possible combinations.
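To make the notion of a sparse subset concrete, the following sketch builds a data set in which only 100 of the 10000 possible combinations of two factors occur, and shows how many cells tapply allocates for it. The data frame d and the names n_possible and n_present are made up for this illustration. The sparseby call at the end is guarded and assumes the reshape package is installed and exports sparseby as documented here; no output is shown for it.

```r
# Two factors with 100 levels each, but only 100 combinations are present.
d <- data.frame(a = factor(1:100), b = factor(1:100), value = 1:100)

# tapply allocates a cell for every possible combination of levels.
dense <- tapply(d$value, list(d$a, d$b), sum)
n_possible <- length(dense)       # 100 * 100 = 10000 cells
n_present <- sum(!is.na(dense))   # only 100 cells hold a result

# Assumption: reshape is installed and exports sparseby as documented here.
if (requireNamespace("reshape", quietly = TRUE)) {
  res <- reshape::sparseby(d, list(a = d$a, b = d$b),
                           function(subset) c(total = sum(subset$value)))
  print(nrow(res))
}
```

In this case the array built by tapply is 99 percent empty, which is the situation the paragraph above has in mind.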

Value

A data frame or matrix containing the results of FUN applied to each subgroup of data. The result depends on what is returned from FUN:

If FUN returns NULL on any subsets, those are dropped.

If it returns a single value or a vector of values, the length must be consistent across all subgroups. These will be returned as values in rows of the resulting data frame or matrix.

If it returns data frames or matrices, they must all have the same number of columns, and they will be bound with rbind into a single data frame or matrix.

Names for the columns will be taken from the names in the list of INDICES or from the results of FUN, as appropriate.
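A short sketch of two of the return conventions listed above. The helper names group_stats and large_groups_only are hypothetical and exist only for this example; the sparseby calls are guarded and assume the reshape package is installed. No output is shown, since the exact layout of the result is determined by sparseby itself.

```r
x <- data.frame(index = c(rep(1, 4), rep(2, 3)), value = 1:7)

# Hypothetical helper: returns a named vector of fixed length,
# so every group contributes one row to the result.
group_stats <- function(subset) {
  c(n = nrow(subset), mean = mean(subset$value))
}

# Hypothetical helper: returns NULL for groups with fewer than four rows.
# According to the Value section, such groups are dropped.
large_groups_only <- function(subset) {
  if (nrow(subset) >= 4) c(n = nrow(subset)) else NULL
}

# Assumption: reshape is installed and exports sparseby as documented here.
if (requireNamespace("reshape", quietly = TRUE)) {
  print(reshape::sparseby(x, list(group = x$index), group_stats))
  print(reshape::sparseby(x, list(group = x$index), large_groups_only))
}
```

Naming the elements of the returned vector (n, mean) gives FUN control over the column names of the result, as the last paragraph of this section describes.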

Author(s)

Duncan Murdoch

See Also

tapply, by

Examples

# A small data frame with two groups (index 1 and index 2)
x <- data.frame(index = c(rep(1, 4), rep(2, 3)), value = 1:7)
x
# Count the rows in each group
sparseby(x, x$index, nrow)

# The version below works entirely in matrices
x <- as.matrix(x)
sparseby(x, list(group = x[, "index"]), function(subset) c(mean = mean(subset[, 2])))

reshape documentation built on April 12, 2022, 5:07 p.m.
