sparse-by: Apply a Function to a Data Frame split by levels of indices
In reshape: Flexibly Reshape Data

sparseby

R Documentation

Apply a Function to a Data Frame split by levels of indices

Description

sparseby is a modified version of by, the wrapper that applies tapply to data frames. It always returns a new data frame rather than a multi-way array.

Usage

sparseby(data, INDICES = list(), FUN, ..., GROUPNAMES = TRUE)

Arguments

`data`	an R object, normally a data frame, possibly a matrix.
`INDICES`	a variable or list of variables indicating the subgroups of `data`.
`FUN`	a function to be applied to data frame subsets of `data`.
`...`	further arguments to `FUN`.
`GROUPNAMES`	a logical value indicating whether the group names should be bound to the result.
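The arguments can be combined as in the following sketch, which assumes the reshape package is installed. The data frame `obs` and the helper `group_mean` are made up for illustration; only `sparseby` and its documented arguments come from the package.

```r
library(reshape)  # assumption: the reshape package is installed

# Made-up measurements: only three of the four site/year combinations occur.
obs <- data.frame(
  site  = factor(c("A", "A", "B", "B", "B")),
  year  = c(2001, 2002, 2001, 2001, 2001),
  value = c(1.5, 2.5, 3.0, NA, 5.0)
)

# Hypothetical summary function; `na.rm` reaches it through `...`.
group_mean <- function(d, na.rm = FALSE) c(mean = mean(d$value, na.rm = na.rm))

# INDICES given as a named list of two grouping variables.
sparseby(obs, list(site = obs$site, year = obs$year), group_mean, na.rm = TRUE)

# The same call without binding the group names to the result.
sparseby(obs, list(site = obs$site, year = obs$year), group_mean,
         na.rm = TRUE, GROUPNAMES = FALSE)
```

Naming the elements of the INDICES list is a convenient way to control the column names of the grouping variables in the result.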

Details

A data frame or matrix is split by row into data frames or matrices, respectively, according to the values of one or more factors, and FUN is applied to each subset in turn.

sparseby is much faster and more memory-efficient than by or tapply when the combinations of INDICES present in the data form a sparse subset of all possible combinations, since by and tapply build a result array with one cell for every possible combination of levels.
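The cost of the dense representation can be seen with base R alone. The sketch below uses made-up data (the names `d`, `a`, and `b` are illustrative) in which two grouping factors have 200 levels each but only 200 of the 40,000 possible combinations occur; tapply still allocates a cell for every combination.

```r
# Illustrative data: 200 rows, two grouping factors with 200 levels each,
# where only the "diagonal" combinations (a == b) are present.
n <- 200
d <- data.frame(a = factor(seq_len(n)), b = factor(seq_len(n)), value = seq_len(n))

# tapply builds an n-by-n array with one cell per possible combination.
dense <- tapply(d$value, list(d$a, d$b), mean)

dim(dense)          # 200 200
sum(!is.na(dense))  # 200 cells hold a result
sum(is.na(dense))   # 39800 cells are NA padding
```

With more grouping variables the number of empty cells grows multiplicatively, which is the situation sparseby is meant for.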

Value

A data frame or matrix containing the results of FUN applied to each subgroup of data. The result depends on what is returned from FUN:

If FUN returns NULL on any subsets, those are dropped.

If it returns a single value or a vector of values, the length must be consistent across all subgroups. These will be returned as values in rows of the resulting data frame or matrix.

If it returns data frames or matrices, they must all have the same number of columns, and they will be bound with rbind into a single data frame or matrix.

Names for the columns will be taken from the names in the list of INDICES or from the results of FUN, as appropriate.
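The rules above can be illustrated with a small base R sketch. The helper `combine_groups` is hypothetical and is not how sparseby is implemented: it handles a single grouping vector and does not bind group names as columns. It only shows how NULL results are dropped and how vectors or data frames are bound together.

```r
# Hypothetical helper (not part of reshape): split rows by one grouping
# vector, apply FUN to each piece, drop NULL results, and bind the rest.
combine_groups <- function(data, group, FUN, ...) {
  pieces  <- split(data, group)                 # one data frame per observed group
  results <- lapply(pieces, FUN, ...)
  results <- Filter(Negate(is.null), results)   # NULL results are dropped
  do.call(rbind, results)                       # vectors become rows; data frames are stacked
}

x <- data.frame(index = c(rep(1, 4), rep(2, 3), 3), value = 1:8)

# FUN returns a named vector of fixed length: one row per group.
combine_groups(x, x$index, function(d) c(n = nrow(d), mean = mean(d$value)))

# FUN returns NULL for groups with fewer than two rows: group 3 is dropped.
combine_groups(x, x$index, function(d) if (nrow(d) < 2) NULL else c(n = nrow(d)))

# FUN returns a data frame: the pieces are stacked with rbind.
combine_groups(x, x$index, function(d) head(d, 1))
```

If FUN returned vectors of different lengths, or data frames with different numbers of columns, the binding step could not produce a clean rectangular result, which is why the lengths and column counts must be consistent across subgroups.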

Author(s)

Duncan Murdoch

Examples

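library(reshape)

# Seven rows in two groups: index 1 has four rows, index 2 has three
x <- data.frame(index = c(rep(1, 4), rep(2, 3)), value = 1:7)
x

# Count the rows in each group
sparseby(x, x$index, nrow)

# The version below works entirely in matrices
x <- as.matrix(x)
sparseby(x, list(group = x[, "index"]), function(subset) c(mean = mean(subset[, 2])))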

reshape documentation built on June 19, 2025, 5:08 p.m.