by-summary: Groupwise summary statistics

by-summary R Documentation

Groupwise summary statistics

Description

Computes summary statistics by group, similar to the SUMMARY procedure in SAS. It is a more flexible alternative to base R's aggregate().

Usage

summary_by(
  data,
  formula,
  id = NULL,
  FUN = mean,
  keep.names = FALSE,
  p2d = FALSE,
  order = TRUE,
  full.dimension = FALSE,
  var.names = NULL,
  fun.names = NULL,
  ...
)

summaryBy(
  formula,
  data = parent.frame(),
  id = NULL,
  FUN = mean,
  keep.names = FALSE,
  p2d = FALSE,
  order = TRUE,
  full.dimension = FALSE,
  var.names = NULL,
  fun.names = NULL,
  ...
)

Arguments

data

A data frame.

formula

A formula with the response variable(s) on the left-hand side and the grouping variables on the right-hand side, e.g. uptake ~ Type + Treatment.

id

A formula specifying variables that are not used for grouping but should be retained in the output.

FUN

A function or list of functions to apply to the response variables.

keep.names

Logical; if TRUE and FUN contains only one function, output variables keep the names of the input variables.

p2d

Logical; should parentheses in output variable names be replaced by dots?

order

Logical; should result be ordered by grouping variables?

full.dimension

Logical; if TRUE, rows of summary statistics are repeated so that the result has the same number of rows as the input data.

var.names

Optional custom names for response variables.

fun.names

Optional custom names for functions applied.

...

Additional arguments passed to functions in FUN.

Details

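Extra arguments in ... are passed to all functions in FUN, so every function in FUN must accept them. If one of them does not, wrap it in a function that takes ... and handles or ignores the extra arguments. This matters most when handling missing values (e.g., na.rm = TRUE).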

Value

A data frame of grouped summary statistics.

Author(s)

Søren Højsgaard, sorenh@math.aau.dk

See Also

aggregate, orderBy, transformBy, splitBy

Examples

data(CO2)

# Simple groupwise mean
summaryBy(uptake ~ Type + Treatment, data = CO2, FUN = mean)
summaryBy(cbind(uptake, conc) ~ Type + Treatment, data = CO2, FUN = mean)
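
# A brief sketch of the data-first variant shown under Usage: summary_by()
# takes the same arguments with 'data' first, which suits piping.
# The native pipe |> assumes R >= 4.1; 'sb' is just an illustrative name.
sb <- summary_by(CO2, uptake ~ Type + Treatment, FUN = mean)
sb
CO2 |> summary_by(uptake ~ Type + Treatment, FUN = mean)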

# Compare with base R's aggregate():
aggregate(cbind(uptake, conc) ~ Type + Treatment, data = CO2, FUN = mean)

## Using '.' on the right hand side of a formula means to stratify by
## all variables not used elsewhere:
summaryBy(uptake ~ ., data = CO2, FUN = mean)

# Several statistics at once from one custom summary function
# (na.rm = TRUE drops NAs for mean and var; n = length(x) still counts them)
myfun <- function(x, ...)
  c(m = mean(x, na.rm = TRUE), v = var(x, na.rm = TRUE), n = length(x))
summaryBy(uptake ~ Type + Treatment, data = CO2, FUN = myfun)
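
# A sketch of the '...' mechanism described under Details: na.rm = TRUE is
# forwarded to every function in FUN, so each of them must accept it.
# 'co2na' is a hypothetical copy of CO2 with one missing value, made up
# purely for illustration.
co2na <- CO2
co2na$uptake[1] <- NA
res_na <- summaryBy(uptake ~ Type, data = co2na, FUN = c(mean, var),
                    na.rm = TRUE)
res_na
# length() has no na.rm argument, so using it in FUN together with
# na.rm = TRUE would be expected to fail. A wrapper that accepts '...'
# avoids this ('n_obs' is a hypothetical helper counting non-missing values):
n_obs <- function(x, ...) sum(!is.na(x))
res_n <- summaryBy(uptake ~ Type, data = co2na, FUN = c(mean, n_obs),
                   na.rm = TRUE)
res_n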

# Summary on transformed variables
# works:
summaryBy(cbind(lu=log(uptake), conc) ~ Type, data = CO2, FUN = mean)
# fails:
#summaryBy(cbind(log(uptake), conc) ~ Type, data = CO2, FUN = mean)
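
# A sketch of three arguments from the Arguments section that the examples
# above do not exercise. Result names (by_plant, kept, full) are illustrative.
# id: carry Type and Treatment along without grouping by them
# (each Plant belongs to exactly one Type and one Treatment in CO2)
by_plant <- summaryBy(uptake ~ Plant, data = CO2, id = ~ Type + Treatment,
                      FUN = mean)
by_plant
# keep.names: with a single function, keep the input variable name
kept <- summaryBy(uptake ~ Type, data = CO2, FUN = mean, keep.names = TRUE)
names(kept)
# full.dimension: repeat the group summaries so the result has one row per
# input row
full <- summaryBy(uptake ~ Type, data = CO2, FUN = mean,
                  full.dimension = TRUE)
nrow(full) == nrow(CO2)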

doBy documentation built on June 30, 2025, 1:06 a.m.