by-summary: Groupwise summary statistics
In doBy: Groupwise Statistics, LSmeans, Linear Estimates, Utilities

Description

Computes summary statistics by groups, similar to the summary procedure in SAS. A more flexible alternative to base R's aggregate.

Usage

summary_by(
  data,
  formula,
  id = NULL,
  FUN = mean,
  keep.names = FALSE,
  p2d = FALSE,
  order = TRUE,
  full.dimension = FALSE,
  var.names = NULL,
  fun.names = NULL,
  ...
)

summaryBy(
  formula,
  data = parent.frame(),
  id = NULL,
  FUN = mean,
  keep.names = FALSE,
  p2d = FALSE,
  order = TRUE,
  full.dimension = FALSE,
  var.names = NULL,
  fun.names = NULL,
  ...
)
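
As the signatures above show, the two functions take the same arguments, with `data` first in `summary_by()` and `formula` first in `summaryBy()`. The following is a minimal sketch, assuming the doBy package is installed and using the built-in CO2 data; the object names are illustrative only, and both calls are expected to give the same summary.

```r
library(doBy)
data(CO2)

# Formula-first interface
res_formula_first <- summaryBy(uptake ~ Type + Treatment, data = CO2, FUN = mean)

# Data-first interface; data and formula are matched by position
res_data_first <- summary_by(CO2, uptake ~ Type + Treatment, FUN = mean)

res_formula_first
res_data_first
```

The data-first form may be convenient when the data frame is the result of an earlier step and is passed on as the first argument.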

Arguments

`data`	A data frame.
`formula`	A formula specifying response and grouping variables.
`id`	A formula specifying variables to retain in the output without grouping by them.
`FUN`	A function or list of functions to apply to the response variables.
`keep.names`	Logical; keep original variable names if only one function is applied.
`p2d`	Logical; should parentheses in output variable names be replaced with dots?
`order`	Logical; should result be ordered by grouping variables?
`full.dimension`	Logical; if TRUE, repeat rows so output matches input size.
`var.names`	Optional custom names for response variables.
`fun.names`	Optional custom names for functions applied.
`...`	Additional arguments passed to functions in `FUN`.
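
A short sketch of how `keep.names` and `id` affect the result, assuming the doBy package is installed and using the built-in CO2 data. The object names are illustrative only. In CO2 each plant belongs to exactly one `Type` and `Treatment`, so these two variables can be carried along with `id` without changing the grouping.

```r
library(doBy)
data(CO2)

# Default naming: the summary column is named from the variable and the function
default_names <- summaryBy(uptake ~ Plant, data = CO2, FUN = mean)

# keep.names = TRUE keeps the original variable name (single function only)
kept_names <- summaryBy(uptake ~ Plant, data = CO2, FUN = mean, keep.names = TRUE)

# id carries Type and Treatment along without grouping by them
with_id <- summaryBy(uptake ~ Plant, data = CO2, FUN = mean,
                     id = ~ Type + Treatment)

names(default_names)
names(kept_names)
names(with_id)
```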

Details

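Extra arguments supplied through `...` are passed to every function in `FUN`. If one of those functions does not accept such an argument (for example, `na.rm = TRUE`), wrap it in a function that takes `...` so that all functions handle the extra arguments consistently.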
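
The wrapping idea can be sketched as follows, assuming the doBy package is installed. The helper `n_obs` and the modified copy `co2` are hypothetical names introduced for illustration. A missing value is inserted so that `na.rm = TRUE` has a visible effect.

```r
library(doBy)
data(CO2)

co2 <- as.data.frame(CO2)
co2$uptake[1] <- NA   # introduce a missing value for illustration

# Hypothetical helper: counts non-missing values and absorbs extra
# arguments such as na.rm through '...'
n_obs <- function(x, ...) sum(!is.na(x))

# na.rm = TRUE is forwarded to both mean() and n_obs()
res <- summaryBy(uptake ~ Type, data = co2, FUN = c(mean, n_obs), na.rm = TRUE)
res
```

Passing base `length` directly in place of `n_obs` would not work here, since `length()` does not accept an `na.rm` argument.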

Value

A data frame of grouped summary statistics.

Author(s)

Søren Højsgaard, sorenh@math.aau.dk

Examples

data(CO2)

# Simple groupwise mean
summaryBy(uptake ~ Type + Treatment, data = CO2, FUN = mean)
summaryBy(cbind(uptake, conc) ~ Type + Treatment, data = CO2, FUN = mean)

# Compare with
aggregate(cbind(uptake, conc) ~ Type + Treatment, data = CO2, FUN = mean)

# A '.' on the right-hand side of the formula means stratifying by
# all variables not used elsewhere:
summaryBy(uptake ~ ., data = CO2, FUN = mean)

# Multiple functions using a custom summary function
myfun <- function(x, ...)
  c(m = mean(x, na.rm = TRUE), v = var(x, na.rm = TRUE), n = length(x))
summaryBy(uptake ~ Type + Treatment, data = CO2, FUN = myfun)

# Summary on transformed variables
# works (the transformed variable is given a name, lu):
summaryBy(cbind(lu = log(uptake), conc) ~ Type, data = CO2, FUN = mean)
# fails (the transformed variable is unnamed):
# summaryBy(cbind(log(uptake), conc) ~ Type, data = CO2, FUN = mean)

doBy documentation built on Dec. 2, 2025, 9:08 a.m.