agg: Create a data frame at a higher level of aggregation with a...

agg R Documentation

Create a data frame at a higher level of aggregation with a possible incidence matrix for categorical factors

Description

Produces a higher-level data set with one row per cluster. The data set contains the variables that are invariant within each cluster and, optionally, summaries of other variables (means for numeric variables and blocks of variables corresponding to incidence matrices for factors). Adapted from gsummary in the nlme package and from up in the spida2 package.
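
As a rough illustration of the kind of result described above, the following base R sketch builds a cluster-level summary by hand. It does not call agg and is not meant to reproduce its exact output or column naming; the data frame `d` and the variables `school`, `sector`, `ses`, and `sex` are hypothetical.

```r
# Hypothetical student-level data with two schools (the clusters)
d <- data.frame(
  school = c("s1", "s1", "s1", "s2", "s2"),
  sector = c("public", "public", "public", "private", "private"), # invariant within school
  ses    = c(1, 2, 3, 4, 6),                                      # numeric, varies within school
  sex    = factor(c("F", "M", "F", "M", "M"))                     # factor, varies within school
)

# One row per cluster, keeping the cluster-invariant variable
invariant <- unique(d[, c("school", "sector")])

# Cluster means of the numeric variable
ses_mean <- tapply(d$ses, d$school, mean)

# Incidence matrix for the factor: one indicator column per level,
# summed within cluster and divided by cluster size to give proportions
inc <- model.matrix(~ sex - 1, d)
sex_prop <- rowsum(inc, d$school) / as.vector(table(d$school))
colnames(sex_prop) <- paste("sex", levels(d$sex), sep = "_")

# Assemble the higher-level data frame, one row per school
higher <- data.frame(
  invariant,
  ses = as.vector(ses_mean[invariant$school]),
  sex_prop[invariant$school, , drop = FALSE]
)
rownames(higher) <- NULL
higher
```

Here each factor level becomes its own column holding the within-cluster proportion, which is one common way to summarize an incidence matrix at the cluster level.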

Usage

agg(
  object,
  form = formula(object),
  agg = NULL,
  sep = "_",
  sep.clus = "/",
  na.rm = TRUE,
  ...
)

Arguments

object

a data frame to be aggregated.

form

a one-sided formula, a list, or a data frame identifying the variable(s) in object that identify clusters, e.g. ~ school/Sex to get a summary within each Sex of each school.

agg

a one-sided formula or a list or data frame identifying the variable(s) in object to be aggregated.

sep

(default _) separator to separate variable name from variable value when aggregating a factor variable.

sep.clus

(default /) separator used to form cluster names when combining more than one clustering variable. If the separator leads to the same name for distinct clusters (e.g. if var1 has levels 'a' and 'a/b' and var2 has levels 'b/c' and 'c'), the function produces an error and a different separator should be used.

FUN

(default cvar) function to be used for summaries.
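
The name collision mentioned for the cluster-name separator can be seen with plain paste. This is only a sketch of why the ambiguity arises, using the levels from the example above; it does not show how agg itself detects the problem.

```r
# Levels of two clustering variables, taken from the example above
var1 <- c("a", "a/b")
var2 <- c("b/c", "c")

# Two distinct clusters: (a, b/c) and (a/b, c)
with_slash <- paste(var1, var2, sep = "/")  # "a/b/c" "a/b/c": ambiguous
with_bar   <- paste(var1, var2, sep = "|")  # "a|b/c" "a/b|c": distinct

anyDuplicated(with_slash) > 0  # TRUE, the two clusters share one name
anyDuplicated(with_bar) > 0    # FALSE
```

Choosing a separator that does not occur in any level of the clustering variables avoids the ambiguity.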

Details

The functionality of agg has been incorporated into the up function.

Value

a data frame with one row per value of the variable(s) in form and aggregate variables for each variable named in agg.

Author(s)

largely adapted from gsummary in Bates & Pinheiro

Examples

# a labor force survey with individual level data
surv <- read.table(header = TRUE, text = "
year sex region status
2010   M      A employed
2011   F      B unemployed
2012   M      C employed
2010   F      B employed
2011   F      A employed
2012   M      A out_of_labor_force
2010   F      A employed
2011   F      C employed
2012   M      C out_of_labor_force
2010   M      A employed
2011   M      A unemployed
2012   M      C employed
2010   M      A employed
2011   M      C unemployed
2012   M      C employed
2010   M      A employed
2011   F      B unemployed
2012   F      A unemployed
")
surv
agg(surv, )
agg(surv, ~ year, ~ sex + region)


gmonette/spida2 documentation built on July 14, 2024, 12:45 p.m.