up: Create a data frame at a higher level of aggregation
In gmonette/WWCa: Tools for poststratification

Produce a higher-level data set with one row per cluster. The data set can contain only variables that are invariant within each cluster, or it can also include summaries (means and sums for numeric variables, and relative frequency or frequency matrices for factors) of variables that vary within clusters.
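
To make the distinction between cluster-invariant and cluster-varying variables concrete, here is a minimal base R sketch. It does not call `up` and is not its implementation; the data frame `toy` and the helper `is_invariant` are hypothetical names introduced only for illustration.

```r
# Toy data: two schools, three students each (all names are hypothetical).
toy <- data.frame(
  school = c("s1", "s1", "s1", "s2", "s2", "s2"),
  sector = c("public", "public", "public", "private", "private", "private"),
  score  = c(10, 12, 14, 20, 22, 27),
  sex    = factor(c("F", "M", "F", "M", "M", "F"))
)

# A variable is cluster-invariant if it takes a single value within every cluster.
is_invariant <- function(x, cluster) {
  all(tapply(x, cluster, function(v) length(unique(v)) == 1))
}
invariant <- sapply(toy, is_invariant, cluster = toy$school)

# Keep the first row of each cluster, restricted to the invariant columns.
higher <- toy[!duplicated(toy$school), invariant, drop = FALSE]

# Within-cluster summaries: mean for a numeric variable,
# relative frequencies for a factor.
score_mean <- tapply(toy$score, toy$school, mean)
sex_prop   <- prop.table(table(toy$school, toy$sex), margin = 1)
```

Here `higher` keeps only `school` and `sector`, while `score_mean` and `sex_prop` are the kind of summaries that would be added for variables that vary within clusters.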

up(object, form = formula(object), agg = NULL, sum = NULL,
  sep.agg = "_", sep.sum = sep.agg, all = FALSE, sep = "/",
  na.rm = TRUE, FUN = function(x) mean(x, na.rm = na.rm),
  omitGroupingFactor = FALSE, groups, invariantsOnly = !all, ...)

`object`	a data frame to be aggregated.
`form`	a one-sided formula identifying the variable(s) in `object` that identify clusters, e.g. ~ school/Sex to get a summary within each Sex of each school.
`agg`	(NEW: Aug 2016) a one-sided formula identifying variables to be aggregated, i.e. variables that vary within cluster and that need to be aggregated (within-cluster mean for numeric variables and within-cluster incidence proportions for factors). Default: NULL
`sum`	a one-sided formula identifying variables to be summed within clusters. Default: NULL
`sep.agg`	(NEW: Aug 2016) separator between factor names and factor level for within-cluster incidence proportions. Default: '_'
`sep.sum`	like 'sep.agg' but for 'sum' variables. Default: the value of 'sep.agg'
`all`	if TRUE, include summaries of variables that vary within clusters; otherwise keep only cluster-invariant variables and variables listed in 'agg'.
`sep`	separator used to form cluster names when combining more than one clustering variable. If the separator leads to the same name for distinct clusters (e.g. if var1 has levels 'a' and 'a/b' and var2 has levels 'b/c' and 'c'), the function produces an error and a different separator should be used.
`FUN`	function to be used for summaries.
`omitGroupingFactor`	kept for compatibility with `gsummary`
`groups`	kept for compatibility with `gsummary`
`invariantsOnly`	kept for compatibility with `gsummary`
`...`	additional arguments to `tapply` when summarizing numerical variables. e.g. `na.rm = TRUE`
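
The name collision described for `sep` can be reproduced with plain `paste`. This is only an illustration of why the names clash, under the assumption that cluster names are formed by joining levels with the separator; it does not show how `up` builds names internally.

```r
# Two clustering variables whose levels contain the separator character.
var1 <- c("a", "a/b")
var2 <- c("b/c", "c")

# With sep = "/" the two distinct clusters receive the same name.
names_slash <- paste(var1, var2, sep = "/")
anyDuplicated(names_slash) > 0   # TRUE: both names are "a/b/c"

# A separator that does not occur in any level keeps the names distinct.
names_bar <- paste(var1, var2, sep = "|")
anyDuplicated(names_bar) > 0     # FALSE
```

Choosing a separator that appears in none of the factor levels avoids the error.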

up(data, by) keeps the rows corresponding to unique values of the clusters defined by the variable or list of variables in 'by' (the second argument, named `form` in the usage above).

'by' can also be a formula, evaluated in 'data'. For example, by = ~ a + b is equivalent to by = data[,c('a','b')]. With a data frame 'data' that contains the variables 'state' and 'sex':

> data.state <- up(data, ~ state + sex)

creates a data frame with one row per state x sex combination.

> data.state <- up(data, ~ state + sex, sum = ~ population)

will take an existing variable, population, in 'data' and sum it over rows within each cluster to form the variable 'population' in 'data.state'.
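
For comparison, the within-cluster sum described above can be sketched in base R with `aggregate`. This is not how `up` is implemented; the data frame `dat` and its values are hypothetical, with column names chosen to mirror the example.

```r
# Hypothetical survey-style data; column names mirror the example above.
dat <- data.frame(
  state      = c("NY", "NY", "NY", "CA", "CA"),
  sex        = c("F", "F", "M", "F", "M"),
  population = c(100, 50, 120, 200, 210)
)

# Sum population over rows within each state x sex cluster.
pop_sums <- aggregate(population ~ state + sex, data = dat, FUN = sum)
pop_sums
```

The result has one row per state x sex combination, and the two NY/F rows are combined into a single total.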

up was created from nlme::gsummary and modified to make it easier to use and to make an equivalent of gsummary available when using lme4. The agg and sum arguments were added later and provide easy aggregation of survey data.

A data frame with one row per unique value of the clustering variable(s) in `form`.

Georges Monette; the 'all' option is largely from gsummary in Bates & Pinheiro

    data(hs)
    dim( hs )
    hsu <- up( hs, ~ school )
    dim( hsu )

    # to also get cluster means of cluster-varying numeric variables and modes of factors:

    hsa <- up( hs, ~ school , all = TRUE )

    # to get summary proportions of cluster varying factors:

    up( cbind( hs, model.matrix( ~ Sex -1 , hs)), ~ school, all = TRUE)

    ## To plot a summary between-cluster panel along with within-cluster panels:

    hsu <- up( hs, ~ school, all = TRUE)
    hsu$school <- ' between'  # space to make it come lexicographically before cluster names

    library( lattice )  # stops with an error if lattice is not installed
    xyplot( mathach ~ ses | school, rbind(hs,hsu),
        panel = function( x, y, ...) {
            panel.xyplot( x, y, ...)
            panel.lmline( x, y, ...)
        } )