up: Create a data frame at a higher level of aggregation
In gmonette/spida2: Collection of tools developed for the Summer Programme in Data Analysis 2000-2012

up	R Documentation

Create a data frame at a higher level of aggregation

Description

Produce a higher-level data set with one row per cluster. The data set can contain only the variables that are invariant within each cluster, or it can also include summaries (means or modes) of variables that vary within clusters. Adapted from gsummary in the nlme package.

Usage

up(
  object,
  form = formula(object),
  agg = NULL,
  sep.agg = "_",
  freq = NULL,
  sep.freq = "_",
  all = FALSE,
  sep = "/",
  na.rm = TRUE,
  FUN = function(x) mean(x, na.rm = na.rm),
  omitGroupingFactor = FALSE,
  groups,
  invariantsOnly = !all,
  ...
)

Arguments

`object`	a data frame to be aggregated.
`form`	a one-sided formula identifying the variable(s) in `object` that define the clusters, e.g. ~ school/Sex to get a summary within each Sex of each school.
`agg`	(NEW: Aug 2016) a one-sided formula identifying variables to be aggregated, i.e. variables that vary within clusters and that need to be aggregated (within-cluster mean for numeric variables and within-cluster incidence proportions for factors). Default: NULL
`sep.agg`	(NEW: Aug 2016) separator between factor names and factor level for within-cluster incidence proportions. Default: '_'
`freq`	(NEW: Nov 2018) a one-sided formula identifying character variables to be represented according to the frequencies of their levels, i.e. variables that vary within clusters and that need to be aggregated (within-cluster sum for numeric variables and within-cluster frequencies for factors). Default: NULL
`sep.freq`	(NEW: Nov 2018) separator between factor names and factor level for within-cluster incidence frequencies. Default: '_'
`all`	if TRUE, include summaries of variables that vary within clusters; otherwise keep only cluster-invariant variables and variables listed in 'agg'.
`sep`	separator used to form cluster names when combining more than one clustering variable. If the separator leads to the same name for distinct clusters (e.g. if var1 has levels 'a' and 'a/b' and var2 has levels 'b/c' and 'c'), the function produces an error and a different separator should be used.
`FUN`	function to be used for summaries.
`omitGroupingFactor`	kept for compatibility with `gsummary`
`groups`	kept for compatibility with `gsummary`
`invariantsOnly`	kept for compatibility with `gsummary`
`...`	additional arguments to `tapply` when summarizing numerical variables. e.g. `na.rm = TRUE`
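
The name collision described under `sep` can be reproduced in base R. The following is a minimal sketch that uses `paste` on hypothetical vectors `var1` and `var2`; it illustrates the problem only and does not claim to show how `up` builds cluster names internally.

```r
# Hypothetical levels taken from the description of 'sep':
# two distinct clusters, (a, b/c) and (a/b, c)
var1 <- c("a", "a/b")
var2 <- c("b/c", "c")

# With "/" as separator, both clusters collapse to the same name
ids_slash <- paste(var1, var2, sep = "/")
collision_slash <- anyDuplicated(ids_slash) > 0

# A separator that does not occur in the levels keeps the names distinct
ids_colon <- paste(var1, var2, sep = ":")
collision_colon <- anyDuplicated(ids_colon) > 0
```

Here `collision_slash` is TRUE and `collision_colon` is FALSE, which is why choosing a separator that does not appear in any level avoids the error.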

Details

up was created from nlme::gsummary and modified to make it easier to use and to make an equivalent of gsummary available when using lme4.

Value

A data frame with one row per combination of values of the variable(s) in `form`. The number of rows of `object` in each combination is returned in the variable 'Freq'. Frequencies (proportions) of values for each variable specified by '~freq' ('~agg') are also included.
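
To make the shape of the result concrete, here is a base R sketch of the same idea on a hypothetical toy data frame `d`. It does not call `up`; the helper `is_invariant` and the column names `score`, `sex_F` and `sex_M` are assumptions chosen to mimic the description above, not output copied from the package.

```r
# Hypothetical toy data: 'size' is constant within school, 'score' and 'sex' vary
d <- data.frame(
  school = c("s1", "s1", "s1", "s2", "s2"),
  size   = c(100, 100, 100, 250, 250),
  score  = c(10, 12, 14, 20, 22),
  sex    = factor(c("F", "M", "F", "M", "M")),
  stringsAsFactors = FALSE
)

# A variable is cluster-invariant if it has a single distinct value in every cluster
is_invariant <- function(x, g) {
  all(tapply(x, g, function(v) length(unique(v))) == 1)
}
invariant <- vapply(d, is_invariant, logical(1), g = d$school)

# One row per cluster, keeping only the invariant variables
out <- d[!duplicated(d$school), invariant, drop = FALSE]
rownames(out) <- NULL
key <- out$school

# Number of original rows per cluster
out$Freq <- as.vector(table(d$school)[key])

# Within-cluster mean of a numeric variable that varies within clusters
out$score <- as.vector(tapply(d$score, d$school, mean)[key])

# Within-cluster proportions of each factor level, named with "_" as separator
props <- prop.table(table(d$school, d$sex), margin = 1)
out$sex_F <- as.vector(props[key, "F"])
out$sex_M <- as.vector(props[key, "M"])
```

The result `out` has two rows, one per school, with the invariant `size`, the cluster size in `Freq`, the mean `score`, and the proportion of each `sex` level.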

Author(s)

adapted by G. Monette from gsummary in 'nlme' by Bates & Pinheiro

Examples

    data(hs)
    dim( hs )
    hsu <- up( hs, ~ school )
    dim( hsu )

    # to also get cluster means of cluster-varying numeric variables and modes of factors:

    hsa <- up( hs, ~ school , all = TRUE )

    # to get summary proportions of cluster varying factors:

    up( cbind( hs, model.matrix( ~ Sex -1 , hs)), ~ school, all = TRUE)

    # Similar using 'agg'
    
    up(hs, ~school, agg = ~ Sex)
    
    ## To plot a summary between-cluster panel along with within-cluster panels:

    hsu <- up( hs, ~ school, all = TRUE)
    hsu$school <- ' between'  # space to make it come lexicographically before cluster names

    library( lattice )
    xyplot( mathach ~ ses | school, rbind(hs,hsu),
        panel = function( x, y, ...) {
            panel.xyplot( x, y, ...)
            panel.lmline( x, y, ...)
        } )
        
    ## To create a data frame grouped by predictors with frequency variables for each
    ## level of a response variable for analysis with a binomial glm with goodness of fit
    ## based on the deviance
    
    hsa <- up( hs, ~school, freq = ~ Sex)
    head(hsa)
    fit <- glm(cbind(Sex_Female, Sex_Male) ~ Sector, hsa, family = binomial)
    summary(fit) # the residual deviance provides a goodness of fit test
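
The goodness-of-fit test mentioned in the last example compares the residual deviance with a chi-squared distribution on the residual degrees of freedom. The sketch below shows that calculation on hypothetical grouped data `g` (not the `hs` data), using only base R functions; the chi-squared approximation is only reliable when the group counts are not too small.

```r
# Hypothetical grouped binomial data: one row per predictor value
g <- data.frame(
  x       = c(0, 1, 2, 3, 4),
  success = c(2, 4, 6, 8, 9),
  failure = c(8, 6, 4, 2, 1)
)

# Binomial glm on the (success, failure) counts
fit_g <- glm(cbind(success, failure) ~ x, data = g, family = binomial)

# Residual deviance and its degrees of freedom (5 groups - 2 coefficients)
dev <- deviance(fit_g)
df  <- df.residual(fit_g)

# Upper-tail p-value: small values suggest lack of fit
p_value <- pchisq(dev, df, lower.tail = FALSE)
```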

gmonette/spida2 documentation built on June 12, 2025, 9:44 p.m.
