capply: Apply a function within each cluster of multilevel data
In gmonette/yscs: Collection of tools used in York University's Statistical Consulting Service


Apply a function to each cell of a ragged array, that is, to each (non-empty) group of values given by a unique combination of the levels of certain variables. In contrast with tapply, capply returns within each cell a vector of the same length as the cell; these vectors are then arranged to match the positions of the corresponding elements in the input.

capply(x, ...)

`x`	a vector or data frame that provides the first argument of `FUN`
`by`	If `x` is a vector: a factor of the same length as `x` whose levels identify clusters. If `x` is a data frame: a one-sided formula that identifies the variable(s) within `x` used to form clusters.
`FUN`	a function to be applied to `x` within each cluster. `FUN` can return a single value, or a vector whose length is equal to the number of elements in each cluster.
`fmla`	in `capply.formula`, `fmla` is a two-sided formula as in `aggregate.formula`. The left-hand side identifies the variable(s) in `data` to be included in a data frame that is clustered using the variables on the right-hand side of the formula.
`...`	additional arguments to be supplied to `FUN`

capply is very similar to ave in package:stats. They differ in the way they treat missing values in the clustering variables. ave treats missing values as if they were legitimate clustering levels while capply returns a value of NA within any cluster formed by a combination of clustering variable values that includes a value of NA.
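The NA behaviour described for capply can be sketched with base R alone, assuming the unsplit - lapply - split idiom that this page gives as the vector equivalent of capply. The data and the name cluster_mean are made up for illustration, and the sketch does not call capply itself.

```r
# Illustrative data: the second observation has a missing cluster label.
x  <- c(1, 5, 3, 10)
by <- factor(c("a", NA, "a", "b"))

# split() drops observations whose grouping value is NA, so no cell is
# formed for them; unsplit() then leaves NA at those positions.
cluster_mean <- unsplit(lapply(split(x, by), mean), by)
print(cluster_mean)
```

Under this idiom the observation with a missing cluster label receives NA rather than a value computed from some cluster.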

capply extends the function of tapply(x, by, FUN)[ tapply(x, by) ]. The function FUN is applied to each cell of x defined by each value of by. The result in each cell is recycled to a vector of the same length as the cell. These vectors are then arranged to match the input x. Thus, if the value returned within each cell is a scalar, the effect of capply(x, by, FUN) is the same as tapply(x, by, FUN)[ tapply(x, by) ]. capply extends this use of tapply by allowing the value returned within each cell to be a vector of the same length as the cell.
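A small base R sketch of the tapply idiom mentioned above, for the scalar case. The data and the names cell_means, cell_index and expanded are illustrative only; tapply called without FUN returns the cell index of each observation, which is what makes the subscripting work.

```r
# Illustrative data with interleaved clusters.
x  <- c(4, 10, 6, 30, 8)
by <- factor(c("a", "b", "a", "b", "a"))

cell_means <- tapply(x, by, mean)   # one value per cell
cell_index <- tapply(x, by)         # cell number of each observation

# Subscripting the per-cell result by the cell index expands it back to
# one value per observation, in the original order of x.
expanded <- as.vector(cell_means[cell_index])
print(expanded)
```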

The capply.formula method allows the use of a two-sided formula of the form x ~ a + b or cbind(x, y) ~ a + b, where the variables on the left-hand side are used to create a data frame that is given as the first argument to FUN. If there is a single variable on the left-hand side, then that variable can be treated as a vector by FUN.

When the result in each cell is a scalar, capply can be used in multilevel analysis to produce 'contextual variables' computed within subgroups of the data and expanded to a constant over the elements of each subgroup.

capply( x , by, FUN , ...) where x is a vector

is equivalent to

unsplit ( lapply ( split ( x , by ), FUN, ...), by )

which has the same effect as

tapply( x, by, FUN, ...) [ tapply( x, by) ]

if FUN returns a vector of length 1.

If FUN returns a vector, it is recycled to the length of the input value.
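The vector case can be sketched with the base R idiom given above. The wrapper capply_sketch is a hypothetical name introduced here for illustration; it is not the package's implementation and ignores the NA handling and data frame methods of capply.

```r
# Hypothetical stand-in for the vector form of capply, built only from
# the unsplit - lapply - split idiom quoted in the text.
capply_sketch <- function(x, by, FUN, ...) {
  unsplit(lapply(split(x, by), FUN, ...), by)
}

# Illustrative data with interleaved clusters.
x  <- c(3, 20, 1, 10, 2)
by <- factor(c("a", "b", "a", "b", "a"))

# FUN returns one value per cell: the value is repeated within the cell.
within_mean <- capply_sketch(x, by, mean)

# FUN returns a vector as long as the cell: ranks within each cluster,
# placed back at the positions of the original observations.
within_rank <- capply_sketch(x, by, rank)

print(within_mean)
print(within_rank)
```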

When the first argument is a data frame:

capply ( dd, by, FUN, ...)

uses unsplit - lapply - split to apply FUN to each sub data frame. In this case, by can be a formula that is evaluated in 'dd'.

This syntax makes it easy to compute formulas involving more than one variable in 'dd'. An example:

capply( dd, ~gg, function(x) with( x, mean(Var1) / mean(Var2) ) )

where 'Var1' and 'Var2' are numeric variables and 'gg' a grouping factor in data frame 'dd'. Or, using the with function:

capply( dd, ~gg, with , mean(Var1) / mean(Var2) )
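The same ratio-of-means computation can be sketched in base R with the unsplit - lapply - split steps described above. The data frame below is made up to match the names 'dd', 'gg', 'Var1' and 'Var2' used in the text; the object names parts, ratio_by_cell and ratio are illustrative.

```r
# Illustrative data frame with a grouping variable and two numeric variables.
dd <- data.frame(
  gg   = c("a", "a", "b", "b"),
  Var1 = c(1, 3, 10, 30),
  Var2 = c(2, 2, 5, 5)
)

# Split into one sub data frame per cluster, compute one value per cluster,
# then expand the values back to one per row of dd.
parts         <- split(dd, dd$gg)
ratio_by_cell <- lapply(parts, function(d) with(d, mean(Var1) / mean(Var2)))
ratio         <- unsplit(ratio_by_cell, dd$gg)

print(ratio)
```

Because FUN receives a whole sub data frame, it can combine any number of columns; the cost is that every column is split, whether or not FUN uses it.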

cvar and cvars are intended to create contextual variables in model formulas. If 'x' is numerical, cvar is equivalent to capply(x,id,mean) and cvars is equivalent to capply(x,id,sum).

If x is a factor, cvar generates the equivalent of a model matrix for the factor with indicators replaced by the proportion within each cluster.

dvar is equivalent to x - cvar(x,by) and creates what is commonly known as a version of 'x' that is 'centered within groups' (CWG). It creates the correct matrix for a factor so that the between group interpretation of the effect of cvar(x,by) is that of the 'between group' or 'compositional' effect of the factor.
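For a numeric 'x', the relationship between the contextual variable and the centered-within-groups variable can be sketched in base R. This assumes the equivalences stated above (cvar as the cluster mean, dvar as x minus that mean) and uses the unsplit - lapply - split idiom in place of the package functions; the names x_between and x_within are illustrative, and the factor case is not covered.

```r
# Illustrative numeric variable and cluster identifier.
x  <- c(1, 2, 3, 10, 20)
id <- c(1, 1, 1, 2, 2)

# Cluster mean expanded to every observation (the role played by cvar).
x_between <- unsplit(lapply(split(x, id), mean), id)

# Deviation from the cluster mean (the role played by dvar).
x_within <- x - x_between

print(x_between)
print(x_within)

# The within-cluster deviations sum to zero in every cluster.
print(tapply(x_within, id, sum))
```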

capply tends to be slow when there are many cells and by is a factor. This may be due to the need to process all factor levels for each cell. Turning by into a numeric or character vector improves speed: e.g. capply( x, as.numeric(by), FUN).

## Not run: 
     data( hs )
     head( hs )

     # FUN returns a single value
     hs$ses.mean <- capply( hs$ses, hs$school, mean, na.rm = TRUE)
     hs$ses.hetero <- capply ( hs$ses, hs$school, sd , na.rm = TRUE)
     hs.summ <- up( hs, ~school )
     head( hs.summ )   # variables invariant within school

     # FUN returns a vector
     # with 'x' a data frame
     # Note how the 'with' function provides an easy way to supply an
     #   expression as the '...' argument.

     hs$minority.prop <- capply( hs, ~ school, with, mean( Minority == "Yes"))

     # equivalently:

     hs$minority.prop <- capply( hs$Minority == "Yes", hs$school, mean)

     # on very large data frames with many columns that are not used, the 'data frame'
     # version of 'capply' can be very slow in comparison with 'vector' version.

     # In contrast with 'tapply' 'FUN' can return a vector, e.g. ranks within groups

     hs$mathach.rank <- capply( hs, ~ school, with , rank(mathach))

     # cvar and dvar in multilevel models

     library( nlme )
     data ( hs )
     fit <- lme( mathach ~ Minority * Sector, hs, random = ~ 1 | school)
     summary ( fit )

     fit.contextual <- lme( mathach ~ (Minority + cvar(Minority, school)) * Sector,
                       hs, random = ~ 1| school)
     summary(fit.contextual) # contextual effect of cvar(Minority)

     fit.compositional <- lme( mathach ~ (dvar(Minority,school) + cvar(Minority, school)) * Sector,
                       hs, random = ~ 1| school)
     summary(fit.compositional) # compositional effect of cvar(Minority)

## End(Not run)

gmonette/yscs documentation built on May 17, 2019, 7:28 a.m.
