capply: Apply a function within each cluster of multilevel data

Description

Apply a function to each cell of a ragged array, that is, to each (non-empty) group of values given by a unique combination of the levels of certain factors. In contrast with tapply, capply returns within each cell a vector of the same length as the cell; these vectors are then ordered to match the positions of the cells in the input.

capply is very similar to ave in package:stats. They differ in the way they treat missing values in the clustering variables. ave treats missing values as if they were legitimate clustering levels while capply returns a value of NA within any cluster formed by a combination of clustering variable values that includes a value of NA.
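The NA behaviour described for capply can be illustrated with base R alone. The sketch below does not call capply; it assumes the split / lapply / unsplit idiom that the Value section gives as its equivalent, and the toy vectors 'x' and 'g' are made up for illustration.

```r
# Toy data (hypothetical): the third observation has a missing cluster label
x <- c(1, 3, 100, 10, 20)
g <- c("a", "a", NA, "b", "b")

# split() drops observations whose grouping value is NA, so no group mean
# is computed for them, and unsplit() leaves their positions as NA
group_mean <- unsplit(lapply(split(x, g), mean), g)
group_mean   # 2 2 NA 15 15
```

The observation with the missing label gets NA rather than a cluster summary, which is the outcome this page attributes to capply.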

Usage

capply( x, by, FUN, ...)

capply.formula( fmla, data,  FUN, ... )

cvar( x, by, ... )

dvar( x, by, ... )

Arguments

x

a vector or data frame that provides the first argument of FUN

by

If x is a vector: a 'factor' of the same length as x whose levels identify clusters. If x is a data frame: a one-sided formula that identifies the variable(s) within x to be used to form clusters.

FUN

a function to be applied to x within each cluster. FUN can return a single value, or a vector whose length is equal to the number of elements in each cluster.

fmla

in capply.formula, fmla is a two-sided formula as in aggregate.formula. The left-hand side identifies the variable(s) in data to be included in a data frame that is clustered using the variables on the right-hand side of the formula.

...

additional arguments to be supplied to FUN

Details

capply extends the function of tapply(x, by, FUN)[ tapply(x, by) ]. The function FUN is applied to each cell of x defined by each value of by. The result in each cell is recycled to a vector of the same length as the cell. These vectors are then arranged to match the input x. Thus, if the value returned within each cell is a scalar, the effect of capply(x, by, FUN) is the same as tapply(x, by, FUN)[ tapply(x, by) ]. capply extends this use of tapply by allowing the value returned within each cell to be a vector of the same size as the cell.

The capply.formula method allows the use of a two-sided formula of the form x ~ a + b or cbind(x, y) ~ a + b, where the variables on the left-hand side are used to create a data frame that is given as the first argument to FUN. If there is a single variable on the left-hand side, then that variable can be treated as a vector by FUN.

Value

When the result in each cell is a scalar, capply can be used in multilevel analysis to produce 'contextual variables' computed within subgroups of the data and expanded to a constant over the elements of each subgroup.

capply( x , by, FUN , ...) where x is a vector

is equivalent to

unsplit ( lapply ( split ( x , by ), FUN, ...), by )

which has the same effect as

tapply( x, by, FUN, ...) [ tapply( x, by) ]

if FUN returns a vector of length 1.

If FUN returns a vector, it is recycled to the length of the input value.
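The two base-R idioms quoted above can be compared directly. This is a minimal sketch with made-up data ('x' and 'by' are hypothetical); it uses only base R and does not call capply itself.

```r
x  <- c(3, 20, 1, 10, 2, 5)
by <- factor(c("a", "b", "a", "b", "a", "c"))

# Scalar-valued FUN: both idioms expand the group means back over x.
# tapply(x, by) without FUN returns each element's group index.
via_tapply <- as.vector(tapply(x, by, mean)[tapply(x, by)])
via_split  <- unsplit(lapply(split(x, by), mean), by)
via_split   # 2 15 2 15 2 5

# Vector-valued FUN: ranks within groups, returned in the original order.
# Only the split / lapply / unsplit form handles this case.
rank_in_group <- unsplit(lapply(split(x, by), rank), by)
rank_in_group   # 3 2 1 1 2 1
```

The tapply form only works when FUN returns one value per group, because it indexes a table of per-group results; the split form places each group's result back at that group's positions, so it also covers results as long as the group.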

When the first argument is a data frame:

capply ( dd, by, FUN, ...)

uses unsplit - lapply - split to apply FUN to each sub data frame. In this case, by can be a formula that is evaluated in 'dd'.

This syntax makes it easy to compute formulas involving more than one variable in 'dd'. An example:

capply( dd, ~gg, function(x) with( x, mean(Var1) / mean(Var2) ) )

where 'Var1' and 'Var2' are numeric variables and 'gg' a grouping factor in data frame 'dd'. Or, using the with function:

capply( dd, ~gg, with , mean(Var1) / mean(Var2) )
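For readers without the package at hand, the per-group ratio above can be sketched in base R with the same unsplit - lapply - split pattern. The data frame below is made up for illustration, reusing the names 'dd', 'gg', 'Var1' and 'Var2' from the text; the code does not call capply.

```r
# Hypothetical data frame with a grouping variable and two numeric variables
dd <- data.frame(
  gg   = c("a", "a", "b", "b"),
  Var1 = c(2, 4, 6, 8),
  Var2 = c(1, 1, 2, 2)
)

# Compute one ratio of means per sub data frame ...
ratio_by_group <- lapply(split(dd, dd$gg),
                         function(d) with(d, mean(Var1) / mean(Var2)))

# ... and expand it back so every row carries its group's value
dd$ratio <- unsplit(ratio_by_group, dd$gg)
dd$ratio   # 3.0 3.0 3.5 3.5
```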

cvar and cvars are intended to create contextual variables in model formulas. If 'x' is numerical, cvar is equivalent to capply(x,id,mean) and cvars is equivalent to capply(x,id,sum).

If x is a factor, cvar generates the equivalent of a model matrix for the factor with indicators replaced by the proportion within each cluster.

dvar is equivalent to x - cvar(x,by) and creates a version of 'x' that is 'centered within groups' (CWG). For a factor, it creates the corresponding matrix, so that the effect of cvar(x,by) can be interpreted as the 'between group' or 'compositional' effect of the factor.
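The decomposition into a between-group and a within-group part can be sketched in base R. The code below is only a stand-in for what this page describes for numeric 'x', not the package's implementation of cvar or dvar; all variable names and data are hypothetical.

```r
x  <- c(1, 2, 3, 10, 20)
by <- c("s1", "s1", "s1", "s2", "s2")

# Stand-in for cvar(x, by): the cluster mean repeated over each cluster
x_between <- unsplit(lapply(split(x, by), mean), by)
x_between   # 2 2 2 15 15

# Stand-in for dvar(x, by): x centered within groups
x_within <- x - x_between
x_within    # -1 0 1 -5 5

# For a two-level factor, the analogous contextual variable is the
# within-cluster proportion of one level (an assumption about the
# simplest case of the factor behaviour described above)
minority <- c("Yes", "No", "No", "Yes", "Yes")
prop_yes <- unsplit(lapply(split(minority == "Yes", by), mean), by)
```

By construction the within-group part sums to zero in every cluster, so it carries no between-cluster information.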

Note

capply tends to be slow when there are many cells and by is a factor. This may be due to the need to process all factor levels for each cell. Turning by into a numeric or character vector improves speed: e.g. capply( x, as.numeric(by), FUN).

Examples

     data( hs )
     head( hs )

     # FUN returns a single value
     hs$ses.mean <- capply( hs$ses, hs$school, mean, na.rm = TRUE)
     hs$ses.hetero <- capply ( hs$ses, hs$school, sd , na.rm = TRUE)
     hs.summ <- up( hs, ~school )
     head( hs.summ )   # variables invariant within school
     
     # FUN returns a vector
     # with 'x' a data frame
     # Note how the 'with' function provides an easy way to use an
     #   expression as the '...' argument.
     
     hs$minority.prop <- capply( hs, ~ school, with, mean( Minority == "Yes"))
     
     # equivalently (the comparison turns the factor into a logical vector,
     # whose mean is the proportion):
     
     hs$minority.prop <- capply( hs$Minority == "Yes", hs$school, mean)
     
     # on very large data frames with many columns that are not used, the 'data frame'
     # version of 'capply' can be very slow in comparison with the 'vector' version.
     
     # In contrast with 'tapply', 'FUN' can return a vector, e.g. ranks within groups
     
     hs$mathach.rank <- capply( hs, ~ school, with , rank(mathach))
     
     # cvar and dvar in multilevel models
     
     library( nlme )
     data ( hs )
     fit <- lme( mathach ~ Minority * Sector, hs, random = ~ 1 | school)
     summary ( fit )
     
     fit.contextual <- lme( mathach ~ (Minority + cvar(Minority, school)) * Sector,
                       hs, random = ~ 1| school)
     summary(fit.contextual) # contextual effect of cvar(Minority)
     
     fit.compositional <- lme( mathach ~ (dvar(Minority,school) + cvar(Minority, school)) * Sector,
                       hs, random = ~ 1| school)
     summary(fit.compositional) # compositional effect of cvar(Minority)

gmonette/spida documentation built on May 17, 2019, 7:25 a.m.