Apply a function to each cell of a ragged array, that is, to each (non-empty) group of values given by a unique combination of the levels of certain factors. In contrast with tapply, return within each cell a vector of the same length as the cell; these vectors are then ordered to match the positions of the cells in the input.
capply is very similar to ave in package:stats. They differ in the way they treat missing values in the clustering variables: ave treats missing values as if they were legitimate clustering levels, while capply returns a value of NA within any cluster formed by a combination of clustering variable values that includes a value of NA.
x | a vector or data frame that provides the first argument of
by | If
FUN | a function to be applied to
fmla | in
... | additional variables to be supplied to
capply extends the function of tapply(x, by, FUN)[ tapply(x, by) ]. The function FUN is applied to each cell of x defined by each value of by. The result in each cell is recycled to a vector of the same length as the cell. These vectors are then arranged to match the input x. Thus, if the value returned within each cell is a scalar, the effect of capply(x, by, FUN) is the same as tapply(x, by, FUN)[ tapply(x, by) ]. capply extends this use of tapply by allowing the value returned within each cell to be a vector of the same size as the cell.
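The scalar case described above can be reproduced with base R alone. The following is a minimal sketch that does not call capply itself; the vectors x and by are made-up illustration data. It relies on the documented behaviour of tapply returning a vector of cell indices when FUN is omitted.

```r
# Made-up illustration data: six values in three groups
x  <- c(1, 2, 3, 4, 5, 6)
by <- c("a", "b", "a", "b", "a", "c")

# One summary value per cell (group means)
cell_means <- tapply(x, by, mean)

# With FUN omitted, tapply returns, for each element of x,
# the index of the cell it belongs to
idx <- tapply(x, by)

# Subscripting expands the per-cell value back to the length of x
expanded <- cell_means[idx]
as.vector(expanded)
```

Each element of the result holds the mean of the group that element belongs to, in the original order of x.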
The capply.formula method allows the use of a two-sided formula of the form x ~ a + b or cbind(x, y) ~ a + b, where the variables on the left-hand side are used to create a data frame that is given as the first argument to FUN. If there is a single variable on the left-hand side, then that variable can be treated as a vector by FUN.
When the result in each cell is a scalar, capply can be used in multilevel analysis to produce 'contextual variables' computed within subgroups of the data and expanded to a constant over the elements of each subgroup.
capply(x, by, FUN, ...), where x is a vector, is equivalent to
unsplit(lapply(split(x, by), FUN, ...), by)
which has the same effect as
tapply(x, by, FUN, ...)[ tapply(x, by) ]
if FUN returns a vector of length 1. If FUN returns a vector, it is recycled to the length of the input value.
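The unsplit - lapply - split equivalence stated above can be tried directly in base R. This is a sketch only: capply_sketch is a hypothetical stand-in built from that one-line equivalence, not the package's implementation, and the data are invented. It shows a scalar result being expanded, a vector result (ranks within groups) being kept, and NA appearing where the grouping value is missing.

```r
# Hypothetical stand-in built from the documented equivalence;
# the real capply may differ in details
capply_sketch <- function(x, by, FUN, ...) {
  unsplit(lapply(split(x, by), FUN, ...), by)
}

x  <- c(10, 30, 20, 5, 15)
by <- c("a", "a", "a", "b", "b")

# Scalar per cell: expanded to a constant within each group
capply_sketch(x, by, mean)   # 20 20 20 10 10

# Vector per cell: ranks computed within each group
capply_sketch(x, by, rank)   # 1 3 2 1 2

# A missing grouping value yields NA in that position
by_na <- c("a", NA, "a", "b", "b")
capply_sketch(x, by_na, mean)   # 15 NA 15 10 10
```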
When the first argument is a data frame,
capply(dd, by, FUN, ...)
uses unsplit - lapply - split to apply FUN to each sub data frame. In this case, by can be a formula that is evaluated in 'dd'. This syntax makes it easy to compute formulas involving more than one variable in 'dd'. An example:
capply( dd, ~gg, function(x) with( x, mean(Var1) / mean(Var2) ) )
where 'Var1' and 'Var2' are numeric variables and 'gg' is a grouping factor in data frame 'dd'. Or, using the with function:
capply( dd, ~gg, with , mean(Var1) / mean(Var2) )
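For readers without the package at hand, the data frame case can be approximated in base R with the same unsplit - lapply - split pattern. This is a sketch under assumptions: dd is an invented data frame whose column names mirror the example above, and the grouping variable is passed as a plain vector rather than as a formula.

```r
# Invented data frame mirroring the names used in the text
dd <- data.frame(
  gg   = c("a", "a", "b", "b"),
  Var1 = c(2, 4, 10, 30),
  Var2 = c(1, 2, 4, 6)
)

# Apply a function of two columns to each sub data frame,
# then expand the per-group result back to one value per row
ratio <- unsplit(
  lapply(split(dd, dd$gg), function(d) with(d, mean(Var1) / mean(Var2))),
  dd$gg
)
ratio   # 2 2 4 4
```

Because the function receives the whole sub data frame, it can combine any number of columns, which the vector form cannot do.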
cvar and cvars are intended to create contextual variables in model formulas. If 'x' is numerical, cvar is equivalent to capply(x,id,mean) and cvars is equivalent to capply(x,id,sum). If x is a factor, cvar generates the equivalent of a model matrix for the factor with indicators replaced by the proportion within each cluster.
dvar is equivalent to x - cvar(x,by) and creates what is commonly known as a version of 'x' that is 'centered within groups' (CWG). It creates the correct matrix for a factor so that the between group interpretation of the effect of cvar(x,by) is that of the 'between group' or 'compositional' effect of the factor.
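For a numeric x, the relationships among cvar, cvars and dvar can be illustrated with base R. This is a sketch, not the package code: it uses ave from package:stats as a stand-in for capply, which is reasonable here only because the invented id vector contains no missing values, and the *_sketch names are hypothetical.

```r
# Invented data: five observations in two clusters
x  <- c(1, 3, 4, 6, 8)
id <- c(1, 1, 2, 2, 2)

# Stand-in for cvar(x, id): cluster mean expanded to each observation
cvar_sketch <- ave(x, id, FUN = mean)    # 2 2 6 6 6

# Stand-in for cvars(x, id): cluster sum expanded to each observation
cvars_sketch <- ave(x, id, FUN = sum)    # 4 4 18 18 18

# Stand-in for dvar(x, id): deviation from the cluster mean
dvar_sketch <- x - cvar_sketch           # -1 1 -2 0 2

# Centering within groups: deviations sum to zero in every cluster
tapply(dvar_sketch, id, sum)
```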
capply tends to be slow when there are many cells and by is a factor. This may be due to the need to process all factor levels for each cell. Turning by into a numeric or character vector improves speed, e.g. capply( x, as.numeric(by), FUN).
data( hs )
head( hs )
# FUN returns a single value
hs$ses.mean <- capply( hs$ses, hs$school, mean, na.rm = TRUE)
hs$ses.hetero <- capply( hs$ses, hs$school, sd, na.rm = TRUE)
hs.summ <- up( hs, ~school )
head( hs.summ ) # variables invariant within school
# FUN returns a vector
# with 'x' a data frame
# Note how the 'with' function provides an easy way to use an
# expression as the '...' argument.
hs$minority.prop <- capply( hs, ~ school, with, mean( Minority == "Yes"))
# equivalently (mean() needs a logical or numeric vector, not a factor):
hs$minority.prop <- capply( hs$Minority == "Yes", hs$school, mean)
# on very large data frames with many columns that are not used, the 'data frame'
# version of 'capply' can be very slow in comparison with 'vector' version.
# In contrast with 'tapply' 'FUN' can return a vector, e.g. ranks within groups
hs$mathach.rank <- capply( hs, ~ school, with , rank(mathach))
# cvar and dvar in multilevel models
library( nlme )
data ( hs )
fit <- lme( mathach ~ Minority * Sector, hs, random = ~ 1 | school)
summary ( fit )
fit.contextual <- lme( mathach ~ (Minority + cvar(Minority, school)) * Sector,
                       hs, random = ~ 1 | school)
summary(fit.contextual) # contextual effect of cvar(Minority)
fit.compositional <- lme( mathach ~ (dvar(Minority, school) + cvar(Minority, school)) * Sector,
                          hs, random = ~ 1 | school)
summary(fit.compositional) # compositional effect of cvar(Minority)