capply | R Documentation |
Apply a function to each cell of a ragged array, that is, to each (non-empty)
group of values given by a unique combination of the levels of certain
variables. In contrast with tapply, capply returns within each cell a vector
of the same length as the cell; these vectors are then ordered to match the
corresponding positions of the cells in the input.
capply(x, ...)
## S3 method for class 'formula'
capply(formula, data, FUN, ...)
## Default S3 method:
capply(x, by, FUN, ..., sep = "#^#")
x |
a vector or data frame that provides the first argument of FUN |
... |
additional variables to be supplied to FUN |
FUN |
a function to be applied to |
by |
If |
fmla |
in |
capply is very similar to ave in package:stats. They differ in the way they
treat missing values in the clustering variables. ave treats missing values
as if they were legitimate clustering levels, while capply returns a value of
NA within any cluster formed by a combination of clustering variable values
that includes a value of NA.
capply extends the function of tapply(x, by, FUN)[ tapply(x, by) ]. The
function FUN is applied to each cell of x defined by each value of by. The
result in each cell is recycled to a vector of the same length as the cell.
These vectors are then arranged to match the input x. Thus, if the value
returned within each cell is a scalar, the effect of capply(x, by, FUN) is the
same as tapply(x, by, FUN)[ tapply(x, by) ]. capply extends this use of tapply
by allowing the value returned within each cell to be a vector of the same
length as the cell.
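The scalar case described above can be sketched in base R without capply. This is only an illustration on made-up data (x and by below are hypothetical); it relies on tapply returning each element's cell index when FUN is not supplied.

```r
# Hypothetical data: six values falling into three cells.
x  <- c(10, 2, 30, 4, 50, 6)
by <- c("a", "b", "a", "b", "a", "c")

# One mean per cell: a = 30, b = 3, c = 6.
cell_means <- tapply(x, by, mean)

# tapply(x, by) with no FUN returns the index of each element's cell,
# so subscripting expands the cell means back to the length of x.
# as.vector() drops the names and dim attributes left by tapply.
expanded <- as.vector(cell_means[tapply(x, by)])
expanded
```

Each element of expanded holds the mean of the cell that the corresponding element of x belongs to, in the original order of x.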
The capply.formula method allows the use of a two-sided formula of the form
x ~ a + b or cbind(x, y) ~ a + b, where the variables on the left-hand side
are used to create a data frame that is given as the first argument to FUN.
If there is a single variable on the left-hand side, then that variable can
be treated as a vector by FUN.
When the result in each cell is a scalar, capply can be used in multilevel
analysis to produce 'contextual variables' computed within subgroups of the
data and expanded to a constant over the elements of each subgroup.
capply(x, by, FUN, ...)
where x is a vector, is equivalent to
unsplit(lapply(split(x, by), FUN, ...), by)
which has the same effect as
tapply(x, by, FUN, ...)[ tapply(x, by) ]
if FUN returns a vector of length 1.
If FUN returns a vector, it is recycled to the length of the input value.
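The split, lapply, unsplit equivalence can be sketched as a small base R helper. The name capply_sketch is hypothetical and this is not the package's implementation: it assumes a single grouping vector with no missing values, and it makes the recycling step explicit with rep_len.

```r
# Hypothetical helper illustrating the split -> lapply -> unsplit pattern.
capply_sketch <- function(x, by, FUN, ...) {
  cells <- split(x, by)
  results <- lapply(cells, function(cell) {
    # Recycle the result to the length of the cell (a scalar is repeated).
    rep_len(FUN(cell, ...), length(cell))
  })
  # Put each cell's values back in the original positions of x.
  unsplit(results, by)
}

x  <- c(10, 2, 30, 4, 50, 6)
by <- c("a", "b", "a", "b", "a", "c")

within_mean <- capply_sketch(x, by, mean)  # 30 3 30 3 30 6
within_rank <- capply_sketch(x, by, rank)  # 1 1 2 2 3 1
```

With mean, each cell's scalar is repeated across the cell; with rank, FUN already returns one value per element, so the values are simply placed back in input order.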
When the first argument is a data frame:
capply(dd, by, FUN, ...)
uses unsplit - lapply - split to apply FUN to each sub data frame. In this
case, by can be a formula that is evaluated in 'dd'.
This syntax makes it easy to compute formulas involving more than one variable in 'dd'. An example:
capply( dd, ~gg, function(x) with( x, mean(Var1) / mean(Var2) ) )
where 'Var1' and 'Var2' are numeric variables and 'gg' is a grouping factor in
data frame 'dd'. Or, using the with function:
capply( dd, ~gg, with , mean(Var1) / mean(Var2) )
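For readers without the package at hand, the same per-group ratio can be approximated in base R. The data frame below is invented for illustration (only the names dd, gg, Var1 and Var2 come from the text above), and the expansion step is a plain lookup rather than capply itself.

```r
# Hypothetical data frame with a grouping variable and two numeric variables.
dd <- data.frame(
  gg   = c("a", "a", "b", "b", "b"),
  Var1 = c(2, 4, 3, 6, 9),
  Var2 = c(1, 2, 2, 2, 2)
)

# Split into one sub data frame per group and compute one ratio per group.
pieces <- split(dd, dd$gg)
ratios <- sapply(pieces, function(d) with(d, mean(Var1) / mean(Var2)))

# Expand the per-group ratios back to one value per row of dd.
dd$ratio <- as.vector(ratios[as.character(dd$gg)])
dd
```

Group 'a' gets 3 / 1.5 = 2 and group 'b' gets 6 / 2 = 3, repeated on every row of the respective group.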
cvar and cvars are intended to create contextual variables in model formulas.
If 'x' is numerical, cvar is equivalent to capply(x,id,mean) and cvars is
equivalent to capply(x,id,sum).
If x is a factor, cvar generates the equivalent of a model matrix for the
factor with indicators replaced by the proportion within each cluster.
dvar is equivalent to x - cvar(x,by) and creates what is commonly known as a
version of 'x' that is 'centered within groups' (CWG). It creates the correct
matrix for a factor so that the between group interpretation of the effect of
cvar(x,by) is that of the 'between group' or 'compositional' effect of the
factor.
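The numeric case of cvar, cvars and dvar can be illustrated with ave from package:stats, assuming there are no missing values in the clustering variable (where, as noted above, ave and capply differ). The data and the variable names below are hypothetical; the last lines only hint at the factor case using a single indicator, not a full model matrix.

```r
# Hypothetical data: five observations in two clusters.
x  <- c(1, 2, 3, 10, 20)
id <- c(1, 1, 1, 2, 2)

x_mean <- ave(x, id, FUN = mean)  # analogue of cvar(x, id):  2 2 2 15 15
x_sum  <- ave(x, id, FUN = sum)   # analogue of cvars(x, id): 6 6 6 30 30
x_cwg  <- x - x_mean              # analogue of dvar(x, id): -1 0 1 -5 5

# Factor-like case: proportion of "Yes" within each cluster.
minority <- c("Yes", "No", "No", "Yes", "Yes")
prop_yes <- ave(as.numeric(minority == "Yes"), id, FUN = mean)
```

The centered-within-groups variable sums to zero inside every cluster, which is what separates the within-group and between-group parts of the effect.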
capply(formula): method for class 'formula'
capply(default): default method
capply tends to be slow when there are many cells and by is a factor. This
may be due to the need to process all factor levels for each cell. Turning by
into a numeric or character vector improves speed, e.g.
capply(x, as.numeric(by), FUN).
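Converting a factor to its numeric codes keeps the same cells, which is why the substitution above is safe. The base R check below uses a made-up factor with an unused level; it only verifies that the grouping is preserved and does not measure speed.

```r
# Hypothetical grouping factor with one unused level ("s4").
by <- factor(c("s3", "s1", "s3", "s2"), levels = c("s1", "s2", "s3", "s4"))

# as.numeric() on a factor returns the integer level codes: 3 1 3 2.
codes <- as.numeric(by)

# The non-empty cells are the same whether we split by the factor or its codes.
cells_factor <- split(seq_along(by), by, drop = TRUE)
cells_codes  <- split(seq_along(by), codes)
same_cells   <- identical(unname(cells_factor), unname(cells_codes))
same_cells
```

Note that the codes carry no unused levels, so there is nothing extra to process for empty cells.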
## Not run:
data( hs )
head( hs )
# FUN returns a single value
hs$ses.mean <- capply( hs$ses, hs$school, mean, na.rm = TRUE)
hs$ses.hetero <- capply( hs$ses, hs$school, sd, na.rm = TRUE)
hs.summ <- up( hs, ~school )
head( hs.summ ) # variables invariant within school
# FUN returns a vector
# with 'x' a data frame
# Note how the 'with' function provides an easy way to use a
# formula as the '...' argument.
hs$minority.prop <- capply( hs, ~ school, with, mean( Minority == "Yes"))
# equivalently:
hs$minority.prop <- capply( hs$Minority, hs$school, mean)
# On very large data frames with many unused columns, the 'data frame'
# version of 'capply' can be very slow in comparison with the 'vector' version.
# In contrast with 'tapply', 'FUN' can return a vector, e.g. ranks within groups
hs$mathach.rank <- capply( hs, ~ school, with , rank(mathach))
# cvar and dvar in multilevel models
library( nlme )
data ( hs )
fit <- lme( mathach ~ Minority * Sector, hs, random = ~ 1 | school)
summary ( fit )
fit.contextual <- lme( mathach ~ (Minority + cvar(Minority, school)) * Sector,
                       hs, random = ~ 1 | school)
summary(fit.contextual) # contextual effect of cvar(Minority)
fit.compositional <- lme( mathach ~ (dvar(Minority, school) + cvar(Minority, school)) * Sector,
                          hs, random = ~ 1 | school)
summary(fit.compositional) # compositional effect of cvar(Minority)
## End(Not run)