Apply a function to each cell of a ragged array, that is, to each (non-empty) group of values given by a unique combination of the levels of certain variables, and, in contrast with tapply, return within each cell a vector of the same length as the cell. These vectors are then ordered to match the corresponding positions of the cells in the input.
x |
a vector or data frame that provides the first argument of
|
by |
If |
FUN |
a function to be applied to |
fmla |
in |
... |
additional variables to be supplied to |
capply is very similar to ave in package:stats. They differ in the way they treat missing values in the clustering variables. ave treats missing values as if they were legitimate clustering levels, while capply returns a value of NA within any cluster formed by a combination of clustering variable values that includes a value of NA.
capply extends the function of tapply(x, by, FUN)[ tapply(x, by) ]. The function FUN is applied to each cell of x defined by each value of by. The result in each cell is recycled to a vector of the same length as the cell. These vectors are then arranged to match the input x. Thus, if the value returned within each cell is a scalar, the effect of capply(x, by, FUN) is the same as tapply(x, by, FUN)[ tapply(x, by) ]. capply extends this use of tapply by allowing the value returned within each cell to be a vector of the same length as the cell.
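The tapply indexing idiom mentioned above can be tried in base R alone. The following is a minimal sketch with made-up toy values; it does not call capply itself and only illustrates how calling tapply without FUN yields the cell index used to expand per-cell results.

```r
# Toy data: five observations in two groups (illustrative values only).
x  <- c(2, 10, 4, 20, 6)
by <- c("a", "b", "a", "b", "a")

# One value per cell: entries for groups "a" and "b".
cell_means <- tapply(x, by, mean)

# With FUN omitted, tapply() returns the cell index of each observation.
cell_index <- tapply(x, by)

# Indexing the cell values by the cell index expands them to the input length.
expanded <- as.vector(cell_means[cell_index])
print(expanded)
```

Each observation receives the mean of its own group, in the original order of x.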
The capply.formula method allows the use of a two-sided formula of the form x ~ a + b or cbind(x, y) ~ a + b, where the variables on the left-hand side are used to create a data frame that is given as the first argument to FUN. If there is a single variable on the left-hand side, then that variable can be treated as a vector by FUN.
When the result in each cell is a scalar, capply can be used in multilevel analysis to produce 'contextual variables' computed within subgroups of the data and expanded to a constant over the elements of each subgroup.
    capply(x, by, FUN, ...)
where x is a vector, is equivalent to
    unsplit(lapply(split(x, by), FUN, ...), by)
which has the same effect as
    tapply(x, by, FUN, ...)[ tapply(x, by) ]
if FUN returns a vector of length 1. If FUN returns a vector, it is recycled to the length of the input value.
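As a rough illustration of the unsplit-lapply-split equivalence stated above, the sketch below wraps that idiom in a helper. The name capply_sketch is hypothetical and this is not the package source; in particular, it makes no attempt to reproduce how capply handles missing values in by.

```r
# Hypothetical helper (not the package source): the unsplit-lapply-split
# idiom described above, for a vector 'x' and a grouping vector 'by'.
capply_sketch <- function(x, by, FUN, ...) {
  unsplit(lapply(split(x, by), FUN, ...), by)
}

# Toy data (illustrative values only).
x  <- c(2, 10, 4, 20, 6)
by <- c("a", "b", "a", "b", "a")

# Scalar result per cell: each group mean is repeated across its cell.
group_mean <- capply_sketch(x, by, mean)

# Vector result per cell: ranks computed within each group and placed
# back at the positions of the corresponding observations.
within_rank <- capply_sketch(x, by, rank)

print(data.frame(x, by, group_mean, within_rank))
```

The second call shows the case that plain tapply indexing cannot cover: FUN returns one value per observation rather than one value per cell.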
When the first argument is a data frame,
    capply(dd, by, FUN, ...)
uses unsplit-lapply-split to apply FUN to each sub data frame. In this case, by can be a formula that is evaluated in 'dd'. This syntax makes it easy to compute formulas involving more than one variable in 'dd'. An example:
    capply(dd, ~gg, function(x) with(x, mean(Var1) / mean(Var2)))
where 'Var1' and 'Var2' are numeric variables and 'gg' a grouping factor in data frame 'dd'. Or, using the with function:
    capply(dd, ~gg, with, mean(Var1) / mean(Var2))
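The data frame case can be approximated in base R as follows. This is only a sketch under stated assumptions: the toy data frame is invented, the grouping variable is passed as a vector rather than a formula, and capply itself is not called.

```r
# Toy data frame; 'Var1', 'Var2' and 'gg' mirror the names used in the text,
# but the values are made up for illustration.
dd <- data.frame(
  gg   = c("a", "b", "a", "b"),
  Var1 = c(2, 9, 4, 3),
  Var2 = c(1, 2, 2, 2)
)

# Split into one sub data frame per level of 'gg' and compute the ratio
# of means within each sub data frame.
ratio_by_group <- lapply(
  split(dd, dd$gg),
  function(d) with(d, mean(Var1) / mean(Var2))
)

# Put the per-group results back in the original row order.
dd$ratio <- unsplit(ratio_by_group, dd$gg)
print(dd)
```

Because the function receives the whole sub data frame, it can combine several columns, which a vector-based call cannot do directly.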
cvar and cvars are intended to create contextual variables in model formulas. If 'x' is numerical, cvar is equivalent to capply(x,id,mean) and cvars is equivalent to capply(x,id,sum). If x is a factor, cvar generates the equivalent of a model matrix for the factor with indicators replaced by the proportion within each cluster.
dvar is equivalent to x - cvar(x,by) and creates what is commonly known as a version of 'x' that is 'centered within groups' (CWG). It creates the correct matrix for a factor so that the between group interpretation of the effect of cvar(x,by) is that of the 'between group' or 'compositional' effect of the factor.
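For a numeric 'x', the equivalences stated above for cvar, cvars and dvar can be sketched in base R. The helper group_apply is a hypothetical stand-in for capply, the data are made up, and the factor case is not covered.

```r
# Hypothetical stand-in for capply() on vectors (unsplit-lapply-split).
group_apply <- function(x, id, FUN) unsplit(lapply(split(x, id), FUN), id)

# Toy data: two clusters of sizes 2 and 3 (illustrative values only).
x  <- c(1, 3, 10, 20, 30)
id <- c(1, 1, 2, 2, 2)

x_mean     <- group_apply(x, id, mean)  # what the text equates with cvar(x, id)
x_sum      <- group_apply(x, id, sum)   # what the text equates with cvars(x, id)
x_centered <- x - x_mean                # what the text equates with dvar(x, id)

print(data.frame(id, x, x_mean, x_sum, x_centered))
```

The centered variable sums to zero within each cluster, which is what makes it a purely within-group quantity.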
capply tends to be slow when there are many cells and by is a factor. This may be due to the need to process all factor levels for each cell. Turning by into a numeric or character vector improves speed, e.g. capply(x, as.numeric(by), FUN).
## Not run:
data( hs )
head( hs )
# FUN returns a single value
hs$ses.mean <- capply( hs$ses, hs$school, mean, na.rm = TRUE)
hs$ses.hetero <- capply( hs$ses, hs$school, sd, na.rm = TRUE)
hs.summ <- up( hs, ~school )
head( hs.summ ) # variables invariant within school
# FUN returns a vector
# with 'x' a data frame
# Note how the 'with' function provides an easy way to use an
# expression as the '...' argument.
hs$minority.prop <- capply( hs, ~ school, with, mean( Minority == "Yes"))
# equivalently (comparing to "Yes" first, since 'Minority' is not numeric):
hs$minority.prop <- capply( hs$Minority == "Yes", hs$school, mean)
# On very large data frames with many unused columns, the 'data frame'
# version of 'capply' can be very slow in comparison with the 'vector' version.
# In contrast with 'tapply' 'FUN' can return a vector, e.g. ranks within groups
hs$mathach.rank <- capply( hs, ~ school, with , rank(mathach))
# cvar and dvar in multilevel models
library( nlme )
data ( hs )
fit <- lme( mathach ~ Minority * Sector, hs, random = ~ 1 | school)
summary ( fit )
fit.contextual <- lme( mathach ~ (Minority + cvar(Minority, school)) * Sector,
                       hs, random = ~ 1 | school)
summary(fit.contextual) # contextual effect of cvar(Minority)
fit.compositional <- lme( mathach ~ (dvar(Minority, school) + cvar(Minority, school)) * Sector,
                          hs, random = ~ 1 | school)
summary(fit.compositional) # compositional effect of cvar(Minority)
## End(Not run)