cao | R Documentation |
A constrained additive ordination (CAO) model is fitted using the reduced-rank vector generalized additive model (RR-VGAM) framework.
cao(formula, family = stop("argument 'family' needs to be assigned"),
data = list(),
weights = NULL, subset = NULL, na.action = na.fail,
etastart = NULL, mustart = NULL, coefstart = NULL,
control = cao.control(...), offset = NULL,
method = "cao.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE,
contrasts = NULL, constraints = NULL,
extra = NULL, qr.arg = FALSE, smart = TRUE, ...)
formula |
a symbolic description of the model to be fit. The RHS of
the formula is used to construct the latent variables, upon
which the smooths are applied. All the variables in the
formula are used for the construction of latent variables
except for those specified by the argument |
family |
a function of class |
data |
an optional data frame containing the variables in
the model. By default the variables are taken from
|
weights |
an optional vector or matrix of (prior) weights to be used
in the fitting process. For |
subset |
an optional logical vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function which indicates what should happen when
the data contain |
etastart |
starting values for the linear predictors. It is a
|
mustart |
starting values for the fitted values. It can be a vector
or a matrix. Some family functions do not make use of
this argument. For |
coefstart |
starting values for the coefficient vector. For |
control |
a list of parameters for controlling the fitting process.
See |
offset |
a vector or |
method |
the method to be used in fitting the model. The default
(and presently only) method |
model |
a logical value indicating whether the model frame
should be assigned in the |
x.arg , y.arg |
logical values indicating whether the model matrix and
response vector/matrix used in the fitting process should
be assigned in the |
contrasts |
an optional list. See the |
constraints |
an optional list of constraint matrices. For
|
extra |
an optional list with any extra information that might
be needed by the family function. For |
qr.arg |
For |
smart |
logical value indicating whether smart prediction
( |
... |
further arguments passed into |
The arguments of cao
are a mixture of those from
vgam
and cqo
, but with some extras
in cao.control
. Currently, not all of the
arguments work properly.
CAO can be loosely be thought of as the result of fitting
generalized additive models (GAMs) to several responses
(e.g., species) against a very small number of latent
variables. Each latent variable is a linear combination of
the explanatory variables; the coefficients C (called
C
below) are called constrained coefficients
or canonical coefficients, and are interpreted as
weights or loadings. The C are estimated by maximum
likelihood estimation. It is often a good idea to apply
scale
to each explanatory variable first.
For each response (e.g., species), each latent variable
is smoothed by a cubic smoothing spline, thus CAO is
data-driven. If each smooth were a quadratic then CAO
would simplify to constrained quadratic ordination
(CQO; formerly called canonical Gaussian ordination
or CGO). If each smooth were linear then CAO would simplify
to constrained linear ordination (CLO). CLO can
theoretically be fitted with cao
by specifying
df1.nl=0
, however it is more efficient to use
rrvglm
.
Currently, only Rank=1
is implemented, and only
noRRR = ~1
models are handled.
With binomial data, the default formula is
logit(P[Y_s=1]) = \eta_s = f_s(\nu), \ \ \ s=1,2,\ldots,S
where x_2
is a vector of environmental variables, and
\nu=C^T x_2
is a R
-vector of latent
variables. The \eta_s
is an additive predictor
for species s
, and it models the probabilities
of presence as an additive model on the logit scale.
The matrix C
is estimated from the data, as well as
the smooth functions f_s
. The argument noRRR =
~ 1
specifies that the vector x_1
, defined for
RR-VGLMs and QRR-VGLMs, is simply a 1 for an intercept. Here,
the intercept in the model is absorbed into the functions.
A clogloglink
link may be preferable over a
logitlink
link.
With Poisson count data, the formula is
\log(E[Y_s]) = \eta_s = f_s(\nu)
which models the mean response as an additive models on the log scale.
The fitted latent variables (site scores) are scaled to have
unit variance. The concept of a tolerance is undefined for
CAO models, but the optimums and maximums are defined. The
generic functions Max
and Opt
should work for CAO objects, but note that if the maximum
occurs at the boundary then Max
will return a
NA
. Inference for CAO models is currently undeveloped.
An object of class "cao"
(this may change to "rrvgam"
in the future).
Several generic functions can be applied to the object, e.g.,
Coef
, concoef
, lvplot
,
summary
.
CAO is very costly to compute. With version 0.7-8 it took 28 minutes on a fast machine. I hope to look at ways of speeding things up in the future.
Use set.seed
just prior to calling
cao()
to make your results reproducible. The reason
for this is finding the optimal CAO model presents a difficult
optimization problem, partly because the log-likelihood
function contains many local solutions. To obtain the
(global) solution the user is advised to try many
initial values. This can be done by setting Bestof
some appropriate value (see cao.control
). Trying
many initial values becomes progressively more important as
the nonlinear degrees of freedom of the smooths increase.
CAO models are computationally expensive, therefore setting
trace = TRUE
is a good idea, as well as running it
on a simple random sample of the data set instead.
Sometimes the IRLS algorithm does not converge within the FORTRAN code. This results in warnings being issued. In particular, if an error code of 3 is issued, then this indicates the IRLS algorithm has not converged. One possible remedy is to increase or decrease the nonlinear degrees of freedom so that the curves become more or less flexible, respectively.
T. W. Yee
Yee, T. W. (2006). Constrained additive ordination. Ecology, 87, 203–213.
cao.control
,
Coef.cao
,
cqo
,
latvar
,
Opt
,
Max
,
calibrate.qrrvglm
,
persp.cao
,
poissonff
,
binomialff
,
negbinomial
,
gamma2
,
set.seed
,
gam()
in gam,
trapO
.
## Not run:
hspider[, 1:6] <- scale(hspider[, 1:6]) # Stdzd environmental vars
set.seed(149) # For reproducible results
ap1 <- cao(cbind(Pardlugu, Pardmont, Pardnigr, Pardpull) ~
WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux,
family = poissonff, data = hspider, Rank = 1,
df1.nl = c(Pardpull= 2.7, 2.5),
Bestof = 7, Crow1positive = FALSE)
sort(deviance(ap1, history = TRUE)) # A history of all the iterations
Coef(ap1)
concoef(ap1)
par(mfrow = c(2, 2))
plot(ap1) # All the curves are unimodal; some quite symmetric
par(mfrow = c(1, 1), las = 1)
index <- 1:ncol(depvar(ap1))
lvplot(ap1, lcol = index, pcol = index, y = TRUE)
trplot(ap1, label = TRUE, col = index)
abline(a = 0, b = 1, lty = 2)
trplot(ap1, label = TRUE, col = "blue", log = "xy", which.sp = c(1, 3))
abline(a = 0, b = 1, lty = 2)
persp(ap1, col = index, lwd = 2, label = TRUE)
abline(v = Opt(ap1), lty = 2, col = index)
abline(h = Max(ap1), lty = 2, col = index)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.