cao: Fitting Constrained Additive Ordination (CAO)
In VGAM: Vector Generalized Linear and Additive Models

View source: R/cao.R

cao	R Documentation

Fitting Constrained Additive Ordination (CAO)

Description

A constrained additive ordination (CAO) model is fitted using the reduced-rank vector generalized additive model (RR-VGAM) framework.

Usage

cao(formula, family = stop("argument 'family' needs to be assigned"),
    data = list(),
    weights = NULL, subset = NULL, na.action = na.fail,
    etastart = NULL, mustart = NULL, coefstart = NULL,
    control = cao.control(...), offset = NULL,
    method = "cao.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE,
    contrasts = NULL, constraints = NULL,
    extra = NULL, qr.arg = FALSE, smart = TRUE, ...)

Arguments

`formula`	a symbolic description of the model to be fit. The RHS of the formula is used to construct the latent variables, upon which the smooths are applied. All the variables in the formula are used for the construction of latent variables except for those specified by the argument `noRRR`, which is itself a formula. The LHS of the formula contains the response variables, which should be a matrix with each column being a response (species).
`family`	a function of class `"vglmff"` (see `vglmff-class`) describing what statistical model is to be fitted. This is called a “VGAM family function”. See `CommonVGAMffArguments` for general information about many types of arguments found in this type of function. See `cqo` for a list of those presently implemented.
`data`	an optional data frame containing the variables in the model. By default the variables are taken from `environment(formula)`, typically the environment from which `cao` is called.
`weights`	an optional vector or matrix of (prior) weights to be used in the fitting process. For `cao`, this argument currently should not be used.
`subset`	an optional logical vector specifying a subset of observations to be used in the fitting process.
`na.action`	a function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset. The “factory-fresh” default is `na.omit`.
`etastart`	starting values for the linear predictors. It is a `M`-column matrix. If `M=1` then it may be a vector. For `cao`, this argument currently should not be used.
`mustart`	starting values for the fitted values. It can be a vector or a matrix. Some family functions do not make use of this argument. For `cao`, this argument currently should not be used.
`coefstart`	starting values for the coefficient vector. For `cao`, this argument currently should not be used.
`control`	a list of parameters for controlling the fitting process. See `cao.control` for details.
`offset`	a vector or `M`-column matrix of offset values. These are a priori known and are added to the linear predictors during fitting. For `cao`, this argument currently should not be used.
`method`	the method to be used in fitting the model. The default (and presently only) method `cao.fit` uses iteratively reweighted least squares (IRLS) within FORTRAN code called from `optim`.
`model`	a logical value indicating whether the model frame should be assigned in the `model` slot.
`x.arg`, `y.arg`	logical values indicating whether the model matrix and response vector/matrix used in the fitting process should be assigned in the `x` and `y` slots. Note the model matrix is the linear model (LM) matrix.
`contrasts`	an optional list. See the `contrasts.arg` of `model.matrix.default`.
`constraints`	an optional list of constraint matrices. For `cao`, this argument currently should not be used. The components of the list must be named with the term it corresponds to (and it must match in character format). Each constraint matrix must have `M` rows, and be of full-column rank. By default, constraint matrices are the `M` by `M` identity matrix unless arguments in the family function itself override these values. If `constraints` is used it must contain all the terms; an incomplete list is not accepted.
`extra`	an optional list with any extra information that might be needed by the family function. For `cao`, this argument currently should not be used.
`qr.arg`	For `cao`, this argument currently should not be used.
`smart`	logical value indicating whether smart prediction (`smartpred`) will be used.
`...`	further arguments passed into `cao.control`.

Details

The arguments of cao are a mixture of those from vgam and cqo, but with some extras in cao.control. Currently, not all of the arguments work properly.

CAO can be loosely be thought of as the result of fitting generalized additive models (GAMs) to several responses (e.g., species) against a very small number of latent variables. Each latent variable is a linear combination of the explanatory variables; the coefficients C (called C below) are called constrained coefficients or canonical coefficients, and are interpreted as weights or loadings. The C are estimated by maximum likelihood estimation. It is often a good idea to apply scale to each explanatory variable first.

For each response (e.g., species), each latent variable is smoothed by a cubic smoothing spline, thus CAO is data-driven. If each smooth were a quadratic then CAO would simplify to constrained quadratic ordination (CQO; formerly called canonical Gaussian ordination or CGO). If each smooth were linear then CAO would simplify to constrained linear ordination (CLO). CLO can theoretically be fitted with cao by specifying df1.nl=0, however it is more efficient to use rrvglm.

Currently, only Rank=1 is implemented, and only noRRR = ~1 models are handled.

With binomial data, the default formula is

logit(P[Y_s=1]) = \eta_s = f_s(\nu), \ \ \ s=1,2,\ldots,S

where x_2 is a vector of environmental variables, and \nu=C^T x_2 is a R-vector of latent variables. The \eta_s is an additive predictor for species s, and it models the probabilities of presence as an additive model on the logit scale. The matrix C is estimated from the data, as well as the smooth functions f_s. The argument noRRR = ~ 1 specifies that the vector x_1, defined for RR-VGLMs and QRR-VGLMs, is simply a 1 for an intercept. Here, the intercept in the model is absorbed into the functions. A clogloglink link may be preferable over a logitlink link.

With Poisson count data, the formula is

\log(E[Y_s]) = \eta_s = f_s(\nu)

which models the mean response as an additive models on the log scale.

The fitted latent variables (site scores) are scaled to have unit variance. The concept of a tolerance is undefined for CAO models, but the optimums and maximums are defined. The generic functions Max and Opt should work for CAO objects, but note that if the maximum occurs at the boundary then Max will return a NA. Inference for CAO models is currently undeveloped.

Value

An object of class "cao" (this may change to "rrvgam" in the future). Several generic functions can be applied to the object, e.g., Coef, concoef, lvplot, summary.

Warning

CAO is very costly to compute. With version 0.7-8 it took 28 minutes on a fast machine. I hope to look at ways of speeding things up in the future.

Use set.seed just prior to calling cao() to make your results reproducible. The reason for this is finding the optimal CAO model presents a difficult optimization problem, partly because the log-likelihood function contains many local solutions. To obtain the (global) solution the user is advised to try many initial values. This can be done by setting Bestof some appropriate value (see cao.control). Trying many initial values becomes progressively more important as the nonlinear degrees of freedom of the smooths increase.

Note

CAO models are computationally expensive, therefore setting trace = TRUE is a good idea, as well as running it on a simple random sample of the data set instead.

Sometimes the IRLS algorithm does not converge within the FORTRAN code. This results in warnings being issued. In particular, if an error code of 3 is issued, then this indicates the IRLS algorithm has not converged. One possible remedy is to increase or decrease the nonlinear degrees of freedom so that the curves become more or less flexible, respectively.

Author(s)

T. W. Yee

References

Yee, T. W. (2006). Constrained additive ordination. Ecology, 87, 203–213.

Examples

## Not run: 
hspider[, 1:6] <- scale(hspider[, 1:6])  # Stdzd environmental vars
set.seed(149)  # For reproducible results
ap1 <- cao(cbind(Pardlugu, Pardmont, Pardnigr, Pardpull) ~
           WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux,
           family = poissonff, data = hspider, Rank = 1,
           df1.nl = c(Pardpull= 2.7, 2.5),
           Bestof = 7, Crow1positive = FALSE)
sort(deviance(ap1, history = TRUE))  # A history of all the iterations

Coef(ap1)
concoef(ap1)

par(mfrow = c(2, 2))
plot(ap1)  # All the curves are unimodal; some quite symmetric

par(mfrow = c(1, 1), las = 1)
index <- 1:ncol(depvar(ap1))
lvplot(ap1, lcol = index, pcol = index, y = TRUE)

trplot(ap1, label = TRUE, col = index)
abline(a = 0, b = 1, lty = 2)

trplot(ap1, label = TRUE, col = "blue", log = "xy", which.sp = c(1, 3))
abline(a = 0, b = 1, lty = 2)

persp(ap1, col = index, lwd = 2, label = TRUE)
abline(v = Opt(ap1), lty = 2, col = index)
abline(h = Max(ap1), lty = 2, col = index)

## End(Not run)

VGAM documentation built on Dec. 4, 2025, 1:07 a.m.