# capscale: [Partial] Constrained Analysis of Principal Coordinates or... In vegan: Community Ecology Package

## Description

Constrained Analysis of Principal Coordinates (CAP) is an ordination method similar to Redundancy Analysis (`rda`), but it allows non-Euclidean dissimilarity indices, such as Manhattan or Bray–Curtis distance. Despite this non-Euclidean feature, the analysis is strictly linear and metric. If called with Euclidean distance, the results are identical to `rda`, but `capscale` will be much less efficient. Function `capscale` is a constrained version of metric scaling, a.k.a. principal coordinates analysis, which is based on the Euclidean distance but can be used, and is more useful, with other dissimilarity measures. The function can also perform unconstrained principal coordinates analysis, optionally using extended dissimilarities.

## Usage

```
capscale(formula, data, distance = "euclidean", sqrt.dist = FALSE,
         comm = NULL, add = FALSE, dfun = vegdist, metaMDSdist = FALSE,
         na.action = na.fail, subset = NULL, ...)
```

## Arguments

- `formula`: Model formula. The function can be called only with the formula interface. Most usual features of `formula` hold, especially as defined in `cca` and `rda`. The LHS must be either a community data matrix or a dissimilarity matrix, e.g., from `vegdist` or `dist`. If the LHS is a data matrix, the dissimilarities are found with `dfun` (by default `vegdist`). The RHS defines the constraints. The constraints can be continuous variables or factors, they can be transformed within the formula, and they can have interactions as in a typical `formula`. The RHS can have a special term `Condition` that defines variables to be “partialled out” before constraints, just like in `rda` or `cca`. This allows the use of partial CAP.
- `data`: Data frame containing the variables on the right-hand side of the model formula.
- `distance`: The name of the dissimilarity (or distance) index if the LHS of the `formula` is a data frame instead of a dissimilarity matrix.
- `sqrt.dist`: Take square roots of dissimilarities. See the Note section below.
- `comm`: Community data frame used for finding species scores when the LHS of the `formula` is a dissimilarity matrix. It is not used if the LHS is a data frame. If it is not supplied, the “species scores” are the axes of the initial metric scaling (`cmdscale`) and may be confusing.
- `add`: Logical indicating whether an additive constant should be computed and added to the non-diagonal dissimilarities so that all eigenvalues are non-negative in the underlying Principal Co-ordinates Analysis (see `cmdscale` for details). This implements “correction method 2” of Legendre & Legendre (2012, p. 503). Negative eigenvalues are caused by using semi-metric or non-metric dissimilarities with the basically metric `cmdscale`. They are harmless and ignored in `capscale`, but you can also avoid warnings with this option.
- `dfun`: Distance or dissimilarity function used. Any function that returns a standard `"dist"` object and, like `vegdist` and `dist`, takes the index name as its first argument after the data can be used.
- `metaMDSdist`: Use `metaMDSdist` in the same way as in `metaMDS`. This means automatic data transformation and the use of extended flexible shortest path dissimilarities (function `stepacross`) when there are many dissimilarities based on no shared species.
- `na.action`: Handling of missing values in constraints or conditions. The default (`na.fail`) is to stop with missing values. Choices `na.omit` and `na.exclude` delete rows with missing values, but differ in the representation of results. With `na.omit` only non-missing site scores are shown, but `na.exclude` gives `NA` for scores of missing observations. Unlike in `rda`, no WA scores are available for missing constraints or conditions.
- `subset`: Subset of data rows. This can be a logical vector which is `TRUE` for kept observations, or a logical expression which can contain variables in the working environment, `data` or species names of the community data (if given in the formula or as the `comm` argument).
- `...`: Other parameters passed to `rda` or to `metaMDSdist`.
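
The `dfun` contract can be illustrated with a minimal base-R sketch. The function `bray_dfun` below is hypothetical (it is not part of vegan); it assumes, as with `vegdist(x, method)` and `dist(x, method)`, that the data come first and the index name second, and it returns a standard `"dist"` object carrying a `"method"` attribute. The community matrix is simulated.

```r
# Hypothetical dfun-compatible function (not part of vegan): data first,
# index name second, standard "dist" object returned.
bray_dfun <- function(x, method = "bray") {
  if (!identical(method, "bray"))
    stop("this sketch only implements method = \"bray\"")
  x <- as.matrix(x)
  rs <- rowSums(x)
  # Bray-Curtis: sum of absolute differences divided by the sum of both site totals
  num <- as.matrix(dist(x, method = "manhattan"))
  d <- as.dist(num / outer(rs, rs, "+"))
  attr(d, "method") <- method   # index name stored with the dissimilarities
  d
}

set.seed(1)
comm <- matrix(rpois(40, lambda = 3), nrow = 8,
               dimnames = list(paste0("site", 1:8), paste0("sp", 1:5)))
d <- bray_dfun(comm, "bray")
round(d, 3)
```

With vegan loaded, such a function would be passed through the `dfun` argument together with `distance = "bray"`; for this index it should agree with `vegdist(comm, "bray")`.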

## Details

Canonical Analysis of Principal Coordinates (CAP) is simply a Redundancy Analysis of the results of Metric (Classical) Multidimensional Scaling (Anderson & Willis 2003). Function `capscale` uses two steps: (1) it ordinates the dissimilarity matrix using `cmdscale`, and (2) it analyses these results using `rda`. If the user supplies a community data frame instead of dissimilarities, the function finds the needed dissimilarity matrix using `vegdist` with the specified `distance`. Alternatively, the function accepts dissimilarity matrices from `vegdist`, `dist`, or any other method producing similar matrices. The constraining variables can be continuous, factors, or both; they can have interaction terms, and they can be transformed in the call. Moreover, there can be a special term `Condition`, just like in `rda` and `cca`, so that “partial” CAP can be performed.
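
The two steps can be sketched in base R without vegan. This is a minimal sketch, assuming simulated data and Manhattan distances from `dist()`. The helper `pcoa_axes` is hypothetical: it computes principal coordinates directly with `eigen()` on the Gower-centred matrix (the decomposition that classical metric scaling uses) so that all positive axes are kept, and the RDA step is reduced to a least-squares projection followed by an SVD. It omits `Condition()`, species scores and the inertia scaling described in the Note.

```r
# Hypothetical helper, step 1: principal coordinates (all axes with positive
# eigenvalues), scaled by sqrt(eigenvalue) as in classical metric scaling.
pcoa_axes <- function(d, tol = 1e-8) {
  D2 <- as.matrix(d)^2
  n <- nrow(D2)
  C <- diag(n) - matrix(1 / n, n, n)          # centring matrix
  G <- -0.5 * C %*% D2 %*% C                  # Gower-centred matrix
  e <- eigen(G, symmetric = TRUE)
  keep <- e$values > tol * max(e$values)      # real (positive) axes only
  list(points = e$vectors[, keep, drop = FALSE] %*%
         diag(sqrt(e$values[keep]), nrow = sum(keep)),
       values = e$values)
}

set.seed(42)
n <- 20
env <- data.frame(N = rnorm(n), P = rnorm(n))
comm <- matrix(rpois(n * 6, lambda = exp(1 + 0.5 * env$N)), nrow = n)
d <- dist(comm, method = "manhattan")

pc <- pcoa_axes(d)
Y <- pc$points

# Step 2: redundancy analysis of the principal coordinates on the constraints.
X <- model.matrix(~ N + P, data = env)
fitted <- qr.fitted(qr(X), Y)                 # part explained by N and P
resid <- Y - fitted
constrained <- svd(fitted)$d^2                # constrained eigenvalues (sums of squares)
unconstrained <- svd(resid)$d^2

q <- qr(X)$rank - 1                           # number of constrained axes
constrained[seq_len(q)]
sum(constrained) / sum(Y^2)                   # proportion of real inertia explained
```

Because the projection is orthogonal, the constrained and unconstrained sums of squares add up to the sum of the positive eigenvalues.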

The current implementation differs from the method suggested by Anderson & Willis (2003) in three major points which actually make it similar to distance-based redundancy analysis (Legendre & Anderson 1999):

1. Anderson & Willis used the orthonormal solution of `cmdscale`, whereas `capscale` uses axes weighted by corresponding eigenvalues, so that the ordination distances are the best approximations of original dissimilarities. In the original method, later “noise” axes are just as important as first major axes.

2. Anderson & Willis take only a subset of axes, whereas `capscale` uses all axes with positive eigenvalues. Using a subset is necessary with orthonormal axes to chop off some “noise”, but using all axes guarantees that the results are the best approximation of the original dissimilarities.

3. Function `capscale` adds species scores as weighted sums of the (residual) community matrix (if the matrix is available), whereas Anderson & Willis have no fixed method for adding species scores.
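
Points 1 and 2 can be checked numerically with a small base-R sketch. It assumes simulated data and Euclidean input distances, chosen only so that an exact reconstruction is possible: when all positive axes are scaled by the square roots of their eigenvalues, distances between sites in the ordination reproduce the input dissimilarities, whereas unscaled orthonormal axes do not.

```r
set.seed(7)
x <- matrix(rnorm(15 * 4), nrow = 15)
d <- dist(x)                                   # Euclidean input dissimilarities
n <- attr(d, "Size")
C <- diag(n) - matrix(1 / n, n, n)             # centring matrix
e <- eigen(-0.5 * C %*% as.matrix(d)^2 %*% C, symmetric = TRUE)
keep <- e$values > 1e-8 * max(e$values)        # all axes with positive eigenvalues
U <- e$vectors[, keep, drop = FALSE]

weighted <- U %*% diag(sqrt(e$values[keep]), nrow = sum(keep))  # eigenvalue-weighted axes
orthonormal <- U                                                 # unit-length axes

err_weighted <- max(abs(as.vector(dist(weighted)) - as.vector(d)))
err_orthonormal <- max(abs(as.vector(dist(orthonormal)) - as.vector(d)))
c(weighted = err_weighted, orthonormal = err_orthonormal)
```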

With these definitions, function `capscale` with Euclidean distances will be identical to `rda` in eigenvalues and in site, species and biplot scores (except for possible sign reversal). However, it makes no sense to use `capscale` with Euclidean distances, since direct use of `rda` is much more efficient. Even with non-Euclidean dissimilarities, the rest of the analysis will be metric and linear.
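
The eigenvalue part of this statement can be checked in base R. This is a minimal sketch with simulated data, using `prcomp()` as a stand-in for the unconstrained PCA that `rda` performs: principal coordinates of Euclidean distances have eigenvalues equal to the PCA variances multiplied by n - 1 (the division discussed in the Note), and site scores that agree up to sign.

```r
set.seed(3)
x <- matrix(rnorm(12 * 3), nrow = 12)
n <- nrow(x)
C <- diag(n) - matrix(1 / n, n, n)             # centring matrix
e <- eigen(-0.5 * C %*% as.matrix(dist(x))^2 %*% C, symmetric = TRUE)
pcoa_ev <- e$values[1:3]                       # three non-zero eigenvalues
pcoa_scores <- e$vectors[, 1:3] %*% diag(sqrt(pcoa_ev))

pca <- prcomp(x)                               # centred PCA of the raw data
pca_ev <- pca$sdev^2                           # variances, i.e. divided by n - 1
cbind(pcoa = pcoa_ev, pca_times_n1 = pca_ev * (n - 1))
```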

The function can also be used to perform ordinary metric scaling, a.k.a. principal coordinates analysis, by using a formula with only a constant on the right-hand side, or `comm ~ 1`. With `metaMDSdist = TRUE`, the function can do automatic data standardization and use extended dissimilarities with function `stepacross`, in the same way as in non-metric multidimensional scaling with `metaMDS`.
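
The standardization reported for the last example (`wisconsin(sqrt(varespec))` in the example output) can be sketched in base R. This is a minimal sketch with a hypothetical helper `wisconsin_sqrt` and simulated counts, assuming the usual definition of Wisconsin double standardization (each species divided by its maximum, then each site divided by its total). Whether `metaMDSdist` applies it to a given data set depends on its own rules, and the `stepacross` step is not sketched.

```r
# Hypothetical helper: square root followed by Wisconsin double standardization.
wisconsin_sqrt <- function(x) {
  x <- sqrt(as.matrix(x))
  spmax <- apply(x, 2, max)
  spmax[spmax == 0] <- 1                       # leave all-zero species untouched
  x <- sweep(x, 2, spmax, "/")                 # species scaled to maximum 1
  tot <- rowSums(x)
  tot[tot == 0] <- 1                           # leave empty sites untouched
  sweep(x, 1, tot, "/")                        # sites scaled to total 1
}

set.seed(2)
comm <- matrix(rpois(10 * 6, lambda = 20), nrow = 10)
std <- wisconsin_sqrt(comm)
round(rowSums(std), 10)
```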

## Value

The function returns an object of class `capscale` which is identical to the result of `rda`. At the moment, `capscale` does not have specific methods, but it uses `cca` and `rda` methods `plot.cca`, `scores.rda` etc. Moreover, you can use `anova.cca` for permutation tests of “significance” of the results.
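
The logic of the permutation test can be sketched in base R. This is a simplified sketch under stated assumptions: `Y` stands in for principal coordinates (here simulated rather than derived from dissimilarities), rows of the constraint matrix are permuted freely, there is no `Condition()` term, and `pseudo_F` is a hypothetical helper. It therefore does not reproduce the reduced-model scheme that `anova.cca` reports in the example output. The pseudo-F is the ratio of mean squares; applied to the printed table, (0.97314 / 3) / (2.59866 / 19) gives the reported F of about 2.3717.

```r
# Hypothetical helper. Pseudo-F: constrained mean square over residual mean square.
pseudo_F <- function(Y, X) {
  q <- qr(X)
  fit <- qr.fitted(q, Y)
  df_model <- q$rank - 1                       # X includes an intercept
  df_resid <- nrow(Y) - q$rank
  (sum(fit^2) / df_model) / (sum((Y - fit)^2) / df_resid)
}

set.seed(11)
n <- 20
env <- data.frame(N = rnorm(n), P = rnorm(n))
Y <- scale(matrix(rnorm(n * 5), nrow = n) + 0.8 * env$N, scale = FALSE)
X <- model.matrix(~ N + P, data = env)

F_obs <- pseudo_F(Y, X)
nperm <- 999
F_perm <- replicate(nperm, pseudo_F(Y, X[sample(n), , drop = FALSE]))
p_value <- (sum(F_perm >= F_obs) + 1) / (nperm + 1)
c(F = F_obs, p = p_value)

# The F value in the example output follows from the same ratio:
(0.97314 / 3) / (2.59866 / 19)
```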

## Note

The function can produce negative eigenvalues with non-Euclidean dissimilarity indices. The non-Euclidean component of inertia is given under the title `Imaginary` in the printed output. The `Total` inertia is the sum of all eigenvalues, but the sum of all non-negative eigenvalues is given as `Real Total` (which is higher than the `Total`). The ordination is based only on the real dimensions with positive eigenvalues, and therefore the proportions of inertia components only apply to the `Real Total` and ignore the `Imaginary` component. Permutation tests with `anova.cca` use only the real solution of positive eigenvalues. Function `adonis` gives similar significance tests, but it also handles the imaginary dimensions (negative eigenvalues) and therefore its results may differ from permutation test results of `capscale`.
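
How `Total`, `Real Total` and `Imaginary` relate can be sketched in base R. This is a minimal sketch, assuming simulated counts and a Bray–Curtis index computed by hand (Manhattan distance divided by the sum of the two site totals); the tolerance separating zero from positive and negative eigenvalues is arbitrary, and whether a particular data set gives negative eigenvalues depends on the data.

```r
set.seed(5)
comm <- matrix(rpois(25 * 8, lambda = 2), nrow = 25)
rs <- rowSums(comm)
bray <- as.dist(as.matrix(dist(comm, method = "manhattan")) / outer(rs, rs, "+"))

n <- attr(bray, "Size")
C <- diag(n) - matrix(1 / n, n, n)             # centring matrix
ev <- eigen(-0.5 * C %*% as.matrix(bray)^2 %*% C,
            symmetric = TRUE, only.values = TRUE)$values

tol <- 1e-8
total <- sum(ev)                               # all eigenvalues
real_total <- sum(ev[ev > tol])                # positive ("real") eigenvalues
imaginary <- sum(ev[ev < -tol])                # negative ("imaginary") eigenvalues
c(Total = total, `Real Total` = real_total, Imaginary = imaginary)

# Proportions of the first real axes, relative to the Real Total only
head(ev[ev > tol] / real_total, 3)
```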

If the negative eigenvalues are disturbing, you can use argument `add = TRUE` passed to `cmdscale`, or, preferably, a distance measure that does not cause these warnings. Alternatively, after square root transformation of distances (argument `sqrt.dist = TRUE`) many indices do not produce negative eigenvalues.
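
Both remedies can be checked on a tiny example. This minimal sketch assumes a hand-made four-site dissimilarity matrix that is metric but not Euclidean, and a hypothetical helper `gower_eigen`. It applies the Lingoes correction (add twice the magnitude of the most negative eigenvalue to the squared off-diagonal dissimilarities), which is the adjustment named in the `add = TRUE` example output; the constant used by a given vegan version may differ. It also shows that square roots of these particular dissimilarities give no negative eigenvalues, which is not guaranteed for every index.

```r
# Site 1 is at distance 1 from sites 2-4, which are at distance 2 from each
# other: the triangle inequality holds, but no Euclidean embedding exists.
m <- matrix(c(0, 1, 1, 1,
              1, 0, 2, 2,
              1, 2, 0, 2,
              1, 2, 2, 0), nrow = 4)
d <- as.dist(m)

# Hypothetical helper: eigenvalues of the Gower-centred matrix used in PCoA.
gower_eigen <- function(d) {
  D2 <- as.matrix(d)^2
  n <- nrow(D2)
  C <- diag(n) - matrix(1 / n, n, n)
  eigen(-0.5 * C %*% D2 %*% C, symmetric = TRUE, only.values = TRUE)$values
}

ev <- gower_eigen(d)
round(ev, 4)                                   # 2, 2, 0 and -0.25

# Lingoes correction: d'^2 = d^2 + 2c for all off-diagonal pairs,
# with c the magnitude of the most negative eigenvalue.
c1 <- -min(ev)
d_lingoes <- as.dist(sqrt(as.matrix(d)^2 + 2 * c1))  # as.dist() drops the diagonal
round(gower_eigen(d_lingoes), 4)               # 2.25, 2.25, 0 and 0

# Square roots of the dissimilarities (the idea behind sqrt.dist = TRUE)
round(gower_eigen(sqrt(d)), 4)                 # 1, 1, 0.25 and 0
```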

The inertia is named after the dissimilarity index as defined in the dissimilarity data, or as `unknown distance` if such information is missing. Function `rda` usually divides the ordination scores by the number of sites minus one. In this way, the inertia is variance instead of sum of squares, and the eigenvalues sum up to variance. Many dissimilarity measures are in the range 0 to 1, so they have already made a similar division. If the largest original dissimilarity is less than or equal to 4 (allowing for `stepacross`), this division is undone in `capscale` and the original dissimilarities are used. Keyword `mean` is added to the inertia in cases where division was made, e.g. in Euclidean and Manhattan distances. Inertia is based on the squared index, and keyword `squared` is added to the name of the distance, unless data were square-root transformed (argument `sqrt.dist = TRUE`). If an additive constant was used, keyword `euclidified` is added to the name of the inertia, and the value of the constant is printed (argument `add = TRUE`).
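
The scale of the total inertia can be illustrated in base R. This is a minimal sketch, assuming simulated data and Manhattan distances: the sum of the eigenvalues of the Gower-centred matrix of squared dissimilarities equals the sum of all squared pairwise dissimilarities divided by the number of sites n (a sum-of-squares scale), while an `rda`-style variance divides the same quantity by n - 1.

```r
set.seed(9)
x <- matrix(rnorm(10 * 3), nrow = 10)
d <- dist(x, method = "manhattan")
n <- attr(d, "Size")
C <- diag(n) - matrix(1 / n, n, n)             # centring matrix
ev <- eigen(-0.5 * C %*% as.matrix(d)^2 %*% C,
            symmetric = TRUE, only.values = TRUE)$values

ss_total <- sum(ev)                            # inertia of the squared index
c(eigen_sum = ss_total,
  pairwise = sum(as.vector(d)^2) / n,          # same value, via the trace identity
  variance = ss_total / (n - 1))               # rda-style division by n - 1
```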

## Author(s)

Jari Oksanen

## References

Anderson, M.J. & Willis, T.J. (2003). Canonical analysis of principal coordinates: a useful method of constrained ordination for ecology. Ecology 84, 511–525.

Gower, J.C. (1985). Properties of Euclidean and non-Euclidean distance matrices. Linear Algebra and its Applications 67, 81–97.

Legendre, P. & Anderson, M.J. (1999). Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments. Ecological Monographs 69, 1–24.

Legendre, P. & Legendre, L. (2012). Numerical Ecology. 3rd English Edition. Elsevier.

## See Also

`rda`, `cca`, `plot.cca`, `anova.cca`, `vegdist`, `dist`, `cmdscale`.

The function returns a result object similar to that of `rda` (see `cca.object`). The See Also section of the `rda` help page gives a more complete list of functions that can be used to access and analyse `capscale` results.

## Examples

```
library(vegan)
data(varespec)
data(varechem)
## Basic Analysis
vare.cap <- capscale(varespec ~ N + P + K + Condition(Al), varechem,
                     distance = "bray")
vare.cap
plot(vare.cap)
anova(vare.cap)
## Avoid negative eigenvalues with additive constant
capscale(varespec ~ N + P + K + Condition(Al), varechem,
         distance = "bray", add = TRUE)
## Avoid negative eigenvalues by taking square roots of dissimilarities
capscale(varespec ~ N + P + K + Condition(Al), varechem,
         distance = "bray", sqrt.dist = TRUE)
## Principal coordinates analysis with extended dissimilarities
capscale(varespec ~ 1, distance = "bray", metaMDSdist = TRUE)
```

### Example output

```
Loading required package: permute
This is vegan 2.4-3
Call: capscale(formula = varespec ~ N + P + K + Condition(Al), data =
varechem, distance = "bray")

              Inertia Proportion Eigenvals Rank
Total          4.5444     1.0000    4.8034
Conditional    0.9726     0.2140    0.9772    1
Constrained    0.9731     0.2141    0.9972    3
Unconstrained  2.5987     0.5718    2.8290   15
Imaginary                          -0.2590    8
Inertia is squared Bray distance

Eigenvalues for constrained axes:
  CAP1   CAP2   CAP3
0.5413 0.3265 0.1293

Eigenvalues for unconstrained axes:
  MDS1   MDS2   MDS3   MDS4   MDS5   MDS6   MDS7   MDS8   MDS9  MDS10  MDS11
0.9065 0.5127 0.3379 0.2626 0.2032 0.1618 0.1242 0.0856 0.0689 0.0583 0.0501
 MDS12  MDS13  MDS14  MDS15
0.0277 0.0208 0.0073 0.0013

Permutation test for capscale under reduced model
Permutation: free
Number of permutations: 999

Model: capscale(formula = varespec ~ N + P + K + Condition(Al), data = varechem, distance = "bray")
         Df SumOfSqs      F Pr(>F)
Model     3  0.97314 2.3717  0.005 **
Residual 19  2.59866
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call: capscale(formula = varespec ~ N + P + K + Condition(Al), data =
varechem, distance = "bray", add = TRUE)

              Inertia Proportion Rank
Total          6.2496     1.0000
Conditional    1.0468     0.1675    1
Constrained    1.1956     0.1913    3
Unconstrained  4.0073     0.6412   19
Inertia is Lingoes adjusted squared Bray distance

Eigenvalues for constrained axes:
  CAP1   CAP2   CAP3
0.6103 0.3940 0.1913

Eigenvalues for unconstrained axes:
  MDS1   MDS2   MDS3   MDS4   MDS5   MDS6   MDS7   MDS8
0.9796 0.5811 0.4077 0.3322 0.2769 0.2346 0.1962 0.1566
(Showed only 8 of all 19 unconstrained eigenvalues)

Constant added to distances: 0.07413903

Call: capscale(formula = varespec ~ N + P + K + Condition(Al), data =
varechem, distance = "bray", sqrt.dist = TRUE)

              Inertia Proportion Rank
Total          6.9500     1.0000
Conditional    0.9535     0.1372    1
Constrained    1.2267     0.1765    3
Unconstrained  4.7698     0.6863   19
Inertia is Bray distance

Eigenvalues for constrained axes:
  CAP1   CAP2   CAP3
0.5817 0.4086 0.2365

Eigenvalues for unconstrained axes:
  MDS1   MDS2   MDS3   MDS4   MDS5   MDS6   MDS7   MDS8
0.9680 0.6100 0.4469 0.3837 0.3371 0.3012 0.2558 0.2010
(Showed only 8 of all 19 unconstrained eigenvalues)

Square root transformation
Wisconsin double standardization
Call: capscale(formula = varespec ~ 1, distance = "bray", metaMDSdist =
TRUE)

               Inertia Eigenvals Rank
Total          2.54753   2.59500
Unconstrained  2.54753   2.59500   19
Imaginary               -0.04747    4
Inertia is squared Bray distance

Eigenvalues for unconstrained axes:
  MDS1   MDS2   MDS3   MDS4   MDS5   MDS6   MDS7   MDS8
0.6075 0.3820 0.3335 0.2046 0.1731 0.1684 0.1505 0.1163
(Showed only 8 of all 19 unconstrained eigenvalues)

metaMDSdist transformed data: wisconsin(sqrt(varespec))
```

vegan documentation built on May 2, 2019, 5:51 p.m.