In phytomosaic/esa2021: ESA 2021 Workshop on Community Analysis in R

require(learnr)
knitr::opts_chunk$set(echo = TRUE, eval = TRUE, include = TRUE, out.width="99%")
knitr::knit_theme$set("rootwater")
learnr::tutorial_options(exercise.timelimit = 10, exercise.eval = FALSE, exercise.completion = TRUE)

### HIDDEN ###
require(vegan)
require(labdsv)
require(ade4)
require(ecodist)
require(fso)
require(vegclust)
require(ape)
require(picante)
require(mgcv)
load('veg.rda')  # assumed to be in your working dir
xy  <- veg$xy    # spatial
spe <- veg$spe   # species
env <- veg$env   # environment
tra <- veg$tra   # traits
phy <- veg$phy   # phylogeny
# rm(veg)          # cleanup
spe <- data.frame(log10(spe + 1))
env <- data.frame(decostand(scale(env, center=F), 'range'))
tra <- data.frame(decostand(tra, 'range'))
d   <- vegdist(spe, method='bray', binary=T)
D   <- stepacross(d, 'shortest', toolong=1, trace=F)
E   <- dist(xy)                         # spatial distance matrix
pc  <- pcnm(E, w=rowSums(spe)/sum(spe)) # PCNMs *weighted*  by abundances
cl  <- vegclustdist(D, mobileMemb=7, method='FCM', m=1.2, nstart=5)
grp <- defuzzify(cl, 'max')[[2]]
m1  <- metaMDS(D, k=2, try=200, trymax=500, trace=0)  # NMS
m2  <- cmdscale(D, k=2, add=T)                        # PCoA
m3  <- prcomp(spe)                                    # PCA
grp_k <- cut(env$k, quantile(env$k, probs=seq(0,1,by=0.25)), include.lowest=T,
               labels=c('lo','med','hi','veryhi'))
`get_palette` <- function() {
  pal <- c('#414487E6','#404688E6','#3F4889E6','#3E4989E6','#3E4C8AE6',
           '#3D4E8AE6','#3C508BE6','#3B528BE6','#3A548CE6','#39558CE6',
           '#38588CE6','#375A8CE6','#365C8DE6','#355E8DE6','#35608DE6',
           '#34618DE6','#33638DE6','#32658EE6','#31678EE6','#30698EE6',
           '#306A8EE6','#2F6C8EE6','#2E6E8EE6','#2D708EE6','#2C718EE6',
           '#2C738EE6','#2B748EE6','#2A768EE6','#2A788EE6','#297A8EE6',
           '#287C8EE6','#287D8EE6','#277F8EE6','#26818EE6','#26828EE6',
           '#25848EE6','#24868EE6','#24878EE6','#23898EE6','#228B8DE6',
           '#228D8DE6','#218F8DE6','#21908CE6','#20928CE6','#20938CE6',
           '#1F958BE6','#1F978BE6','#1F998AE6','#1F9A8AE6','#1E9C89E6',
           '#1F9E89E6','#1FA088E6','#1FA187E6','#20A386E6','#20A486E6',
           '#21A685E6','#22A884E6','#24AA83E6','#25AC82E6','#26AD81E6',
           '#28AE80E6','#2AB07FE6','#2DB27DE6','#2FB47CE6','#32B67AE6',
           '#34B679E6','#37B878E6','#3ABA76E6','#3DBC74E6','#40BD72E6',
           '#43BF71E6','#47C06FE6','#4AC16DE6','#4EC36BE6','#52C569E6',
           '#55C668E6','#59C864E6','#5DC863E6','#60CA60E6','#65CB5EE6',
           '#68CD5BE6','#6DCD59E6','#71CF57E6','#75D054E6','#7AD151E6',
           '#7FD34EE6','#83D44CE6','#87D549E6','#8CD646E6','#90D743E6',
           '#95D840E6','#9AD93CE6','#9FDA3AE6','#A3DA37E6','#A8DB34E6',
           '#ADDC30E6','#B2DD2DE6','#B7DE2AE6','#BBDF27E6')
  return(pal)
}
`colvec` <- function(x) {
  pal <- get_palette()
  return(pal[cut(as.numeric(x), breaks=length(pal), include.lowest=TRUE)])
}
`makecwm` <- function (spe, tra) {
  spe <- as.matrix(spe)
  tra <- as.matrix(tra)
  `standardize` <- function(x) {
    (x - min(x, na.rm=TRUE)) / diff(range(x, na.rm=TRUE))
  }
  tra <- apply(tra, MARGIN = 2, FUN = standardize)
  awt <- spe %*% tra
  awt / rowSums(spe, na.rm=TRUE)
}
cwm <- data.frame(makecwm(spe, tra)) # make the CWM traits matrix
### HIDDEN ###

Overview

Why are we here today? {#why-here}

This course demonstrates the practice of ecological community analysis in R. Our goal is to link biological reality with corresponding conceptual and mathematical models. We borrow and apply concepts from ecology, statistics and computer science, but do not get too bogged down in theoretical details. This short course aims to be an axe handle{target="_blank"} --- carve and shape it to your own practice.

Learning outcomes

We will:

Analyze, visualize and interpret ecological community data in R software.
Select, modify or create new multivariate methods for ecological community analysis in R software.
Gain proficiency with typical R software usage through direct experience.

Assumptions

We assume you have a basic familiarity with R. If not, please peruse here{target="_blank"}. At the least, you should be able to assign an object, distinguish between different objects (vector, matrix, data.frame, etc.), and index rows/columns of a matrix or data.frame.

We assume you have familiarity with basic statistics (population vs sample; mean, SD, SE, CI; simple linear models), and have at least dabbled in linear algebra (matrices; linear combinations), and perhaps become suspicious that calculus may indeed exist. If not, you could brush up on matrices{target="_blank"}.

Software installation

Packages contain supplemental functions you will need. They are supplemental beyond "base" R. Ordinarily you would open R and install required packages as follows (but these dependencies should have automatically installed when you first installed this tutorial).

install.packages('vegan')     # community ecology
install.packages('labdsv')    # community ecology
install.packages('ade4')      # community ecology
install.packages('ecodist')   # community ecology
install.packages('fso')       # ordination
install.packages('vegclust')  # clustering
install.packages('ape')       # phylogenetics
install.packages('picante')   # phylogenetics
install.packages('mgcv')      # nonlinear regression

Load packages

To use functions provided by packages, load them into your R environment. Again, this has already been done for you in the tutorial.

require(vegan)
require(labdsv)
require(ade4)
require(ecodist)
require(fso)
require(vegclust)
require(ape)
require(picante)
require(mgcv)

Useful functions to explore

Base R already contains lots of useful functions. This list has a few basic functions:

ls()
dim(spe)
NROW(spe)
NCOL(spe)
str(spe)
head(spe)
names(spe)
x <- spe[,11]
x <- spe[,'bolboschoenus_maritimus']
hist(x)
plot(x)
rm(x)

From the list above, copy-and-paste some commands into the code box below, and help describe a dataset called spe. What is inside that object? What size is it?

### run code here

Define objects and functions

We've already said that you can use functions from base R or from other packages. But you can also "roll your own"! Functions take the generic form f <- function(x) { x }. For example:

# function to pick an item from the vector by position
`item_picker` <- function(x, pos = NULL) {
  x[pos]
}
z <- 1:19               # assign a sequence of numbers to object `z`
item_picker(z, pos = 5) # pick the fifth element of `z`

Data

Where did this dataset originate? {#dataorigin}

Mafragh, Algeria vegetation data

We will use one core dataset throughout this tutorial. The veg dataset gives information about spatial coordinates, species, environment, traits, and phylogeny for plants on the Mafragh coastal plain in North Africa. Specifically, veg is a list containing five items:

xy 97 observations of 2 spatial coordinates
spe 97 observations of 56 plant species
env 97 observations of 11 soil environmental variables
tra 12 traits for 56 plant species
phy phylogeny for 56 plant species

Load data

A common task in R is to read from CSV files, e.g., x <- read.csv('path/to/file.csv'), but here we will load data that's already in R format.

# load('./veg.rda')  # assumed to be in your working dir
xy  <- veg$xy    # spatial
spe <- veg$spe   # species
env <- veg$env   # environment
tra <- veg$tra   # traits
phy <- veg$phy   # phylogeny
rm(veg)          # cleanup
ls()             # objects now in this local environment

Study area description

The Mafragh plain is in El Tarf Province of far northeast Algeria (36.84704$^{\circ}$, 7.945$^{\circ}$). According to documentation in R package ade4:

This marshy coastal plain is a geomorphological feature, east of Port of Annaba, limited to the north by the Mediterranean Sea and a dune cordon, to the south by clay-sandstone Numidian massifs, to the west by a wadi and to East by an irrigated perimeter. It covers 15,000 ha of which 10,000 form the area of study.

Potential human pressures include grazing, irrigation and salination, plowing, and housing development. Its Köppen climate type is "Csa: Hot-summer Mediterranean climate".

History of the Mafragh dataset

The data originate from the field studies of Prof. Gérard de Bélair of Annaba in northeast Algeria. Born in France, de Bélair became a missionary priest in 1969 shortly after Algeria's independence, then an agronomist during the Agrarian Revolution, then a botanist, then professor, then conservationist (according to his lay and clergy accounts). As of 2021, he continues to actively publish on the vegetation of northern Algeria.

We should recognize the colonialist legacies that brought us here. France controlled Algeria from 1830 until its independence in 1962. Prior to that, Algeria was occupied by Ottoman Turks, Spaniards, Arab Muslims, Berber/Amazigh, Romans, Phoenicians, and prehistoric occupants dating to at least 1.8 million years ago. We may speculate that the region's vegetation and cultural histories have been closely intertwined.

The data were included in early versions of the ade4 software (Dray and Dufour 2007) which itself has its roots at Université Lyon 1, France. The present data are lightly modified from Appendix S4 of Pavoine et al. (2011), who also added a phylogeny.

Key references

de Bélair, G. 1981. Biogéographie et aménagement de la Plaine de la Mafragh (Annaba - Algérie) [Biogeography and development of the Mafragh Plain (Annaba, Algeria)]. Dissertation, Université Paul Valéry, Montpellier. http://www.sudoc.fr/00871262X

de Bélair, G. 1990. Structure, fonctionnement et perspectives de gestion de quatre écocomplexes lacustres et marécageux (El Kala, Est-Algérien) [Structure, function and management perspectives of four lake and marsh eco-complexes (El Kala, East Algeria)]. Dissertation, Université des sciences et techniques de Montpellier 2. http://www.sudoc.fr/007729197

de Bélair, G. and M. Bencheikh-Lehocine. 1987. Composition et déterminisme de la végétation d'une plaine côtière marécageuse: La Mafragh (Annaba, Algérie) [Composition and determinism of the vegetation of a marshy coastal plain: La Mafragh (Annaba, Algeria)]. Bulletin d'Ecologie, 18(4): 393–407.

Dray, S. and A. Dufour. 2007. The ade4 package: implementing the duality diagram for ecologists. Journal of Statistical Software 22(4): 1-20. doi:10.18637/jss.v022.i04

Pavoine, S., Vela, E., Gachet, S., de Bélair, G. and Bonsall, M. B. 2011. Linking patterns in phylogeny, traits, abiotic variables and space: a novel approach to linking environmental filtering and plant community assembly. Journal of Ecology 99: 165–175. doi:10.1111/j.1365-2745.2010.01743.x

Pre-analysis

How do I get my data into shape for analysis? {#preanalysis-shape}

Check data structure

A first step in exploring ecological data is to examine its structure. Copy-and-paste some of this code into the box below, and run one at a time:

str(spe)     # community (species) abundance matrix
str(env)     # environmental matrix
str(tra)     # traits matrix
str(phy, 1)  # phylogeny
str(xy)      # spatial coordinates

### run code here

Find and replace troublesome values

### create example matrix with some pesky values
s <- spe
s[5,2] <- NA
s[6,4] <- (-999)

### index the row and column of any NA values
which(is.na(s), arr.ind=TRUE)

### index the row and column of any negative values
which(s < 0, arr.ind=TRUE)

### replace value by logical test
s[is.na(s)] <- 777
head(s)

Data transformations

### load data
spe <- veg$spe   # species
env <- veg$env   # environment
tra <- veg$tra   # traits

### basic transformations
spet <- data.frame(log10(spe + 1))
envt <- data.frame(vegan::decostand(scale(env, center=F), 'range'))
trat <- data.frame(vegan::decostand(tra, 'range'))

### compare a few
par(mfrow=c(1,2))
plot(tra$bfp, trat$bfp)
plot(env$k2o, envt$k2o)

Abundances to presence/absence

Sometimes we simply need to convert abundance values (counts or percentages) to binary presence/absence values (0 or 1).

s <- (spe > 0) * 1  # from numeric to 0/1
s[1:5, 1:7]         # peek at a few
range(s)            # is this what you expect?

Outliers

A quick way to search for multivariate outliers is to use the distance/dissimilarity matrix. A multivariate outlier has unusual values for a combination of multiple variables (McCune and Grace 2002). We can identify them as the sample unit(s) which exceed a defined number of standard deviations away from the grand mean (i.e., a large z-score).

### define outlier function
`outliers` <- function (x, mult=2, method='bray') {
  d <- as.matrix(vegan::vegdist(x, method=method, binary=F, diag=T, upper=T))
  diag(d) <- as.numeric(1)     # avoid zero-multiplication
  m       <- apply(d, 2, mean) # site means
  z       <- scale(m)          # z-scores
  data.frame(mean_dist = m, z = z, is_outlier = abs(z) >= mult)
}

### try it
o <- outliers(spe, mult=2)
head(o, 7)
which(o$is_outlier) 
### Thinker: how would you find the z-score of these outliers?

Test validity of species matrix

Before proceeding with most multivariate analyses, our species abundance matrix must have 1) no missing values, 2) every site must have at least one species observation, and 3) every species must occur in at least one site.

!anyNA(spe)                     # expect TRUE, no missing values
all(rowSums(spe, na.rm=T) != 0) # expect TRUE, no empty sites
all(colSums(spe, na.rm=T) != 0) # expect TRUE, no empty species

Visualize data

Let's map the spatial coordinates.

# spatial
plot(xy, pch=19, col='grey', xlab='Eastness', ylab='Northness')

Plot the species abundance matrix as a heatmap.

# species
vegan::tabasco(spe, col=get_palette())

Plot the soils (environmental) matrix as a heatmap.

# environment
vegan::tabasco(env, col=get_palette())

Plot the traits matrix as a heatmap.

# traits
vegan::tabasco(tra, col=get_palette())

Plot the phylogenetic tree.

# phylogeny
plot(phy, cex=0.6, no.margin=TRUE)

Key references

McCune, B., and J.B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon, USA.

Diversity measures

How diverse is this site? many sites? region? {#diversity-main}

Let's examine the "diversity" of our sites and study area. Diversity can encompass any sort of variation, e.g., taxonomic, trait, or phylogenetic. Here we assume that spe is a species abundance matrix where rows = sites, columns = taxa, and values must not be negative or missing.

Gamma (regional) diversity

gamma <- sum(colSums(spe) > 0)
gamma

Alpha (per-site) diversity

alpha <- rowSums(spe > 0)
alpha    # within-site
avgalpha <- mean(rowSums(spe > 0))
avgalpha # average within-site

Beta (among-site) diversity: Whittaker's

gamma    <- sum(colSums(spe) > 0)
avgalpha <- mean(rowSums(spe > 0))
beta     <- gamma / avgalpha - 1
beta

Beta diversity: dust bunny indices

McCune and Root (2015) proposed two model-free measures of how strongly multivariate species data exhibit a "dust bunny" distribution, that is, departure from multivariate normality because of many zero abundances due to few species occurring in any given sample unit. Quantifying this is important because multivariate normality is a core assumption of many analytical tools (PCA, CCA, linear regression models).

### 1 -- proportion of zeros in the matrix
###   (independent of abundance)
eps      <- .Machine$double.eps # machine tolerance
propzero <- sum(spe < eps) / prod(dim(spe))
cat('Proportion of zeros in matrix:', propzero, '\n')

### 2 -- "dust bunny index" of McCune and Root (2015)
###   (integrates abundances)
dbi <- 1 - mean(as.matrix(vegan::decostand(spe, method='max')))
cat('Dust bunny index:', dbi, '\n')

Beta-diversity: no-share sites

High species turnover (beta-diversity) can cause many site-pairs to share no species in common. What proportion of all pairs of sites have totally different sets of species?

### how many site-pairs share no species in common?
z <- vegan::no.shared(spe)
propnoshare <- sum(z) / length(z)
cat('Proportion of no-share sites:', propnoshare, '\n')

Key references

McCune, B., and H.T. Root. 2015. Origin of the dust bunny distribution in ecological community data. Plant Ecology 216(5): 645-656.

Dissimilarities

How dissimilar are community compositions among sites? {#dissim-main}

Dissimilarities among sites help us position sites along gradients, organize them into clusters, or distinguish differences among groups of sites.

Dissimilarity matrix for species

Create a pairwise dissimilarity matrix D based on shared and unshared species among sites. Euclidean distances are seldom useful for species abundance data, here we use the more preferable Bray-Curtis (Sørensen) dissimilarity. Legendre and Legendre (2012) give a full survey of existing distance/dissimilarity measures.

d <- vegdist(spe, method='bray', binary=T)
str(d)
tabasco(as.matrix(d), col=get_palette()) 
### Thinker: why is the diagonal 'blank'?

Stepacross adjustment

Sometimes a pair of sites are "maximally dissimilar": they share no species at all, so pairwise dissimilarity is impossible to quantify using a proportional measure (e.g., Bray-Curtis). Stepacross adjustment solves this by linking "no-share" sites with intermediate site(s) that share at least some species in common. Smith (2017) describes other possible solutions.

D <- stepacross(d, 'shortest', toolong = 1)
plot(d, D, xlab = 'Original', ylab = 'Stepacross')
### Thinker: why the vertical stripe near 1.0?

Distance matrix for environment

Create a pairwise distance matrix E based on Euclidean environmental distances among sites.

E <- vegdist(env, method='bray', binary=T)
str(E)
tabasco(as.matrix(E), col=get_palette())

Key references

Legendre, P., and L. Legendre. 2012. Numerical Ecology. Volume 24, Third Edition. Elsevier.

Smith, R.J. 2017. Solutions for loss of information in high-beta-diversity community data. Methods in Ecology and Evolution 8:68–74.

Ordination (unconstrained)

How best to order sites by dissimilarities of community compositions? {#ord-unconstrained}

Order sites according to community compositions. Ordination distances indicate compositional dissimilarities. Nearby points are more compositionally similar than distant points.

# three kinds of ordination
m1 <- metaMDS(D, k=2, try=50, trymax=51, trace=0)  # NMS
m2 <- cmdscale(D, k=2, add=T)               # PCoA
m3 <- prcomp(spe)                           # PCA

For ecological community data, NMS (Kruskal 1964) is almost always the best choice (Minchin 1987).

Visualizing ordinations

# color vector for plotting
u <- get_palette()
u <- u[1:nrow(spe)]

# compare three kinds of ordination
par(mfrow=c(1,3), bty='l', las=1)
plot(m1$points, type='n', xlab='NMDS1', ylab='NMDS2')
text(m1$points, rownames(m1$points),cex=.8, col=u)
plot(m2$points, type='n', xlab='PCoA1', ylab='PCoA2')
text(m2$points, rownames(m2$points),cex=.8, col=u)
plot(m3$x, type='n', xlab='PCA1', ylab='PCA2')
text(m3$x, rownames(m3$x),cex=.8, col=u)
par(mfrow=c(1,1))

Plotting again, now overlay environmental variables.

### overlay gradients on the NMS using point colors
par(mfrow=c(1,3), bty='l', las=1)
plot(scores(m1), pch=16, col=colvec(env$k))
plot(scores(m1), pch=16, col=colvec(env$k2o))
plot(scores(m1), pch=16, col=colvec(env$sand))
par(mfrow=c(1,1))

### overlay gradients on the NMS by fitting a GAM surface
f1 <- ordisurf(m1 ~ env$k, plot = FALSE)
plot(scores(m1), pch = 16, col = colvec(env$k))
plot(f1, add = TRUE, col = 1, lwd = 2)

Ordination goodness-of-fit

Ordination methods have intrinsic measures of fit (e.g., stress for NMS). However, for all methods, a "good" ordination should have a decent correlation between the Euclidean inter-point ordination distances versus the original species dissimilarities used to construct the ordination.

cor(dist(m1$points), D, method='kendall') # NMS almost always best
cor(dist(m2$points), D, method='kendall')
cor(dist(m3$x),      D, method='kendall')
### Thinker: why is this unfair to PCA?

We can also use Procrustes analysis (Peres-Neto and Jackson 2001) to examine the fit of a species ordination versus an environmental ordination.

pc <- prcomp(env[,sapply(env, is.numeric)], scale=T) # environmental PCA
pc <- scores(pc, display='sites', choices=1:2)       # environmental PCA scores
eD <- vegdist(pc, method='euc')  # Euclidean interpoint distances
protest(m1$points, pc)           # fit = correlation in Procrustes rotation
protest(D, eD)                   # fit = correlation in Procrustes rotation

Dimensionality selection in NMS

For NMS, the user must select the number of dimensions a priori. Yet, what is the "best" number of dimensions? We can examine how stress changes as we increment the number of dimensions, and select the dimensionality that balances sufficiently "low" stress with the fewest dimensions.

# define screeplot function, running NMS for varying 'k' dimensions
`scree_nms` <- function(D, k=5, ...) {
  stress <- rep(NA, k)
  for (i in 1:k) {
    cat('calculating', i, 'of', k, 'dimensions...\n')
    stress[i] <- metaMDS(D, k=i, trace=0, ...)$stress
  }
  plot(1:k, stress, main='', xlab='Dimension', ylab='Stress',
       ylim=c(0, max(stress)*1.05), pch=16, las=1, bty='l')
  lines(1:k, stress)
  abline(0.20, 0, col='red', lty = 2)
  data.matrix(stress)
}
scree <- scree_nms(D, k=5, trymax=10)
scree
### Thinker: how many dimensions would you select?

Key references

Kruskal, J.B. 1964. Multidimensional scaling by optimizing goodness-of-fit to a nonmetric hypothesis. Psychometrika 29: 1–28.

Minchin, P.R. 1987. An evaluation of relative robustness of techniques for ecological ordinations. Vegetatio 69: 89–107.

Peres-Neto, P. R., and D. A. Jackson. 2001. How well do multivariate data sets match? The advantages of a Procrustean superimposition approach over the Mantel test. Oecologia 129:169–178.

Ordination (constrained)

How best to order sites by dissimilarities of community compositions, in response to measured environmental variables? {#ord-constrained}

Constrained ordination addresses specific a priori hypotheses about variation in community compositions attributable to measured environmental gradients. It portrays only a fragment of all community variation, and is not used for exploratory analysis.

Two important tools are distance-based RDA (dbRDA, McArdle and Anderson 2001), and fuzzy set ordination (FSO, Roberts 2008), both of which let the user select their own dissimilarity measure. Other methods (redundancy analysis or constrained correspondence analysis) force an implicit dissimilarity measure and make linearity/distributional assumptions.

### RDA: redundancy analysis (assumes linear species responses)
m1 <- rda(spe ~ k + sand + conductivity, data=env)

### CCA: constrained correspondence analysis (assumes unimodal species responses)
m2 <- cca(spe ~ k + sand + conductivity, data=env)

### dbRDA: distance-based RDA
m3 <- dbrda(D ~ k + sand + conductivity, 
            data = env, 
            comm = spe, 
            add  = 'lingoes')

### FSO: fuzzy set ordination
m4 <- fso::fso( ~ k + sand + conductivity, 
                dis     = D, 
                data    = env, 
                permute = F)

### summaries
print(m1)
print(m2)
print(m3)
summary(m4)

### compare four kinds of ordination
u <- colvec(env$k)         # color by soil potassium
par(mfrow=c(2,2), bty='l', las=1)
plot(m1, disp='wa')
points(m1, pch=16, col=u)
plot(m2, disp='wa')
points(m2, pch=16, col=u)
plot(m3, disp='wa')
points(m3, pch=16, col=u)
plot(scores(m4$mu[,1:2]), pch=16, col=u, asp=1,
     xlab='MFSO1, potassium', ylab='MFSO2, sand')

Key references

McArdle, B.H. & Anderson, M.J. 2001. Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology 82: 290-297.

Roberts, D.W. 2008. Statistical analysis of multidimensional fuzzy set ordinations. Ecology 89: 1246-1260.

Group clustering

How to group sites according to dissimilarity of community compositions? {#clustering}

Hierarchical: Ward's clustering

Ward's clustering is a hierarchical cluster analysis method based on community dissimilarities (Murtagh and Legendre 2014).

### perform the clustering
k   <- 7                           # number of groups is specified in advance
cl  <- hclust(D, method='ward.D2') # clustering solution
grp <- cutree(cl, k)               # group memberships

### plot the dendrogram, and plot the groups onto an NMS ordination
u <- colvec(grp)
par(mfrow=c(1,2), bty='l', las=1, oma=c(0,0,0,0), mar=c(3,3,0.5,0.5))
plot(cl, cex=0.5, main='')
rect.hclust(cl, k, border=unique(u))
plot(scores(m1), pch=16, cex=0.7, col=u)
ordicluster(m1, cl, prune=k, lwd=0.5, col=u)

Non-hierarchical: fuzzy clustering

Non-hierarchical methods also exist. And, it may be useful to define groups based on fuzzy rather than crisp partitions, using fuzzy clustering (de Caceres et al. 2010).

ns   <- 1       # number of random starts (increase for real analyses!)
k    <- 7       # number of groups is specified in advance
# fuzzy c-means
cl   <- vegclustdist(D, mobileMemb=k, method='FCM', m=1.2, nstart=ns)
# fuzzy c-means with a noise cluster
cl_a <- vegclustdist(D, mobileMemb=k, method='NC', m=1.2, dnoise=0.8, nstart=ns)
# fuzzy c-medoids
cl_b <- vegclustdist(D, mobileMemb=k, method='FCMdd', m=1.2, nstart=ns)
# crisp k-means
cl_c <- vegclustdist(D, mobileMemb=k, method='KM', nstart=ns)

# how well do the other methods agree with fuzzy c-means?
concordance(cl, cl_a)
concordance(cl, cl_b)
concordance(cl, cl_c)

Let's examine group memberships grp from the fuzzy clustering. While memberships are fuzzy, we can ask for a crisp membership based on a threshold or else the maximum fuzzy membership.

head(round(cl$memb, 2))               # fuzzy membership
(grp <- defuzzify(cl, 'cut', alpha=0.8)[[2]]) # crisp, at threshold (incl NAs)
(grp <- defuzzify(cl, 'max')[[2]])    # crisp membership, at max membership
table(grp, useNA='always')            # tally points per group

Also examine the group characteristics:

### examine group characteristics
cl$size                               # group size (sum of fuzzy memberships)
cl$withinss                           # group sums-of-squares
round(ctr <- clustcentroid(D, grp),3) # group centroids
(medoids  <- clustmedoid(spe, grp))   # *indices* of group medoids
clustvar(D, grp)                      # within-group variance
clustvar(D)                           # pooled among-group variance
as.dist(D_ctr <- as.matrix(interclustdist(D, grp))) # dissim among centroids
as.dist(D_med <- as.matrix(D)[medoids,medoids])     # dissim among medoids

How are species represented per group? Examine species frequency (constancy) per group:

con <- clustconst(spe, memb=as.memb(grp))
tabasco(t(con), col=get_palette())    # constancy per group
summary(con, mode='all')              # constancy per group
summary(con, mode='cluster', 'M1')    # examine one particular group

How to choose a valid number of groups? Systematically vary k, then evaluate. PCN = normalized partition coefficient (higher is better). PEN = normalized partition entropy (lower is better).

### systematically vary k from 3 to 9, then evaluate
cl_k <- random.vegclustdist(D, cmin=3, cmax=9, nstart=3, method='FCM', m=1.2)
sapply(seq_along(cl_k), function(i) vegclustIndex(cl_k[[i]])) # evaluate!

You may wish to add new sites to an existing classification. For example, imagine we fit a model using "western" sites, and now must assign values for the "eastern" sites.

### assign *new* sites to an *existing* classification ---
i    <- (xy$x < 220)                  # split the dataset, based on location
west <- spe[i,]                       # old calibration data = west
west <- west[,colSums(west) > 0]      # exclude zero-sum species
east <- spe[!i,]                      # new evaluation data = east
east <- east[,colSums(east) > 0]      # exclude zero-sum species
k    <- 7                             # number of groups specified in advance
(grp_west <- cutree(hclust(vegdist(west), 'ward.D2'), k)) # *existing* model
cl_west   <- as.vegclust(vegdist(west), grp_west) # convert to vegclust object
m  <- conformveg(west, east)                      # merge datasets
DD <- as.matrix(vegdist(rbind(m$x,m$y)))          # ALL dissimilarities
DD <- DD[(NROW(west)+1):NCOL(DD), 1:NROW(west)]   # rows = new, cols = old
cl_east <- vegclass(cl_west, DD)      # assign new points given existing model
grp_west <- defuzzify(cl_west)[[2]]   # memberships of *old* points
grp_east <- defuzzify(cl_east)[[2]]   # memberships of *new* points
grp <- c(grp_west,grp_east)
par(mfrow=c(1,2), oma=c(0,0,0,0), mar=c(2,2,1,0))
plot(xy, col=i+1, pch=16)             # map east/west locations
plot(xy, col=colvec(grp), pch=16)     # map group memberships
text(xy, grp)                         # identify group membership
legend('bottomleft', leg=1:k, fill=unique(colvec(grp)), title='Group', cex=0.7)

Optimal partitioning

# optimal partitioning
require(optpart)

Cluster evaluation and goodness-of-fit

Key references

de Caceres, M., X. Font, F. Oliva. 2010. The management of numerical vegetation classifications with fuzzy clustering methods. Journal of Vegetation Science 21: 1138–1151.

Murtagh, F., and P. Legendre. 2014. Ward's hierarchical agglomerative clustering method: which algorithms implement Ward's criterion? Journal of Classification 31: 274–295.

Group differences

Do community compositions significantly differ among groups? {#permanova}

Let's assume PERMANOVA is the best choice for comparing groups in multivariate space. Other methods (ANOSIM, MRPP) exist, but PERMANOVA usually has better power and false-detection rates, even when groups may have different dispersions (Anderson and Walsh 2013) among other operational advantages.

### define and visualize four groups
brk     <- quantile(env$k, probs=seq(0,1,by=0.25))   # define breaks
grp_k <- cut(env$k, brk, include.lowest=T,
               labels=c('lo','med','hi','veryhi'))   # group memberships
table(grp_k, useNA='always')                       # group tally
tapply(env$k, INDEX = grp_k, FUN = mean)           # group means
plot(m1$points, pch=NA)                              # visualize on the NMS
text(m1$points, labels=grp_k, col=as.numeric(grp_k)) # group memberships
ordispider(m1, groups=grp_k, col=1:4)              # group centroids

Test for difference in community compositions

PERMANOVA tests for differences in multivariate centroid, in the space of the chosen dissimilarity measure. Recall that matrix D was our matrix of pairwise Bray-Curtis dissimilarities.

### permanova: test for differences in multivariate *centroid*
a1 <- adonis(D ~ grp_k, permu=999)
a1

How do we interpret this result? Based on Bray-Curtis dissimilarities, four groups defined by soil potassium significantly differed in community compositions (PERMANOVA pseudo-F = 10.2, permutational p = 0.001, R^2^ = 0.25). The NMS ordination helps us see that this is largely driven by the "veryhi" group. Which you can test formally by repeating PERMANOVA for pairwise multiple comparisons.

Test for homogeneity of community compositions

A related test, PERMDISP, tests for differences in multivariate dispersion, in the space of the chosen dissimilarity measure. This is interpreted as a test of among-group homogeneity of community compositions, i.e., beta-diversity. Conveniently, we can ask for pairwise multiple comparisons.

### permdisp: test for differences in multivariate *dispersion*
b1 <- betadisper(D, grp_k)
b1
permutest(b1, pairwise=TRUE, permu=999)
boxplot(b1)

How do we interpret this? Based on Principal Coordinates reduction of original Bray-Curtis dissimilarities, the four groups defined by soil potassium significantly differed in homogeneity of community compositions (PERMDISP F = 7.98, permutational p = 0.001). This was due to the "veryhi" potassium group having significantly lower dispersion (lower beta-diversity) than the lower-potassium groups.

PERMANOVA equivalence to ANOVA

PERMANOVA works as 'regular' ANOVA when using Euclidean distances. We'd expect a different p-value since we're using permutations, but identical F-values and sums-of-squares.

De <- dist(env$k, 'euc')            # Euclidean distances
adonis(De ~ grp_k, perm=999)      # examine F and SS
print(anova(lm(k ~ grp, env)))      # expect identical F and SS

PERMANOVA for blocked design

For blocked designs (e.g., for repeated measures or genuine blocked designs), we must specify how permutations occur. Because units are only exchangeable within blocks, the permutations should occur within blocks.

### blocked design: permutations must occur within strata
blk <- factor(LETTERS[sample(rep(1:12,len=nrow(env)))]) # arbitrary 'blocks'
plot(m1)                                       # visualize on NMS
ordispider(m1, blk, label=TRUE)
customperm <- how(nperm=999)                   # set number of permutations
setBlocks(customperm) <- blk                   # permute only *within* blocks
adonis(D ~ grp_k, permutations=customperm)   # correct test

Key references

Anderson, M. J., and D. C. I. Walsh. 2013. PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: What null hypothesis are you testing? Ecological Monographs 83:557–574. https://doi.org/10.1890/12-2010.1

Group indicator species

Which species are indicative of pre-defined groups? {#indval}

The Indicator Value (IndVal) is the product of a species' fidelity and specificity to a group (Dufrêne and Legendre 1997).

Here we use labdsv::indval to examine indicators of single groups, but also see R package indicspecies to test whether species might indicate multiple groups.

First, let's use the same groups we used for PERMANOVA: which species are indicative of the four groups defined by soil potassium? We can compare these "real" indicator species against a random expectation.

### real groups
grp <- cut(env$k, quantile(env$k, probs=seq(0,1,by=0.25)), include.lowest=T,
               labels=c('lo','med','hi','veryhi'))   # group memberships
iv  <- labdsv::indval(spe, grp) # indicator species analysis for *real* groups
summary(iv)                     # IndVal observed

### random groups
rnd <- sample(grp, length(grp), replace=T) # define random groups by bootstrapping
ivr <- labdsv::indval(spe, rnd) # indicator species analysis for *random* groups
summary(ivr)                    # IndVal expected at random

How many indicator species would you expect at random?

### null expectation setting alpha = 5%
ceiling( ncol(spe) * 0.05 )

If you defined groups by a clustering method using species information, do not use IndVal to find indicator species of those groups (that would be circular logic).

Key references

Dufrêne, M. and Legendre, P. 1997. Species assemblages and indicator species: the need for a flexible asymmetrical approach. Ecological Monographs 67(3): 345--366.

de Cáceres, M., P. Legendre, and M. Moretti. 2010. Improving indicator species analysis by combining groups of sites. Oikos 119:1674–1684.

Community traits

How do species’ evolutionary relationships explain responses? {#traits-main}

Recall the traits matrix, which assigns each species (rows) a value for each of twelve traits (columns):

names(tra)
head(tra)
hist(tra$lfp, breaks=11)

We can ask several questions about trait diversity, convergence/divergence, and relationships to environment.

Trait dissimilarity

Are traits (for species in a site) convergent or divergent? {#trait-convergent}

The mean pairwise distance (MPD) shows how far (on average) the traits of one community are relative to all others. Most commonly these are Euclidean or Gower distances. A randomization test gives a test statistic called SES, standardized effect size. Significant positive SES indicates traits are less similar than expected at random (i.e., species have divergent traits), while negative SES indicates traits are more similar than expected at random (i.e., species tend to converge on similar traits). We can examine trait divergence/convergence along a gradient, or test for divergence/convergence within groups.

### Euclidean trait distance (traits already scaled 0-1)
Dt  <- dist(tra, method='euc')
### calculate trait SES of mean pairwise distances in sites
ses <- picante::ses.mpd(spe, Dt, null.model='richness', abund=F, runs=999)
### plot SES across the gradient
plot(ses$mpd.obs.z ~ env$k, ylab='Trait SES(MPD)', xlab='Soil potassium', 
     col=ifelse(ses$mpd.obs.p < 0.05, 'red','black'))
abline(lm(ses$mpd.obs.z ~ env$k)) # regression line
abline(h=0, lty=2)                # random-traits line
text(0.9, 1, 'Divergent')      
text(0.9, -4, 'Convergent')

Nonlinear trait-environment correlations

How strongly are traits related to environment? {#trait-env-corr}

To evaluate the strength of relationship between traits and environment, we can perform ordination of sites based on traits (rather than species). We first need to make a community-weighted means (CWM) matrix, a traits-by-site matrix whose values are trait means across the species in each site.

`makecwm` <- function (spe, tra) {
  spe <- as.matrix(spe)
  tra <- as.matrix(tra)
  `standardize` <- function(x) {
    (x - min(x, na.rm=TRUE)) / diff(range(x, na.rm=TRUE))
  }
  tra <- apply(tra, MARGIN = 2, FUN = standardize)
  awt <- spe %*% tra                 # abundance-weighted trait totals
  awt / rowSums(spe, na.rm=TRUE)     # community-weighted traits matrix
}
cwm <- data.frame(makecwm(spe, tra)) # make the CWM traits matrix
tabasco(cwm, col=get_palette())      # visualize

Ordination of the CWM matrix gives a configuration of sites in traits space, where distance between points indicates dissimilarity of trait compositions (trait syndromes). Nonlinear regression of environmental variables in this traits space can give us the trait-environment correlations that we're after.

### NMS ordination of traits
m <- metaMDS(cwm, 'altGower', k=2, trace=0)
plot(m, cex=cwm$lfp) # plot sites in *traits space*

### fit one environmental variable in traits space
o <- ordisurf(m, env$k, col=3, add=T)

### fit *all* environmental variables in traits space
fit <- t(sapply(env, function(i) {
  g <- summary(vegan::ordisurf(m ~ i, plot=F)) # fit GAM
  c(`pval` = as.numeric(sprintf('%.3f',round(g$s.pv,3))),
    `r2`   = as.numeric(sprintf('%.3f',round(g$r.sq,3))))
}))
cat('GAM p-values and goodness-of-fit, sorted\n',
    '----------------------------------------\n')
fit[rev(order(fit[,2])),]

Here, the environmental variables with the strongest nonlinear relationship to trait compositions were potassium, sodium, and elevation.

Fourth-corner analysis

RLQ and fourth-corner analysis can also detect linear traits-environment relationships as mediated by the species abundance matrix. There is no need to calculate the CWM matrix. Because it boils down to a series of PCA operations, the RLQ method does not readily admit nonlinear relationships. It is included here for completeness.

### the RLQ method
o_spe <- dudi.coa(spe, scannf = FALSE, nf = 2)
o_env <- dudi.hillsmith(env, scannf = FALSE, nf = 2, row.w = o_spe$lw)
o_tra <- dudi.hillsmith(tra, scannf = FALSE, nf = 2, row.w = o_spe$cw)
r     <- rlq(o_env, o_spe, o_tra, scannf = FALSE, nf = 2)
plot(r)
summary(r)
randtest(r)
fourthcorner.rlq(r, type='Q.axes')
fourthcorner.rlq(r, type='R.axes')

Phylogenetic correction of traits

How to remove phylogenetic relatedness from traits? {#trait-phylo-correct}

Often we suspect that close phylogenetic relatives may share similar trait values than more distant relatives (i.e., trait conservatism). For further analyses it may be useful to "remove" the effects of relatedness from traits, based on the phylogenetic variance-covariance matrix.

### function to do phylogenetic correction
`phylo_corr` <- function(phy, tra, ...){
  if (class(phy) != 'phylo') stop('phy must be of class `phylo`')
  if (!is.data.frame(tra)) tra <- as.data.frame(tra)
  rn <- dimnames(tra)[[1]] # species names
  cn <- dimnames(tra)[[2]] # traits names
  n  <- dim(tra)[[2]]    # n of traits
  if (!identical(phy$tip.label, rn)) stop('species name mismatch')
  G  <- ape::vcv.phylo(phy) # phylo variance-covariance matrix
  corfac <- chol(solve(G)) # 'correction factor' per Butler et al. (2000) 
  U  <- as.data.frame(      # initialize U for 'corrected' trait values
    matrix(NA, nrow=dim(tra)[[1]], ncol=dim(tra)[[2]],
           dimnames=list(rn,cn)))
  tra <- droplevels(tra)  # drop unused factor levels
  tra[,sapply(tra,function(x)nlevels(x)==1)] <- 1 # correct 1-level factors
  for(j in 1:n){       # corrections per individual trait
    M       <- model.matrix(as.formula(paste0("~0+",cn[j])), data=tra)
    corrtra <- data.frame(corfac %*% M)
    U[,j]   <- corrtra[rn, ,drop=FALSE]
  }
  return(U)
}
### do the correction, then compare
p <- phylo_corr(phy, tra) 
par(mfrow=c(2,2))
for(i in 7:10) {
  plot(tra[,i], p[,i], pch=16, cex=0.7, col='#00000050',
       main=dimnames(p)[[2]][i], xlab='Original', ylab='Corrected')
}

Key references

Butler, M.A., T.W. Schoener, and J.B. Losos. 2000. The relationship between sexual size dimorphism and habitat use in Greater Antillean Anolis lizards. Evolution 54:259-272.

Eklöf, A., and D.B. Stouffer. 2016. The phylogenetic component of food web structure and intervality. Theoretical Ecology 9:107–115.

Community phylogenetics

How do species' evolutionary relationships explain responses? {#phylo-main}

Plot phylogenies

### basic plotting
par(mfrow=c(1,3), mar=c(0,0,0,0), oma=c(0,0,0,0)) # plotting parameters
phy                                               # basic structure
plot(phy, cex=0.75, no.margin=T)                  # basic plotting

### color tip labels by trait values
plot(phy, cex=0.75, tip.color=colvec(tra$lfp), no.margin=T)
# axisPhylo(cex=0.6)

### symbolize trait values at tips
plot(phy, cex=0.75, label.offset = 48, no.margin=T)
tiplabels(pch = 21, bg = c(tra$annual), adj = 0)
tiplabels(pch = 21, bg = c(tra$biennial), adj = 15)
tiplabels(pch = 21, bg = c(tra$perennial), adj = 30)

Phylogenetic diversity - alpha

According to some people, phylogenetic diversity lets you infer community assembly processes (competitive exclusion, environmental filtering, neutral dynamics). Some other serious people doubt this.

### phylogenetic diversity metrics
Dp  <- cophenetic(phy) # phylogenetic distances
### Faith's PD phylogenetic diversity = total branch length connecting all species in a site
fpd <- pd(spe, phy)    # Faith's PD
### mean pairwise distance
mpd <- ses.mpd(spe,  Dp, null.model='independentswap') 
### mean nearest taxon distance
mnd <- ses.mntd(spe, Dp, null.model='independentswap') 
### bring all together
phy_div <- cbind(fpd, mpd = mpd$mpd.obs.z, mntd = mnd$mntd.obs.z)
head(phy_div)
pairs(phy_div)

Phylogenetic diversity - beta

Phylogenetic Bray-Curtis dissimilarities = fraction of branch-length shared between communities. Different from the cophenetic branch lengths (between-species), this shows sample unit dissimilarities (between-sites).

### correlation between phylogenetic and taxonomic beta-diversity
Dp <- phylosor(spe, phy) # phylogenetic distances
protest(Dp, D)           # procrustes correlation

### side-by-side NMS ordinations
par(mfrow=c(1,2))
plot(scores(metaMDS(D)), col=colvec(1:97), pch=19, main='Taxonomic NMS')
plot(scores(metaMDS(Dp)), col=colvec(1:97), pch=19, main='Phylogenetic NMS')

Phylogenetic signal

How similar are traits among related species? {#phylo-signal}

We focus on a measure called Blomberg's K to detect phylogenetic signal in traits (Revell et al. 2008). A value of K > 1.0 indicates that traits of related species are more similar than expected from Brownian motion, and K < 1.0 indicates traits of related species are more dissimilar than expected at random.

### detect phylogenetic signal for traits
K <- sapply(tra, FUN=function(j){
  names(j) <- rownames(tra)
  round(picante::phylosignal(j, ape::multi2di(phy)),6)})
as.matrix(K)
### Thinker: which traits have strong evidence for conservatism?

Key references

Revell, L.J., L.J. Harmon, and D.C. Collar. 2008. Phylogenetic signal, evolutionary process, and rate. Systematic Biology 57:591–601.

Community spatial analysis

To what degree are communities structured spatially? {#spatial-mantel}

Mantel test

The Mantel test (Mantel 1967) compares a community dissimilarity matrix to a spatial distance matrix (it can also be used to compare any two dissimilarity matrices, for example, to test goodness-of-fit of Euclidean ordination distances to original community dissimilarities, or two ordination configurations). Here we ask, is compositional dissimilarity related to spatial distances?

E <- dist(xy)                            # spatial distance matrix
vegan::mantel(D, E, method='spearman')   # spearman *rank* correlation
# in `ecodist` package, we also get useful bootstrap CIs:
ecodist::mantel(D ~ E, mrank=T)          # spearman *rank* correlation

A Mantel correlogram depicts this correlation of compositional dissimilarity at specified lag distances.

brk <- seq(0, round(max(E),-1), by=10)
plot(ecodist::mgram(D, E, breaks=brk, nperm=99, mrank=T, nboot=500))
abline(h=0, col='red')
plot(vegan::mantel.correlog(D, E, break.pts=brk, cutoff=F, r.type='spearman',
                            nperm=99, mult='holm', progressive=T))

MRM: multiple regression on distance matrices

MRM: multiple regression on distance matrices (Lichstein 2007).

# species dissimilarities are related to space (extremely weakly) and 
#   potassium (moderately)
ecodist::MRM(D ~ dist(xy) + dist(env$k), nperm=999) 

# abundance of cosmopolitan bulrush is NOT related to space, 
#   but is related to potassium (moderately)
ecodist::MRM(dist(spe$bolboschoenus_maritimus) ~ dist(xy) + dist(env$k), nperm=999)

PCNM: principle coordinates of neighbor matrices.

How much variation is explained by space, environment, or spatially-structured environment? {#spatial-pcnm}

We can quantify spatial distances among sites with a Euclidean distance matrix. Performing principal coordinates analysis on this "neighbor matrix" gives us PCNMs: principle coordinates of neighbor matrices. PCNM scores are interpreted as orthogonal explanatory variables at varying spatial scales: think of them as progressively broad-, medium- and local-scale predictors.

E   <- dist(xy)            # euclidean distances between sites
pc  <- pcnm(E)             # principal coordinates of neighbor matrices
par(mfrow=c(2,3), oma=c(0,0,0,0), mar=c(2,2,2,0)) # map PCNMs across study area
v   <- 2^(0:5)             # select PCNMs at increasingly fine scale
for(i in seq_along(v)) {
  ordisurf(xy, scores(pc, choices=v[i]), bubble=3, main=paste0('PCNM ',v[i]))
}

We can use weighted principle coordinates of neighbor matrices as explanatory variables in CCA to construct a multiscale ordination (MSO, Wagner 2004) variogram. Residual variance here would indicate spatial autocorrelation; we hope for this to exhibit no trend with distance, since PCNMs account for spatial autocorrelation.

rs    <- rowSums(spe) / sum(spe)  # sites weighted by abundances 
pc    <- pcnm(E, w=rs)            # *weighted* PCNMs
ord   <- cca(spe ~ scores(pc))    # CCA: species constrained by space
msord <- mso(ord, xy)             # multiscale ordination
msoplot(msord, expl=T)            # MSO variogram: spatial trend in residuals?

Using the weighted PCNMs above, we can perform variance partitioning of species dissimilarities in dbRDA. This particular variation partitioning shows the amount of variation in species dissimilarities (D) explained by 1) space, 2) environment, and 3) spatially-structured environment.

(vp <- varpart(D, env, scores(pc))) 
plot(vp, bg = c(2,4))
text(-0.2, 0.3, 'Environment', cex=0.7)
text(0.5, 0.3, 'Spatially-\nstructured\nenviro', cex=0.7)
text(1.2, 0.3, 'Space', cex=0.7)

We can include more explanatory matrices -- here, species dissimilarities explained by 1) environment, 2) broad spatial, and 3) local spatial.

pc_broad <- scores(pc)[,1:10]
pc_local <- scores(pc)[,11:59]
(vp <- varpart(D, env, pc_broad, pc_local)) 
plot(vp, cex=0.7, bg=2:5) 
text(-0.2, 0.3, 'Environment', cex=0.7)
text(0.5, 0.55, 'Broad-\nstructured\nenviro', cex=0.7)
text(1.2, 0.3, 'Broad\nspatial', cex=0.7)
text(-0.5, -0.75, 'Local-\nstructured\nenviro', cex=0.7)
text(0.5, -1.15, 'Local\nspatial', cex=0.7)
### Thinker: why do "broad" and "local" share in common zero variance explained?

We can include more explanatory matrices -- here, species dissimilarities explained by 1) soil fractions, 2) soil chemistry, and 3) spatial processes.

(vp <- varpart(D,                               # dissimilarities
               ~ clay + sand + silt,            # soil fractions
               ~ mg + k + conductivity + na_l,  # soil chemistry
               scores(pc),                      # spatial predictors
               data=env))
plot(vp, cex=0.7, bg=2:5)
text(-0.2, 0.3, 'Soil fractions', cex=0.7)
text(1.2, 0.3, 'Soil chemistry', cex=0.7)
text(0.5, -1.15, 'Spatial', cex=0.7)
text(-0.5, -0.75, 'Spatially-\nstructured\nsoil fractions', cex=0.7)
text(1.5, -0.75, 'Spatially-\nstructured\nsoil chemistry', cex=0.7)
### Thinker: how do you interpret this figure?  
### Moderate spatial structuring, a bit owing to soil chemistry,
### but mostly to other unmeasured spatial processes (dispersal?).

### Thinker: what does this NOT explain?  
### What is the biological cause of residual variation?
### We did ordination in 1) species space, 2) env space, 
###   3) env constrained spp space?

Key references

Borcard D. and Legendre P. 2002. All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecological Modelling 153: 51-68.

Dray, S., R. Pélissier, P. Couteron, M.-J. Fortin, P. Legendre, P. R. Peres-Neto, E. Bellier, R. Bivand, F. G. Blanchet, M. De Cáceres, A.-B. Dufour, E. Heegaard, T. Jombart, F. Munoz, J. Oksanen, J. Thioulouse, and H. H. Wagner. 2012. Community ecology in the age of multivariate multiscale spatial analysis. Ecological Monographs 82: 257-275.

Lichstein, J. 2007. Multiple regression on distance matrices: a multivariate spatial analysis tool. Plant Ecology 188:117-131.

Mantel, N. 1967. The detection of disease clustering and a generalized regression approach. Cancer Research 27:209-220.

Wagner, H.H. 2004. Direct multi-scale ordination with canonical correspondence analysis. Ecology 85: 342-351.

Appendix 1: Workflow example {#workflow}

This workflow uses common best-practices in ecological community analysis. It is somewhat opinionated, and is by no means universal or obligatory!

Data I/O: read data into R (typically read.csv if CSV files, or load if in R binary format).

Basic exploration: use basic commands to examine data structure (str), peek at data (head), see the frequency distribution (hist), or make biplots (plot or pairs).

Advanced exploration: vegan::tabasco or labdsv::abuocc to visualize species abundance matrices.

Transformations: use vegan::tabasco for relativizations, and arithmetic for transformations. Try log~10~(x+1) for abundances, row-standardize abundances to put sites on equal footing, and range-standardize for environment or traits (so all on same scale) -- McCune and Grace 2002

Outliers: sample units with mean dissimilarities > defined number of SDs from grand mean -- McCune and Grace 2002

Community dissimilarities: use vegan::vegdist to get Bray-Curtis dissimilarities -- Legendre and Legendre 2012

Environmental distances: use scale and dist to get (scaled) Euclidean distances

Dissimilarity adjustments: use vegan::stepacross for step-across shortest distance in the presence of high beta-diversity -- Smith 2017

Ordination of sites, based on communities: use vegan::metaMDS for NMS -- Minchin 1987

Ordination of sites, based on environment: use stats::prcomp for PCA -- linear combinations

Ordination of sites, based on communities constrained by environment: use vegan::dbrda for dbRDA, or fso::mfso for FSO -- McArdle and Anderson 2001 or Roberts 2008

Comparing ordination configurations: use vegan::protest for Procrustes analysis -- Peres-Neto and Jackson 2001

Clustering: use vegclust::vegclust for fuzzy clustering, or optpart::optpart for optimal partitioning -- de Cáceres 2010 or Roberts 2015

Group differences: use vegan::adonis2 for PERMANOVA -- Anderson and Walsh 2013

Group indicator species: use labdsv::indval for the usual single-group IndVal, or indicspecies::multipatt for multi-group IndVal -- Dufrene and Legendre 1997 or de Cáceres et al. 2010

Appendix 2: Further resources

General R usage

R for Data Science{target="_blank"}

Data Carpentry{target="_blank"}

Ecology in R

Dave Roberts' labdsv pages{target="_blank"}

Jari Oksanen's vegan introduction - PDF{target="_blank"}

David Zelený's community analysis{target="_blank"}

Tad Fukami's ecological statistics{target="_blank"}

Steve Kembel traits and phylogenies tutorial{target="_blank"}

Community analysis R packages

CRAN Environmetrics Task View{target="_blank"}
vegan{target="_blank"}
labdsv{target="_blank"}
optpart{target="_blank"}
ade4{target="_blank"}
ecodist{target="_blank"}

phytomosaic/esa2021 documentation built on Aug. 10, 2022, 3 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

phytomosaic/esa2021 ESA 2021 Workshop on Community Analysis in R

In phytomosaic/esa2021: ESA 2021 Workshop on Community Analysis in R

Overview

Why are we here today? {#why-here}

Learning outcomes

Assumptions

Software installation

Load packages

Useful functions to explore

Define objects and functions

Data

Where did this dataset originate? {#dataorigin}

Mafragh, Algeria vegetation data

Load data

Study area description

History of the Mafragh dataset

Key references

Pre-analysis

How do I get my data into shape for analysis? {#preanalysis-shape}

Check data structure

Find and replace troublesome values

Data transformations

Abundances to presence/absence

Outliers

Test validity of species matrix

Visualize data

Key references

Diversity measures

How diverse is this site? many sites? region? {#diversity-main}

Gamma (regional) diversity

Alpha (per-site) diversity

Beta (among-site) diversity: Whittaker's

Beta diversity: dust bunny indices

Beta-diversity: no-share sites

Key references

Dissimilarities

How dissimilar are community compositions among sites? {#dissim-main}

Dissimilarity matrix for species

Stepacross adjustment

Distance matrix for environment

Key references

Ordination (unconstrained)

How best to order sites by dissimilarities of community compositions? {#ord-unconstrained}

Visualizing ordinations

Ordination goodness-of-fit

Dimensionality selection in NMS

Key references

Ordination (constrained)

How best to order sites by dissimilarities of community compositions, in response to measured environmental variables? {#ord-constrained}

Key references

Group clustering

How to group sites according to dissimilarity of community compositions? {#clustering}

Hierarchical: Ward's clustering

Non-hierarchical: fuzzy clustering

Optimal partitioning

Cluster evaluation and goodness-of-fit

Key references

Group differences

Do community compositions significantly differ among groups? {#permanova}

Test for difference in community compositions

Test for homogeneity of community compositions

PERMANOVA equivalence to ANOVA

PERMANOVA for blocked design

Key references

Group indicator species

Which species are indicative of pre-defined groups? {#indval}

Key references

Community traits

How do species’ evolutionary relationships explain responses? {#traits-main}

Trait dissimilarity

Are traits (for species in a site) convergent or divergent? {#trait-convergent}

Nonlinear trait-environment correlations

How strongly are traits related to environment? {#trait-env-corr}

Fourth-corner analysis

Phylogenetic correction of traits

How to remove phylogenetic relatedness from traits? {#trait-phylo-correct}

Key references

Community phylogenetics

How do species' evolutionary relationships explain responses? {#phylo-main}

Plot phylogenies

phytomosaic/esa2021
ESA 2021 Workshop on Community Analysis in R