Fits a fourth corner model: a model to study how variation in environmental response across taxa can be explained by their traits. The fitting function can be almost any predictive model. The default is a generalised linear model; another good option is to add a LASSO penalty via glm1path. Overdispersed counts can be handled via family="negative.binomial", which is the default.
L: A data frame (or matrix) containing the abundances for each taxon (columns) across all sites (rows).
R: A data frame (or matrix) of environmental variables (columns) across all sites (rows).
Q: A data frame (or matrix) of traits (columns) across all taxa (rows). If not specified, a different environmental response will be fitted for each taxon.
family: The family of the response variable. The default is "negative.binomial", which handles overdispersed counts.
formula: A one-sided formula specifying exactly how to model abundance as a function of environmental and trait variables (as found in R and Q).
method: The function to use to fit the model. Default is manyglm.
composition: logical. TRUE includes a row effect in the model, adjusting for different sampling intensities across different samples. This can be understood as a compositional term, in the sense that all other terms then model relative abundance at a site. FALSE (default) does not include a row effect, hence the model is of absolute abundance.
col.intercepts: logical. TRUE (default) includes a column effect in the model, to adjust for different levels of abundance of different response (column) variables. FALSE removes this column effect.
...: Arguments passed to the function specified at method.
This function fits a fourth corner model, that is, a model to predict abundance across several taxa (stored in L) as a function of environmental variables (R) and traits (Q). The environment-trait interaction can be understood as the fourth corner, giving the set of coefficients that describe how environmental response across taxa varies as traits vary. A species effect is included in the model (i.e. a different intercept term for each species), so that traits are used to explain patterns in relative abundance across taxa, not patterns in absolute abundance.
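As a rough sketch in symbols, the model described above might be written as follows. The notation is introduced here for illustration only and is not taken from the package; it assumes a log link and leaves out quadratic terms and the optional row effect.

```latex
\log(\mu_{ij}) = \alpha_j + \sum_{k} x_{ik}\,\beta_k + \sum_{k}\sum_{l} x_{ik}\, z_{jl}\, \gamma_{kl}
```

Here \mu_{ij} is the mean abundance of taxon j at site i, x_{ik} is the k-th environmental variable at site i (a row of R), z_{jl} is the l-th trait of taxon j (a row of Q), \alpha_j is the species intercept, and the \gamma_{kl} are the fourth corner coefficients.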
The actual function used to fit the model is determined by the user through the method argument. The default is to use manyglm to fit a GLM, although for predictive modelling it might be better to use a LASSO penalty as in glm1path, where the penalty used for BIC calculation is log(dim(L)[1]), i.e. log(number of sites).
The model is fitted by vectorising L, then constructing a big matrix from repeated values of R and Q, their quadratic terms (if required) and interactions. Hence this function will hit memory issues if any of these matrices are large, and can slow down, depending on the fitting method used.

If formula is left unspecified, the design matrix is constructed using all environmental variables and traits specified in R and Q, quadratic terms for any of these variables that are quantitative, and all environment-trait interactions, after standardising these variables. Specifying a one-sided formula as a function of the variables in R and Q instead gives the user control over the precise model that is fitted, and drops the internal standardisations. The arguments composition and col.intercepts optionally add terms to the model for row and column total abundance, irrespective of whether a formula has been specified.
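The vectorising step can be pictured with a small base R sketch. This is only an illustration of the idea under made-up data; the variable names (temp, size, long) are hypothetical, and the code is not the internal implementation of traitglm.

```r
# Toy data: 3 sites, 2 taxa (all values are made up for illustration)
L <- matrix(c(1, 0, 4,
              2, 5, 3), nrow = 3,
            dimnames = list(paste0("site", 1:3), c("spA", "spB")))
R <- data.frame(temp = c(10, 15, 20))   # one row per site
Q <- data.frame(size = c(1.2, 3.4))     # one row per taxon

n_sites <- nrow(L)
n_taxa  <- ncol(L)

# Vectorise L column by column, repeating rows of R and Q to match
long <- data.frame(
  abund = as.vector(L),
  site  = factor(rep(rownames(L), times = n_taxa)),
  taxon = factor(rep(colnames(L), each = n_sites)),
  temp  = rep(R$temp, times = n_taxa),
  size  = rep(Q$size, each = n_sites)
)

# Design matrix with the environment-by-trait ("fourth corner") interaction
X <- model.matrix(~ temp + size + temp:size, data = long)
dim(X)   # 6 rows (3 sites x 2 taxa), 4 columns
```

The long-format table has one row per site-taxon pair, which is why memory use grows with the product of the number of sites and the number of taxa.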
Note: when specifying a formula, if there are no penalties on coefficients (as for manyglm), then main effects for R can be excluded if including row effects (via composition=TRUE), and main effects for Q can be excluded if including column effects (via col.intercepts=TRUE), because those terms are redundant (they try to explain main effects for rows/columns when these main effects are already in the model). If using penalised likelihood (as in cv.glm1path) or a random effects model, by all means include main effects as well as row/column effects, and the penalties will sort out which terms to use.
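The redundancy noted above can be checked with a base R sketch. The data and the variable name env are hypothetical; the point is only that an environmental variable is constant within a site, so its main effect is aliased with a site (row) effect in an unpenalised model.

```r
# Long-format toy layout: 4 sites x 3 taxa
d <- expand.grid(site = factor(1:4), taxon = factor(1:3))

# Hypothetical environmental variable, one value per site
env_by_site <- c(0.5, 1.0, 2.0, 3.5)
d$env <- env_by_site[as.integer(d$site)]

# Row (site) effect plus environmental main effect
X_row <- model.matrix(~ site + env, data = d)
rank_row <- qr(X_row)$rank

# The rank is one less than the number of columns:
# the env column is a linear combination of the site columns
c(columns = ncol(X_row), rank = rank_row)
```

The same argument applies to trait main effects when a column (taxon) effect is in the model, since each trait is constant within a taxon.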
If trait matrix Q is not specified, default behaviour will fit a different environmental response for each taxon (and the outcome will be very similar to manyglm(L~R)). This can be understood as a fourth corner model where species identities are used as the species traits (i.e. no attempt is made to explain differences across species).
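To see why using species identities as traits amounts to a separate environmental response per taxon, here is a base R sketch with simulated data. It uses a Poisson GLM from stats rather than the negative binomial fit in manyglm, so it illustrates the principle only; all names and parameter values are made up.

```r
set.seed(1)
n_sites <- 30
x <- rnorm(n_sites)   # hypothetical environmental variable

# Simulated counts for two taxa with different responses to x
yA <- rpois(n_sites, exp(0.5 + 0.8 * x))
yB <- rpois(n_sites, exp(1.0 - 0.4 * x))

# Separate model for each taxon
fitA <- glm(yA ~ x, family = poisson)
fitB <- glm(yB ~ x, family = poisson)

# One stacked model, with taxon identity playing the role of the trait
long <- data.frame(
  y     = c(yA, yB),
  x     = rep(x, times = 2),
  taxon = factor(rep(c("A", "B"), each = n_sites))
)
fit_long <- glm(y ~ 0 + taxon + taxon:x, family = poisson, data = long)

# The taxon-by-environment slopes agree with the separate fits
rbind(stacked  = coef(fit_long)[c("taxonA:x", "taxonB:x")],
      separate = c(coef(fitA)["x"], coef(fitB)["x"]))
```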
Methods such as plot, residuals and predict (as used in the examples below) inherit default behaviour from their fitting functions. For example, use plot for a Dunn-Smyth residual plot from a traits model fitted using manyglm.
A traitglm object, a list that contains at least the following components:
Exactly what is included in output depends on the fitting function. By default, a manyglm object is returned, so all usual manyglm output is included (coefficients, residuals, deviance, etc).
A family object matching the final model.
A matrix of fourth corner coefficients. If formula has been manually entered, this will be a vector, not a matrix.
The reduced-size design matrix for environmental variables, including further arguments:
Data frame of (possibly standardised) environmental variables
A data frame containing the leading term in a quadratic expression (where appropriate) for environmental variables
A vector with the same dimension as the number of columns of X, listing the type of each environmental variable.
Coefficients used in transforming variables to orthogonality. These are used later to make predictions.
The reduced-size design matrix for traits, set up as for the environmental variables.
For LASSO fits: a vector of the same length as the final design matrix, indicating which variables had a penalty imposed on them in model fitting.
The data frame of abundances specified as input.
Logical, is any penalty applied to parameters at all (not if using a manyglm fit).
A list of coefficients describing the standardisations of variables used in analyses. Stored for use later if making predictions.
The original call
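Several of the components above store transformation coefficients so that new data can be put on the same scale at prediction time. The base R sketch below shows that general pattern with scale() and poly(); it is an assumption-laden illustration with made-up values, not a description of what traitglm stores internally.

```r
x_train <- c(2, 4, 6, 8, 10)   # hypothetical environmental variable
x_new   <- c(3, 12)            # hypothetical new sites

# Standardise the training data and keep the coefficients used
x_std   <- scale(x_train)
scaling <- list(center = attr(x_std, "scaled:center"),
                scale  = attr(x_std, "scaled:scale"))

# Apply the stored standardisation to new data
x_new_std <- (x_new - scaling$center) / scaling$scale

# Orthogonal linear and quadratic terms; poly() keeps its coefficients
# as an attribute, and predict() reuses them for new values
p     <- poly(x_train, degree = 2)
p_new <- predict(p, newdata = x_new)
```

Reusing the training-data coefficients, rather than recomputing them from the new data, keeps the fitted coefficients meaningful for the new observations.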
David I. Warton <David.Warton@unsw.edu.au>
Brown AM, Warton DI, Andrew NR, Binns M, Cassis G and Gibb H (2014) The fourth-corner solution - using predictive models to understand how species traits interact with the environment, Methods in Ecology and Evolution 5, 344-352.
Warton DI, Shipley B & Hastie T (2015) CATS regression - a model-based approach to studying trait-based community assembly, Methods in Ecology and Evolution 6, 389-398.
library(mvabund)
data(antTraits)

ft=traitglm(antTraits$abund,antTraits$env,antTraits$traits,method="manyglm")
ft$fourth #print fourth corner terms

# for a pretty picture of fourth corner coefficients, uncomment the following lines:
# library(lattice)
# a = max( abs(ft$fourth.corner) )
# colort = colorRampPalette(c("blue","white","red"))
# plot.4th = levelplot(t(as.matrix(ft$fourth.corner)), xlab="Environmental Variables",
#   ylab="Species traits", col.regions=colort(100), at=seq(-a, a, length=100),
#   scales = list( x= list(rot = 45)))
# print(plot.4th)

plot(ft) # for a Dunn-Smyth residual plot
qqnorm(residuals(ft)); abline(c(0,1),col="red") # for a normal quantile plot.

# predict to the first five sites
predict(ft,newR=antTraits$env[1:5,])

# refit using LASSO and fewer variables, including row effects and only two interaction terms:
ft1=traitglm(antTraits$abund,antTraits$env[,3:4],antTraits$traits[,c(1,3)],
  formula=~Shrub.cover:Femur.length+Shrub.cover:Pilosity,composition=TRUE,method="glm1path")
ft1$fourth #notice LASSO penalty has shrunk one interaction to zero