traitglm | R Documentation |
Fits a fourth corner model: a model to study how variation in environmental response across taxa can be explained by their traits. Almost any predictive model can be used as the fitting function. The default is a generalised linear model; another good option is to add a LASSO penalty via glm1path. Overdispersed counts can be handled via family="negative.binomial", which is the default family argument.
traitglm(L, R, Q = NULL, family="negative.binomial", formula = NULL, method = "manyglm", composition = FALSE, col.intercepts = TRUE, ...)
L |
A data frame (or matrix) containing the abundances for each taxon (columns) across all sites (rows). |
R |
A data frame (or matrix) of environmental variables (columns) across all sites (rows). |
Q |
A data frame (or matrix) of traits (columns) across all taxa (rows). If not specified, a different environmental response will be specified for each taxon. |
family |
The family of the response variable, see |
formula |
A one-sided formula specifying exactly how to model abundance as a function of environmental and trait variables (as found in |
method |
The function to use to fit the model. Default is |
composition |
logical. TRUE includes a row effect in the model, adjusting for different sampling intensities across different samples. This can be understood as a compositional term in the sense that all other terms then model relative abundance at a site. FALSE (default) does not include a row effect, hence the model is of absolute abundance. |
col.intercepts |
logical. TRUE (default) includes a column effect in the model, to adjust for different levels of abundance of different response (column) variables. FALSE removes this column effect. |
... |
Arguments passed to the function specified at |
This function fits a fourth corner model, that is, a model to predict abundance across several taxa (stored in L) as a function of environmental variables (R) and traits (Q). The environment-trait interaction can be understood as the fourth corner, giving the set of coefficients that describe how environmental response across taxa varies as traits vary. A species effect is included in the model (i.e. a different intercept term for each species), so that traits are used to explain patterns in relative abundance across taxa, not patterns in absolute abundance.
The actual function used to fit the model is determined by the user through the method argument. The default is to use manyglm to fit a GLM, although for predictive modelling it might be better to use a LASSO penalty as in glm1path and cv.glm1path. In glm1path, the penalty used for BIC calculation is log(dim(L)[1]), i.e. log(number of sites).
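As a small illustration of the BIC penalty described above, the following sketch computes log(number of sites) for a simulated abundance matrix and contrasts it with the log of the total number of entries in the vectorised response. The matrix and the object names are made up for illustration; this is not code from the package.

```r
# Simulated abundances: 10 sites (rows) by 4 taxa (columns); values are arbitrary
set.seed(1)
L <- matrix(rpois(40, lambda = 2), nrow = 10, ncol = 4)

# Penalty per parameter as described in the text: log(number of sites)
bic_penalty <- log(dim(L)[1])

# For comparison only: log of the number of entries in the vectorised response
log_n_obs <- log(length(L))

c(bic_penalty = bic_penalty, log_n_obs = log_n_obs)
```

The penalty depends only on the number of rows of L, so adding taxa does not change it.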
The model is fitted by vectorising L, then constructing a big matrix from repeated values of R, Q, their quadratic terms (if required) and interactions. Hence this function can run into memory problems if any of these matrices is large, and it can be slow (especially if using cv.glm1path). If formula is left unspecified, the design matrix is constructed using all environmental variables and traits specified in R and Q, quadratic terms for any of these variables that are quantitative, and all environment-trait interactions, after standardising these variables. Specifying a one-sided formula as a function of the variables in R and Q instead gives the user control over the precise model that is fitted, and drops the internal standardisations. The arguments composition and col.intercepts optionally add terms to the model for row and column total abundance, irrespective of whether a formula has been specified.
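The vectorising step can be sketched in base R as follows. This is only an illustration of the idea under simplifying assumptions (toy data, one environmental variable, one trait, no standardisation or quadratic terms); it is not the package's internal code, and all object names are hypothetical.

```r
# Toy data: 3 sites, 2 taxa, one environmental variable, one trait (values made up)
L <- matrix(c(2, 0, 5,
              1, 3, 0), nrow = 3,
            dimnames = list(paste0("site", 1:3), c("spA", "spB")))
R <- data.frame(temp = c(10, 15, 20))
Q <- data.frame(size = c(1.2, 3.4))

n_sites <- nrow(L)
n_taxa  <- ncol(L)

# Stack the columns of L into one long response vector (site index varies fastest)
y <- as.vector(L)

# Repeat each site's environment once per taxon, and each taxon's trait once per site
long <- data.frame(
  y    = y,
  temp = rep(R$temp, times = n_taxa),
  size = rep(Q$size, each = n_sites)
)

# Main effects plus the environment-by-trait ("fourth corner") interaction
X <- model.matrix(~ temp + size + temp:size, data = long)
dim(X)
```

The long design matrix has one row per site-taxon combination, which is why memory use grows quickly with the dimensions of L, R and Q.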
Note: when specifying a formula, if there are no penalties on coefficients (as for manyglm), then main effects for R can be excluded if including row effects (via composition=TRUE), and main effects for Q can be excluded if including column effects (via col.intercepts=TRUE), because those terms are redundant (trying to explain main effects for row/column when these main effects are already in the model). If using penalised likelihood (as in glm1path and cv.glm1path) or a random effects model, by all means include main effects as well as row/column effects, and the penalties will sort out which terms to use.
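The redundancy mentioned in this note can be seen in a small base R sketch: once a row (site) effect is in the model, a main effect for an environmental variable adds a column that is a linear combination of the site indicators. The data and names below are made up for illustration and do not come from the package.

```r
# Long-format layout for 3 sites and 2 taxa; temp is constant within each site
long <- data.frame(
  site = factor(rep(c("s1", "s2", "s3"), times = 2)),
  temp = rep(c(10, 15, 20), times = 2)
)

# Row (site) effect plus an environmental main effect
X <- model.matrix(~ site + temp, data = long)

n_columns <- ncol(X)       # intercept, two site indicators, temp
x_rank    <- qr(X)$rank    # numerical rank of the design matrix

c(columns = n_columns, rank = x_rank)
```

The rank is one less than the number of columns, so the temp coefficient is not identifiable in an unpenalised fit. The same argument applies to trait main effects when column effects are included.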
If trait matrix Q is not specified, default behaviour will fit a different environmental response for each taxon (and the outcome will be very similar to manyglm(L~R)). This can be understood as a fourth corner model where species identities are used as the species traits (i.e. no attempt is made to explain differences across species).
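To illustrate what "species identities as traits" means, the sketch below uses a species factor in place of a trait matrix and crosses it with one environmental variable, giving a separate intercept and slope per species. It is a base R illustration with made-up data and hypothetical names, not the package's internal construction.

```r
# 4 sites, 3 species, one environmental variable (values made up)
n_sites <- 4
spp  <- c("spA", "spB", "spC")
temp <- c(10, 15, 20, 25)

# Long-format layout: species identity plays the role of the trait
long <- data.frame(
  species = factor(rep(spp, each = n_sites)),
  temp    = rep(temp, times = length(spp))
)

# Species intercepts plus one temp slope per species
X <- model.matrix(~ species + species:temp, data = long)
colnames(X)
```

Each species receives its own slope column, so nothing is shared across species and no attempt is made to explain why responses differ.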
These functions inherit default behaviour from their fitting functions. For example, use plot for a Dunn-Smyth residual plot from a traits model fitted using manyglm or glm1path.
Returns a traitglm object, a list that contains at least the following components:
Exactly what is included in output depends on the fitting function. By default, a manyglm object is returned, so all usual manyglm output is included (coefficients, residuals, deviance, etc).
A family object matching the final model.
A matrix of fourth corner coefficients. If formula has been manually entered, this will be a vector, not a matrix.
The reduced-size design matrix for environmental variables, including further arguments:
Data frame of (possibly standardised) environmental variables
A data frame containing the leading term in a quadratic expression (where appropriate) for environmental variables
A vector with the same dimension as the number of columns of X, listing the type of each environmental variable ("quantitative" or "factor")
Coefficients used in transforming variables to orthogonality. These are used later to make predictions.
The reduced-size design matrix for traits, set up as for R.des.
For LASSO fits: a vector of the same length as the final design matrix, indicating which variables had a penalty imposed on them in model fitting.
The data frame of abundances specified as input.
Logical: is any penalty applied to parameters at all? (Not if using a manyglm fit.)
A list of coefficients describing the standardisations of variables used in analyses. Stored for use later if making predictions.
The original traitglm call.
David I. Warton <David.Warton@unsw.edu.au>
Brown AM, Warton DI, Andrew NR, Binns M, Cassis G and Gibb H (2014) The fourth corner solution - using species traits to better understand how species traits interact with their environment, Methods in Ecology and Evolution 5, 344-352.
Warton DI, Shipley B & Hastie T (2015) CATS regression - a model-based approach to studying trait-based community assembly, Methods in Ecology and Evolution 6, 389-398.
glm1path, glm1, manyglm, family, residuals.manyglm, plot.manyany
library(mvabund)
data(antTraits)
ft=traitglm(antTraits$abund,antTraits$env,antTraits$traits,method="manyglm")
ft$fourth #print fourth corner terms

# for a pretty picture of fourth corner coefficients, uncomment the following lines:
# library(lattice)
# a = max( abs(ft$fourth.corner) )
# colort = colorRampPalette(c("blue","white","red"))
# plot.4th = levelplot(t(as.matrix(ft$fourth.corner)), xlab="Environmental Variables",
#   ylab="Species traits", col.regions=colort(100), at=seq(-a, a, length=100),
#   scales = list( x= list(rot = 45)))
# print(plot.4th)

plot(ft) # for a Dunn-Smyth residual plot
qqnorm(residuals(ft)); abline(c(0,1),col="red") # for a normal quantile plot.

# predict to the first five sites
predict(ft,newR=antTraits$env[1:5,])

# refit using LASSO and fewer variables, including row effects and only two interaction terms:
ft1=traitglm(antTraits$abund,antTraits$env[,3:4],antTraits$traits[,c(1,3)],
  formula=~Shrub.cover:Femur.length+Shrub.cover:Pilosity,composition=TRUE,method="glm1path")
ft1$fourth #notice LASSO penalty has shrunk one interaction to zero