In zetadiv: Functions to Compute Compositional Turnover Using Zeta Diversity

Zeta.msgdm

R Documentation

Multi-site generalised dissimilarity modelling for a set of environmental variables and distances

Description

Computes a regression model of zeta diversity for a given order (number of assemblages or sites) against a set of environmental variables and distances between sites. The different regression models available are generalised linear models, generalised linear models with negative constraints, generalised additive models, shape constrained additive models, and I-splines.

Usage

Zeta.msgdm(
  data.spec,
  data.env,
  xy = NULL,
  data.spec.pred = NULL,
  order = 1,
  sam = 1000,
  reg.type = "ispline",
  family = NULL,
  method.glm = "glm.fit.cons",
  cons = -1,
  cons.inter = NULL,
  confint.level = 0.95,
  bs = "mpd",
  kn = -1,
  order.ispline = 2,
  kn.ispline = 1,
  distance.type = "Euclidean",
  dist.custom = NULL,
  rescale = FALSE,
  rescale.pred = TRUE,
  method = "mean",
  normalize = "Simpson",
  silent = FALSE,
  empty.row = 0,
  control = list(),
  glm.init = FALSE
)

Arguments

`data.spec`	Site-by-species presence-absence data frame, with sites as rows and species as columns.
`data.env`	Site-by-variable data frame, with sites as rows and environmental variables as columns.
`xy`	Site coordinates, to account for distances between sites.
`data.spec.pred`	Site-by-species presence-absence data frame or list of data frames, with sites as rows and species as columns, for which zeta diversity will be computed and used as a predictor of the zeta diversity of `data.spec`.
`order`	Specific number of assemblages or sites at which zeta diversity is computed.
`sam`	Number of samples for which the zeta diversity is computed.
`reg.type`	Type of regression used in the multi-site generalised dissimilarity modelling. Default is "`ispline`" for I-spline models (forcing monotonic decline), as recommended in generalised dissimilarity modelling by Ferrier et al. (2007). Other options are "`glm`" for generalised linear models, "`ngls`" for negative linear models, "`gam`" for generalised additive models, and "`scam`" for shape constrained additive models (monotonic decreasing by default).
`family`	A description of the error distribution and link function to be used in the `glm`, `gam` and `scam` models used in the different types of regression (see `family` in package `stats` for details of family functions). Default is `binomial("log")` if Jaccard, Sorensen or Simpson similarity indices are used (see parameter `normalize`), or `gaussian` for raw zeta values.
`method.glm`	Method used in fitting the generalised linear model. The default method "glm.fit.cons" is an adaptation of method `glm.fit2` from package `glm2` using a constrained least squares regression (default is negative coefficients) in the reweighted least squares. Another option is "glm.fit2", which calls `glm.fit2`; see help documentation for glm.fit2 in package `glm2`.
`cons`	Type of constraint in the glm if `method.glm = "glm.fit.cons"`. Default is -1 for negative coefficients on the predictors. The other option is 1 for positive coefficients on the predictors.
`cons.inter`	Type of constraint for the intercept. If no value is specified, `cons.inter` is set to -1 (negative intercept) for the binomial family, or to 1 (positive intercept) for the Gaussian family.
`confint.level`	Confidence level, given as a proportion between 0 and 1 (default 0.95), for the confidence intervals of the coefficients from the generalised linear models.
`bs`	A two-letter character string indicating the (penalized) smoothing basis to use in the scam model. Default is "`mpd`" for monotonic decreasing splines. See `smooth.terms` in package `mgcv` for an overview of what is available.
`kn`	Number of knots in the GAM and SCAM. Default is -1, in which case `kn` is determined automatically using generalised cross-validation.
`order.ispline`	Order of the I-spline.
`kn.ispline`	Number of knots in the I-spline.
`distance.type`	Method to compute distance. Default is "`Euclidean`", for Euclidean distance. The other options are (i) "`ortho`" for orthodromic distance, if xy correspond to longitudes and latitudes (orthodromic distance is computed using the `geodist` function from package `geodist`); and (ii) "`custom`", in which case the user must provide a distance matrix for `dist.custom`.
`dist.custom`	Distance matrix provided by the user when `distance.type` = `"custom"`.
`rescale`	Boolean value (TRUE or FALSE) indicating if the zeta values should be divided by the total number of species in the dataset, to get a range of values between 0 and 1. Has no effect if `normalize` != `FALSE`.
`rescale.pred`	Boolean value (TRUE or FALSE) indicating if the spatial distances and differences in environmental variables should be rescaled between 0 and 1.
`method`	Name of a function (as a string) indicating how to combine the pairwise differences and distances for more than two sites. It can be a basic R function such as "`mean`" or "`max`", but also a custom function.
`normalize`	Indicates if the zeta values for each sample should be divided by the minimum number of species in the sites of this specific sample (`normalize = "Simpson"`) (default), by the total number of species for this specific sample (`normalize = "Jaccard"`), or by the average number of species per site for this specific sample (`normalize = "Sorensen"`). Value must be set to `FALSE` to indicate that no normalization must be performed and raw zeta values should be used.
`silent`	Boolean value (TRUE or FALSE) indicating if warnings must be printed.
`empty.row`	Determines how to handle empty rows, i.e. sites with no species. Such sites can cause underestimation of zeta diversity, and computation errors for the normalized version of zeta due to divisions by 0. Options are "`empty`" to leave the data untreated, "`remove`" to remove the empty rows, 0 to set the normalized zeta to 0 when zeta is divided by 0 during normalization (sites share no species, so are completely dissimilar), and 1 to set the normalized zeta to 1 when zeta is divided by 0 during normalization (i.e. sites are perfectly similar).
`control`	As for `glm`.
`glm.init`	Boolean value indicating if the initial parameters for fitting the glm with constraints on the signs of the coefficients for `reg.type == "ispline"` should be initialised based on the correlation coefficients between the zeta values and the environmental differences or distances. `glm.init = TRUE` helps prevent the error message: `error: cannot find valid starting values:` `please specify some`.
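The three `normalize` options and the `empty.row` fallback can be illustrated with base R on a toy sample. This is only a sketch of the arithmetic described in the argument list, not the package's internal code; the matrix values and the helper `normalise_zeta` are hypothetical.

```r
# Toy presence-absence matrix: 3 sampled sites (rows) x 5 species (columns).
# Values are invented for illustration.
sites <- matrix(c(1, 1, 1, 0, 0,
                  1, 1, 0, 1, 0,
                  1, 1, 1, 1, 0),
                nrow = 3, byrow = TRUE)

# Raw zeta: number of species present in every sampled site.
zeta_raw <- sum(colSums(sites) == nrow(sites))

richness <- rowSums(sites)            # species per site: 3, 3, 4
total    <- sum(colSums(sites) > 0)   # species found in at least one site: 4

# Hypothetical helper: returns `empty.value` instead of dividing by zero,
# mimicking the documented behaviour of empty.row = 0 or empty.row = 1.
normalise_zeta <- function(zeta, denom, empty.value = 0) {
  if (denom == 0) empty.value else zeta / denom
}

zeta_simpson  <- normalise_zeta(zeta_raw, min(richness))   # 2 / 3
zeta_jaccard  <- normalise_zeta(zeta_raw, total)           # 2 / 4
zeta_sorensen <- normalise_zeta(zeta_raw, mean(richness))  # 2 / (10 / 3)
```

With these toy values the Simpson version is the largest of the three, because it divides by the smallest denominator (the minimum site richness).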

Details

The environmental variables can be numeric or factorial.

If order = 1, the variables are used as such in the regression, and factorial variables must be converted to dummy variables for the output of the regression to be interpretable.
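One possible way to dummy-code a factor before calling the function with order = 1 is sketched below with `stats::model.matrix`. The data frame and its column names are hypothetical, and treatment coding with the reference level dropped is an assumption; the documentation does not prescribe a particular coding.

```r
# Hypothetical environmental table with one numeric and one factorial variable.
env <- data.frame(
  temp    = c(10, 14, 19),
  habitat = factor(c("forest", "grass", "forest"))
)

# Treatment (dummy) coding; the intercept column is dropped so that only
# the 0/1 indicator for the non-reference level remains.
dummies <- stats::model.matrix(~ habitat, data = env)[, -1, drop = FALSE]

# Numeric variables are kept as they are and combined with the indicators.
env_dummy <- cbind(env["temp"], dummies)
```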

For numeric variables, if order > 1, the pairwise differences between sites are computed and combined according to method. For factorial variables, the distance corresponds to the number of unique values divided by the number of assemblages or sites specified by order.
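As a rough illustration of this paragraph, the following base R sketch combines pairwise differences for one numeric variable and computes the factor distance for one sample of three sites. The values are invented, and the factor calculation is a literal reading of the sentence above; the package's internal implementation may differ in detail.

```r
# Hypothetical values of one numeric variable at three sampled sites (order = 3).
temp <- c(10, 14, 19)

# All pairwise absolute differences: |10 - 14|, |10 - 19|, |14 - 19|.
pair_diff <- as.vector(stats::dist(temp))

# Combine them with the function named by `method` (given as a string).
method   <- "mean"
combined <- do.call(method, list(pair_diff))

# Factorial variable: number of unique values over the number of sites.
habitat     <- factor(c("forest", "forest", "grass"))
factor_dist <- length(unique(habitat)) / length(habitat)
```

Switching `method` to `"max"` would return the largest pairwise difference instead of the average.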

If xy = NULL, Zeta.msgdm uses only the environmental variables in the regression. Otherwise, it also computes the distance between sites (Euclidean by default; the average or maximum distance across the sampled sites, depending on the parameter method) and uses it as an explanatory variable.
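The distance predictor for a single sample of sites can be sketched in base R as follows. The coordinates are hypothetical and the sketch covers only the Euclidean case.

```r
# Hypothetical planar coordinates of three sampled sites.
xy <- data.frame(x = c(0, 3, 0), y = c(0, 0, 4))

# Pairwise Euclidean distances between the three sites.
pair_dist <- as.vector(stats::dist(xy, method = "euclidean"))

# One value per sample, depending on `method`.
dist_mean <- mean(pair_dist)
dist_max  <- max(pair_dist)
```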

If rescale.pred = TRUE, zeta is regressed against the differences of values of the environmental variables divided by the maximum difference for each variable, to be rescaled between 0 and 1. If !is.null(xy), distances between sites are also divided by the maximum distance. If order = 1, the variables are transformed by first subtracting their minimum value, and dividing by the difference of their maximum and minimum values.
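For order > 1, the rescaling described above amounts to dividing each predictor by its maximum over all samples. A minimal sketch, assuming a hypothetical vector of combined differences (one value per sample):

```r
# Hypothetical combined differences of one variable over three samples.
diffs <- c(6, 3, 12)

# The divisor is the maximum difference observed for this variable.
rescale_factor <- max(diffs)

# Rescaled predictor, with a maximum of exactly 1.
diffs_scaled <- diffs / rescale_factor
```

Because differences are non-negative, the rescaled values fall between 0 and 1, but the minimum is 0 only if some sample has a difference of 0.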

If reg.type = "ispline", the variables are rescaled between 0 and 1 prior to computing the I-splines by subtracting their minimum value, and dividing by the difference of their maximum and minimum values.
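The min-max rescaling can be written in a few lines of base R. In this sketch the data frame is hypothetical, and `range_min` and `range_max` are illustrative names for the per-variable minima and maxima.

```r
# Hypothetical environmental table with two numeric variables.
env <- data.frame(temp = c(10, 14, 19), rain = c(200, 800, 500))

# Per-variable minimum and maximum, kept so the scaling can be reapplied later.
range_min <- vapply(env, min, numeric(1))
range_max <- vapply(env, max, numeric(1))

# Subtract the minimum, then divide by (maximum - minimum), column by column.
env_scaled <- as.data.frame(
  Map(function(v, lo, hi) (v - lo) / (hi - lo), env, range_min, range_max)
)
```

Keeping the minima and maxima makes it possible to put new values of the same variables on the scale used when the model was fitted.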

Value

Zeta.msgdm returns a list whose components vary depending on the regression technique. The list can contain the following components:

`val`	Vector of zeta values used in the MS-GDM.
`predictors`	Data frame of the predictors used in the MS-GDM.
`range.min`	Vector containing the minimum values of the numeric variables, used for rescaling the variables between 0 and 1 for I-splines (see Details).
`range.max`	Vector containing the maximum values of the numeric variables, used for rescaling the variables between 0 and 1 for I-splines (see Details).
`rescale.factor`	Factor by which the predictors were divided if `rescale.pred = TRUE` and `order>1`.
`order.ispline`	The value of the original parameter, to be used in `Plot.ispline`.
`kn.ispline`	The value of the original parameter, to be used in `Plot.ispline`.
`model`	An object whose class depends on the type of regression (`glm`, `nnnpls`, `gam` or `scam`; I-splines return an object of class `glm`), corresponding to the regression over distance for the number of assemblages or sites specified in `order`.
`confint`	The confidence intervals for the coefficients from generalised linear models with no constraint. `confint` is not generated for the other types of regression.
`vif`	The variance inflation factors for all the variables for the generalised linear regression. `vif` is not generated for the other types of regression.

References

Hui C. & McGeoch M.A. (2014). Zeta diversity as a concept and metric that unifies incidence-based biodiversity patterns. The American Naturalist, 184, 684-694.

Ferrier, S., Manion, G., Elith, J., & Richardson, K. (2007). Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Diversity and Distributions, 13(3), 252-264.

Examples

utils::data(bird.spec.coarse)
xy.bird <- bird.spec.coarse[1:2]
data.spec.bird <- bird.spec.coarse[3:193]
utils::data(bird.env.coarse)
data.env.bird <- bird.env.coarse[,3:9]

zeta.glm <- Zeta.msgdm(data.spec.bird, data.env.bird, sam = 100, order = 3,
    reg.type = "glm")
zeta.glm
dev.new()
graphics::plot(zeta.glm$model)

zeta.ngls <- Zeta.msgdm(data.spec.bird, data.env.bird, xy.bird, sam = 100, order = 3,
    reg.type = "ngls", rescale = TRUE)
zeta.ngls

##########

utils::data(Marion.species)
xy.marion <- Marion.species[1:2]
data.spec.marion <- Marion.species[3:33]
utils::data(Marion.env)
data.env.marion <- Marion.env[3]

zeta.gam <- Zeta.msgdm(data.spec.marion, data.env.marion, sam = 100, order = 3,
    reg.type = "gam")
zeta.gam
dev.new()
graphics::plot(zeta.gam$model)

zeta.ispline <- Zeta.msgdm(data.spec.marion, data.env.marion, xy.marion, sam = 100,
    order = 3, normalize = "Jaccard", reg.type = "ispline")
zeta.ispline

zeta.ispline.r <- Return.ispline(zeta.ispline, data.env.marion, distance = TRUE)
zeta.ispline.r

dev.new()
Plot.ispline(isplines = zeta.ispline.r, distance = TRUE)

dev.new()
Plot.ispline(msgdm = zeta.ispline, data.env = data.env.marion, distance = TRUE)

zetadiv documentation built on June 8, 2025, 1:02 p.m.