In vitaliskim/msco: Multi-Species Co-Occurrence Analyses

View source: R/gbsm.R

gbsm	R Documentation

A generalised B-spline modelling for a set of neutral and trait-based variables

Description

This function implements the generalised B-spline model (sensu Lagat et al., 2021b) for dissecting the effects of random encounter versus functional trait mismatching on multi-species co-occurrence and interference. A generalized linear model (sensu Hastie and Tibshirani, 1986) with a binomial variance distribution and a log link function is employed, with predictors transformed using a linear combination of B-splines (sensu Curry and Schoenberg, 1988).
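
As a rough illustration of the model structure described above, the following sketch fits a binomial GLM with a log link to a B-spline-transformed predictor. It is not the package's implementation: the simulated data, the variable names (`x`, `prop`, `n.sites`) and the fixed starting values are assumptions made for this example, and only `splines::bs` and `stats::glm` from the standard R distribution are used.

```r
library(splines)  # part of the standard R distribution; provides bs()

set.seed(1)
n.obs   <- 300                         # hypothetical number of observations
n.sites <- 50                          # hypothetical number of sites per observation
x       <- runif(n.obs)                # a single simulated predictor
p.true  <- exp(-1 - 1.5 * x)           # true probabilities (log-linear in x), all below 1
prop    <- rbinom(n.obs, n.sites, p.true) / n.sites  # observed proportions

# Cubic B-spline basis with 4 degrees of freedom (cf. d.f = 4, degree = 3)
B <- bs(x, df = 4, degree = 3)

# Binomial GLM with a log link. The log link does not keep fitted values
# below 1 by construction, so starting values that give a negative linear
# predictor are supplied explicitly (an assumption of this sketch).
fit <- glm(prop ~ B, family = binomial(link = "log"),
           weights = rep(n.sites, n.obs),
           start = c(-1, rep(0, ncol(B))))

coef(fit)                              # intercept plus one coefficient per basis function
r2 <- cor(prop, fitted(fit))^2         # squared Pearson correlation, observed vs fitted
```

The need for valid starting values under a log link is presumably why `gbsm` exposes a `start.range` argument, although how the function draws its starting values from that range is not shown here.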

Usage

gbsm(
  s.data,
  t.data,
  p.d.mat,
  metric = "Simpson_eqn",
  d.f = 4,
  order.jo = 3,
  degree = 3,
  n = 1000,
  b.plots = TRUE,
  gbsm.model,
  scat.plot = TRUE,
  response.curves = TRUE,
  ylabel = TRUE,
  leg = 1,
  max.vif = 20,
  max.vif2 = 10,
  start.range = c(-0.1, 0)
)

Arguments

`s.data`	A species-by-site presence/absence `data.frame` with entries indicating occurrence (1) and non-occurrence (0) of species in a site.
`t.data`	A `data.frame` with traits as columns and species as rows. The species must be the same as in `s.data`.
`p.d.mat`	A symmetric `matrix` with species as dimension names and entries giving the phylogenetic distance between each pair of species.
`metric`	The type of rescaling applied to the joint occupancy metric. Available options are: `Simpson_eqn` for Simpson equivalent, `Sorensen_eqn` for Sorensen equivalent, `raw_prop` for the raw form of the metric rescaled by dividing by the total number of sites, N, and `raw` for the raw form of the metric without rescaling.
`d.f`	Degrees of freedom for B-splines.
`order.jo`	Specific number of species for which the joint occupancy is computed. To implement generalised B-spline modelling for multiple orders, see the `gbsm_m.orders` function.
`degree`	Degree of the B-splines.
`n`	Number of species combinations (samples) for which the joint occupancy is computed. Samples are drawn without replacement, so no combination is used twice. If the total number of combinations of `i` species chosen from the species pool of size `m`, i.e. `choose(m,i)`, is less than `n`, then all `choose(m,i)` combinations are used, since that is the maximum number of samples available. Otherwise, `n` combinations are sampled without replacement.
`b.plots`	Boolean value indicating if B-spline basis functions (of the first predictor) should be plotted.
`gbsm.model`	The model used if the `raw` form of the metric is chosen. Available options are `"quasipoisson"` for a quasi-Poisson GLM or `"nb"` for a negative binomial GLM. All other metric types strictly use a binomial GLM.
`scat.plot`	Boolean value indicating if scatter plots between joint occupancy and its predicted values should be plotted.
`response.curves`	A boolean value indicating if all response curves should be plotted.
`ylabel`	Boolean value indicating if the y label should be included in the response curves. This parameter is added to help control the appearance of plots in the `gbsm_m.orders` function.
`leg`	Boolean value indicating if the legend of the gbsm outputs should be included in the plots. This parameter is added to help control the appearance of plots in the `gbsm_m.orders` function.
`max.vif`	This parameter is used to detect and avoid multicollinearity among covariates. Its value can be varied to obtain an intermediate GBSM model (based on GLM) with certain `VIF` values. Any predictor variable (from the original model) with a `VIF` greater than this value is removed. This can be repeated until an ideal `VIF` of less than or equal to a desired value is achieved.
`max.vif2`	Like `max.vif`, this parameter is used to detect and avoid multicollinearity among covariates. Its value can be varied to obtain a final GBSM model (based on GLM) with certain `VIF` values much lower than `max.vif`. Any predictor variable (from the intermediate model) with a `VIF` greater than this value is removed. This can be repeated until an ideal `VIF` of less than or equal to a desired value is achieved.
`start.range`	Range of starting values for the GLM regression.
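
The interplay between `n`, `order.jo` and the `raw` / `raw_prop` metrics can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: the matrix `s.mat` is simulated, joint occupancy is taken to be the number of sites in which all species of a combination co-occur, and every combination is enumerated with `combn`, which is only practical for small species pools.

```r
set.seed(0)
# Hypothetical species-by-site presence/absence matrix: 6 species, 10 sites
s.mat <- matrix(rbinom(60, 1, 0.6), nrow = 6,
                dimnames = list(paste0("sp", 1:6), paste0("site", 1:10)))

order.jo <- 3      # number of species per combination
n        <- 1000   # requested number of samples

# One column per combination of order.jo species
all.combs <- combn(nrow(s.mat), order.jo)

# choose(6, 3) = 20 is less than n, so all 20 combinations are used
n.used <- min(n, ncol(all.combs))

# Sampling column indices without replacement, so no combination repeats
picked <- all.combs[, sample(ncol(all.combs), n.used), drop = FALSE]

# Raw joint occupancy (assumed definition): number of sites where all
# species in the combination are present
raw.jo <- apply(picked, 2, function(idx) {
  sum(colSums(s.mat[idx, , drop = FALSE]) == order.jo)
})

# "raw_prop": raw joint occupancy divided by the total number of sites, N
raw.prop <- raw.jo / ncol(s.mat)
```

The Simpson and Sorensen equivalents are deliberately left out of this sketch, since their exact rescaling is defined in the package and its references rather than on this page.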

Value

The `gbsm` function returns a list containing the following outputs:

`order.jo`	Order of joint occupancy
`Predictors`	Predictor variables used in GLM regression with binomial variance distribution function and log link function.
`Responses`	Response variables from GLM regression with binomial variance distribution function and log link function.
`coeff`	Coefficients of the generalized linear model used.
`glm_obj`	Generalized linear model used.
`j.occs`	Rescaled observed joint occupancies. See `metric` above.
`bs_pred`	B-spline-transformed `Predictors`.
`var.expld`	Amount of variation in joint occupancy explained by the `Predictors`, i.e. the squared Pearson correlation (`r^2`) between the observed and predicted values of joint occupancy.
`Original.VIFs`	Variance inflation factors from the original GBSM model (before removing covariates exceeding `max.vif`).
`Intermediate.VIFs`	Variance inflation factors from the intermediate GBSM model (after removing the 1st set of covariates exceeding `max.vif`).
`Final.VIFs`	Variance inflation factors from the final GBSM model (after removing the 2nd set of covariates exceeding `max.vif2`).
`summary`	Summary of the regression results.
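
The two-stage screening behind `Original.VIFs`, `Intermediate.VIFs` and `Final.VIFs` can be illustrated with a small base-R sketch. It reads the descriptions of `max.vif` and `max.vif2` literally (every predictor whose VIF exceeds the threshold is dropped in one pass) and is an assumption about the procedure, not the package's code: the helpers `vif_simple` and `drop_high_vif` and the simulated predictors are hypothetical, and the VIF is computed from the textbook formula `1 / (1 - R^2)` rather than by whatever routine `gbsm` calls internally.

```r
set.seed(2)
n.obs <- 2000
x1 <- rnorm(n.obs)
x2 <- x1 + rnorm(n.obs, sd = 0.05)   # almost a copy of x1: very high VIF
x3 <- rnorm(n.obs)
x4 <- x3 + rnorm(n.obs, sd = 0.28)   # strongly correlated with x3: VIF between 10 and 20
x5 <- rnorm(n.obs)                   # independent of the rest: VIF close to 1
X  <- data.frame(x1, x2, x3, x4, x5)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the others
vif_simple <- function(df) {
  if (ncol(df) < 2) return(setNames(rep(1, ncol(df)), names(df)))
  sapply(names(df), function(v) {
    f  <- reformulate(setdiff(names(df), v), response = v)
    r2 <- summary(lm(f, data = df))$r.squared
    1 / (1 - r2)
  })
}

# Keep only predictors whose VIF does not exceed the threshold
drop_high_vif <- function(df, threshold) {
  df[, vif_simple(df) <= threshold, drop = FALSE]
}

max.vif  <- 20
max.vif2 <- 10

original.vifs     <- vif_simple(X)                      # before any removal
X.intermediate    <- drop_high_vif(X, max.vif)          # first pass
intermediate.vifs <- vif_simple(X.intermediate)
X.final           <- drop_high_vif(X.intermediate, max.vif2)  # second pass
final.vifs        <- vif_simple(X.final)
```

In this simulated case the first pass removes the near-duplicate pair and the second pass removes the strongly correlated pair, leaving only the independent predictor. Dropping both members of a collinear pair follows from the literal reading adopted here; a procedure that removes one predictor at a time would behave differently.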

References

Curry, H. B., and Schoenberg, I. J. (1988). On Pólya frequency functions IV: the fundamental spline functions and their limits. In IJ Schoenberg Selected Papers (pp. 347-383). Birkhäuser, Boston, MA. https://doi.org/10.1007/978-1-4899-0433-1_17
Hastie, T., and Tibshirani, R. (1986). Generalized additive models. Stat. Sci. 1(3), 297-310. https://doi.org/10.1214/ss/1177013604
Lagat, V. K., Latombe, G. and Hui, C. (2021a). A multi-species co-occurrence index to avoid type II errors in null model testing. DOI: <To be added>.
Lagat, V. K., Latombe, G. and Hui, C. (2021b). Dissecting the effects of random encounter versus functional trait mismatching on multi-species co-occurrence and interference with generalised B-spline modelling. DOI: <To be added>.

Examples

## Not run: 
 my.path <- system.file("extdata/gsmdat", package = "msco")
 setwd(my.path)
 s.data <- get(load("s.data.csv")) ## Species-by-site matrix
 t.data <- get(load("t.data.csv")) ## Species-by-trait matrix
 p.d.mat <- get(load("p.d.mat.csv")) ## Species-by-species phylogenetic distance matrix

 RNGkind(sample.kind = "Rejection")
 set.seed(0)
 ## gbsm.model is omitted: it is only used when metric = "raw"
 my.gbsm <- msco::gbsm(s.data, t.data, p.d.mat, metric = "Simpson_eqn",
   d.f = 4, order.jo = 3, degree = 3, n = 1000, b.plots = TRUE,
   scat.plot = TRUE, response.curves = TRUE, leg = 1, max.vif = 10,
   max.vif2 = 3, start.range = c(-0.1, 0))

 head(my.gbsm$bs_pred)
 head(my.gbsm$Predictors)
 head(my.gbsm$Responses)
 my.gbsm$order.jo
 my.gbsm$var.expld
 my.gbsm$Original.VIFs
 my.gbsm$Intermediate.VIFs ## Resulting covariate VIFs after removing covariates with VIF > max.vif
 my.gbsm$Final.VIFs ## Resulting covariate VIFs after removing covariates with VIF > max.vif2

 
## End(Not run)

vitaliskim/msco documentation built on Sept. 29, 2023, 9:22 p.m.