gbsm: A generalised B-spline modelling for a set of neutral and...

gbsm R Documentation

A generalised B-spline modelling for a set of neutral and trait-based variables

Description

This function implements the generalised B-spline model (sensu Lagat et al., 2021b) for dissecting the effects of random encounter versus functional trait mismatching on multi-species co-occurrence and interference. A generalized linear model (sensu Hastie and Tibshirani, 1986) with a binomial variance distribution and a log link function is employed, with predictors transformed using a linear combination of B-splines (sensu Curry and Schoenberg, 1988).
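
The kind of model described above can be illustrated with a minimal sketch. It is not the internal code of gbsm: the simulated predictor x, the counts y, and the number of sites n_sites are made up for illustration, and only splines::bs and stats::glm are assumed.

```r
library(splines)

set.seed(1)
n_obs   <- 200                      # number of simulated observations
n_sites <- 50                       # hypothetical number of sites per observation
x <- runif(n_obs)                   # a single made-up predictor

# True probabilities stay well below 1, which a log link requires.
p <- exp(-0.5 - 1.5 * x)
y <- rbinom(n_obs, size = n_sites, prob = p)

# Cubic B-spline basis with 4 degrees of freedom (mirrors d.f = 4, degree = 3).
B <- bs(x, df = 4, degree = 3)

# Binomial GLM with a log link; a negative starting intercept keeps the
# initial fitted probabilities inside (0, 1).
fit <- glm(cbind(y, n_sites - y) ~ B,
           family = binomial(link = "log"),
           start = c(-1, rep(0, ncol(B))))

dim(B)               # one row per observation, one column per basis function
range(fitted(fit))   # fitted probabilities
```

With a log link the linear predictor must stay non-positive for the fitted probabilities to remain valid, which is presumably why a range of small negative starting values (see start.range) is useful.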

Usage

gbsm(
  s.data,
  t.data,
  p.d.mat,
  metric = "Simpson_eqn",
  d.f = 4,
  order.jo = 3,
  degree = 3,
  n = 1000,
  b.plots = TRUE,
  gbsm.model,
  scat.plot = TRUE,
  response.curves = TRUE,
  ylabel = TRUE,
  leg = 1,
  max.vif = 20,
  max.vif2 = 10,
  start.range = c(-0.1, 0)
)

Arguments

s.data

A species-by-site presence/absence data.frame with entries indicating occurrence (1) and non-occurrence (0) of species in a site.

t.data

A data.frame with traits as columns and species as rows. The species must be the same as in s.data.

p.d.mat

A symmetric matrix whose dimension names are species and whose entries give the phylogenetic distance between each pair of species.

metric

The type of rescaling applied to the joint occupancy metric. Available options are: Simpson_eqn for Simpson equivalent, Sorensen_eqn for Sorensen equivalent, raw_prop for the raw form of the metric rescaled by dividing by the total number of sites, N, and raw for the raw form of the metric without rescaling.
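
As a rough illustration of the raw and raw_prop options, the sketch below assumes that the joint occupancy of a set of species is the number of sites in which all of them occur together. That definition, the toy matrix s, and the helper joint_occupancy are assumptions for illustration and are not taken from the package source. The Simpson and Sorensen equivalents are not shown.

```r
# Toy species-by-site presence/absence matrix (3 species, 4 sites).
s <- matrix(c(1, 1, 0, 1,
              1, 0, 0, 1,
              1, 1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("sp1", "sp2", "sp3"), paste0("site", 1:4)))

# Hypothetical helper: count the sites occupied by every listed species.
joint_occupancy <- function(s, species) {
  sum(colSums(s[species, , drop = FALSE]) == length(species))
}

raw      <- joint_occupancy(s, c("sp1", "sp2", "sp3"))  # raw count of shared sites
raw_prop <- raw / ncol(s)                               # rescaled by the number of sites, N

raw
raw_prop
```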

d.f

Degrees of freedom for B-splines.

order.jo

Specific number of species for which the joint occupancy is computed. To implement generalised B-spline modelling for multiple orders, see the gbsm_m.orders function.

degree

Degree of the B-splines.

n

Number of samples for which the joint occupancy is computed. Samples are non-overlapping, i.e. sampling is done without replacement. If the total number of combinations of i species chosen from the species pool of size m, i.e. choose(m,i), is less than n, then choose(m,i) is used as the (maximum) number of samples. Otherwise, n samples are drawn without replacement.
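
The rule for n can be sketched as follows. The helper sample_combinations is hypothetical and is not part of msco. It returns every combination when choose(m,i) does not exceed n, and otherwise draws n distinct combinations.

```r
# Hypothetical helper: choose which species combinations to evaluate.
# m = size of the species pool, i = order of joint occupancy, n = requested samples.
sample_combinations <- function(m, i, n) {
  if (choose(m, i) <= n) {
    # Not enough combinations to sample from: use each one exactly once.
    return(utils::combn(m, i))
  }
  # Draw random combinations until n distinct ones are collected.
  keys <- character(0)
  while (length(keys) < n) {
    draws <- replicate(n - length(keys),
                       paste(sort(sample.int(m, i)), collapse = "-"))
    keys <- unique(c(keys, draws))
  }
  # One column per sampled combination, as in utils::combn().
  matrix(as.integer(unlist(strsplit(keys, "-"))), nrow = i)
}

all_combs  <- sample_combinations(m = 5,  i = 3, n = 1000)  # choose(5, 3) is below n
some_combs <- sample_combinations(m = 20, i = 3, n = 50)    # choose(20, 3) exceeds n
ncol(all_combs)
ncol(some_combs)
```

Drawing combinations by rejection avoids enumerating all choose(m,i) combinations, which matters when the species pool is large.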

b.plots

Boolean value indicating if B-spline basis functions (of the first predictor) should be plotted.

gbsm.model

The model used if the raw form of the metric is chosen. Available options are "quasipoisson" for a quasi-Poisson GLM or "nb" for a negative binomial GLM. All other metric types strictly use a binomial GLM.

scat.plot

Boolean value indicating if scatter plots between joint occupancy and its predicted values should be plotted.

response.curves

A boolean value indicating if all response curves should be plotted.

ylabel

Boolean value indicating if the y label should be included in the response curves. This parameter is added to help control the appearance of plots in the gbsm_m.orders function.

leg

Boolean value indicating if the legend of the gbsm outputs should be included in the plots. This parameter is added to help control the appearance of plots in the gbsm_m.orders function.

max.vif

This parameter is used to detect and avoid multi-collinearity among covariates. Its value can be varied to have an intermediate GBSM model (based on GLM) with certain VIF values. Any predictor variable (from the original model) with VIF greater than this value is removed. This can be repeated until an ideal VIF of less than or equal to a desired value is achieved.

max.vif2

Like max.vif, this parameter is used to detect and avoid multi-collinearity among covariates. Its value can be varied to have a final GBSM model (based on GLM) with certain VIF values much less than max.vif. Any predictor variable (from the intermediate model) with VIF greater than this value is removed. This can be repeated until an ideal VIF of less than or equal to a desired value is achieved.
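
The two-stage filtering described for max.vif and max.vif2 can be sketched as below. This is an illustration only: the helpers vif_values and drop_high_vif are hypothetical, the VIFs are computed from the inverse correlation matrix of the predictors, and the package may compute them differently (for example from the fitted GLM).

```r
# Hypothetical helper: VIF of each column, from the inverse correlation matrix.
vif_values <- function(X) {
  v <- diag(solve(cor(X)))
  names(v) <- colnames(X)
  v
}

# Hypothetical helper: drop every column whose VIF exceeds the threshold.
drop_high_vif <- function(X, threshold) {
  X[, vif_values(X) <= threshold, drop = FALSE]
}

set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.05)   # nearly a copy of x1, so strongly collinear
x3 <- rnorm(100)                   # unrelated predictor
X  <- cbind(x1 = x1, x2 = x2, x3 = x3)

original.vifs <- vif_values(X)
intermediate  <- drop_high_vif(X, threshold = 20)             # plays the role of max.vif
final         <- drop_high_vif(intermediate, threshold = 10)  # plays the role of max.vif2

original.vifs
colnames(final)
```

Because every predictor above the threshold is removed in a single pass, both members of a collinear pair are dropped here. A looser first threshold followed by a stricter second one gives some control over how aggressively predictors are discarded.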

start.range

Range of starting values for glm regression.

Value

The gbsm function returns a list containing the following outputs:

order.jo

Order of joint occupancy

Predictors

Predictor variables used in GLM regression with binomial variance distribution function and log link function.

Responses

Response variables from GLM regression with binomial variance distribution function and log link function.

coeff

Coefficients of the generalized linear model used.

glm_obj

Generalized linear model used.

j.occs

Rescaled observed joint occupancies. See metric above.

bs_pred

B-spline-transformed Predictors.

var.expld

Amount of variation in joint occupancy explained by the Predictors, i.e. Pearson's r^2 between the observed and predicted values of joint occupancy.
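
For reference, Pearson's r^2 between observed and predicted values can be computed as in the short sketch below. The vectors observed and predicted are made-up numbers, not output from gbsm.

```r
# Made-up observed and predicted joint occupancies.
observed  <- c(0.10, 0.25, 0.40, 0.55, 0.80)
predicted <- c(0.12, 0.20, 0.45, 0.50, 0.78)

# Squared Pearson correlation between the two vectors.
r.squared <- cor(observed, predicted, method = "pearson")^2
r.squared
```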

Original.VIFs

Variance inflation factors from the original GBSM model (before removing covariates exceeding max.vif).

Intermediate.VIFs

Variance inflation factors from the intermediate GBSM model (after removing the 1st set of covariates exceeding max.vif).

Final.VIFs

Variance inflation factors from the final GBSM model (after removing the 2nd set of covariates exceeding max.vif2).

summary

Summary of the regression results.

References

  1. Curry, H. B., and Schoenberg, I. J. (1988). On Pólya frequency functions IV: the fundamental spline functions and their limits. In IJ Schoenberg Selected Papers (pp. 347-383). Birkhäuser, Boston, MA. https://doi.org/10.1007/978-1-4899-0433-1_17

  2. Hastie, T., and Tibshirani, R. (1986). Generalized additive models. Stat. Sci. 1(3), 297-310. https://doi.org/10.1214/ss/1177013604

  3. Lagat, V. K., Latombe, G. and Hui, C. (2021a). A multi-species co-occurrence index to avoid type II errors in null model testing. DOI: <To be added>.

  4. Lagat, V. K., Latombe, G. and Hui, C. (2021b). Dissecting the effects of random encounter versus functional trait mismatching on multi-species co-occurrence and interference with generalised B-spline modelling. DOI: <To be added>.

Examples

## Not run: 
 my.path <- system.file("extdata/gsmdat", package = "msco")
 setwd(my.path)
 s.data <- get(load("s.data.csv")) ## Species-by-site matrix
 t.data <- get(load("t.data.csv")) ## Species-by-trait matrix
 p.d.mat <- get(load("p.d.mat.csv")) ## Species-by-species phylogenetic distance matrix

 RNGkind(sample.kind = "Rejection")
 set.seed(0)
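 ## gbsm.model is not defined here; per its argument description it is
 ## only used when metric = "raw".
 my.gbsm <- msco::gbsm(s.data, t.data, p.d.mat, metric = "Simpson_eqn", gbsm.model,
                       d.f = 4, order.jo = 3, degree = 3, n = 1000, b.plots = TRUE,
                       scat.plot = TRUE, response.curves = TRUE, leg = 1,
                       max.vif = 10, max.vif2 = 3, start.range = c(-0.1, 0))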

 head(my.gbsm$bs_pred)
 head(my.gbsm$Predictors)
 head(my.gbsm$Responses)
 my.gbsm$order.jo
 my.gbsm$var.expld
 my.gbsm$Original.VIFs
 my.gbsm$Intermediate.VIFs ## Resulting covariate VIFs after removing covariates with VIF > max.vif
 my.gbsm$Final.VIFs ## Resulting covariate VIFs after removing covariates with VIF > max.vif2

 
## End(Not run)


vitaliskim/msco documentation built on Sept. 29, 2023, 9:22 p.m.