modelselect_opt: Modelselect configuration file
In statnmap/SDMSelect: Cross-validation model selection and species distribution mapping


Modelselect configuration file

modelselect_opt(
  ...,
  RESET = FALSE,
  READ.ONLY = NULL,
  LOCAL = FALSE,
  ADD = FALSE
)

`...`	Other options as listed below
`RESET`	Logical. Reset options to their default values
`READ.ONLY`	Logical. To not modify the options
`LOCAL`	Logical. Set to TRUE to use modified options in a local environment but not in the global one.
`ADD`	Logical. New options can be added after the option function is created by explicitly specifying ADD = TRUE

The script is not an "everything works fine by default" script. Options have to be defined throughout the script. Users are advised to run the code in small parts to avoid any misspecification of parameters. Different types of distribution are available, as well as different selection criteria. Users are advised to verify that the options match their dataset.

MPIcalc: set to TRUE if you are running the functions on a MPI cluster
nbclust: number of cores to use. Default to all cores available.
graph: Show all figures during calculation process (Recommend FALSE)
datatype: data type. Default to ContPosNull. See below
modeltypes: All model types to compare depending on datatype.
MaxDist: Maximum distance for kriging (For datatype = KrigeGLM)
Phi: Vector of length 2 with min and max for fitting Phi (For datatype = KrigeGLM)
Model: Vector of length of modeltypes with variogram model as used with cov.spatial (For datatype = KrigeGLM)
lcc_proj: Projection in meters. Default to Lambert93 (For datatype = KrigeGLM).
Lambda: Vector of length of modeltypes with lambda corresponding to box-cox transformation (For datatype = KrigeGLM).
fix.Lambda: Vector of length of modeltypes with logical, whether to fix lambda or let the model choose it (For datatype = KrigeGLM).
Max_nb_Var: Maximum number of covariates kept in a model
Max_K: Maximum degrees of freedom for simple variables in GAM models. Default Max_K=5, similar to Max_K_Poly=4
Max_K_te: Maximum degrees of freedom for tensor interactions in GAM models
Max_K_Poly: Maximum polynomial degree for simple variables in GLM models
fixXI: 0=chosen by model (time consuming), 1=Poisson, 2=Gamma, 1<fixXI<2=compound Poisson (For datatype = TweedGLM). See tweedie
Interaction: Logical. Whether to test for covariates interactions in models. A maximum of three interactions will be tested in the same model. This may be highly time consuming. Should be FALSE with any "KrigeGLM*".
MinNbModel: Minimum number of models kept at each iteration to avoid removing models that are not too bad
k_fold: Numeric. Value k of the k-fold cross-validation
N_k_fold: Numeric. Value N of the N times k-fold cross-validation
nbMC: N_k_fold * k_fold (for compatibility)
seqthd: Sequence of thresholds tested to cut between 0 and 1 for PA data.
lim_pvalue: p-value limit for models retained at each step of the cross-validation stepwise procedure run for each "modeltype". The procedure seeks models whose ranks are not significantly different (p>=lim_pvalue) from those of the best model. "lim_pvalue" can be small at each step to keep a few more models than necessary. If the p-value is not significant, models are considered to have similar predictive power, and are thus kept for the following iteration. The smaller the p-value limit, the less discriminant the test, and thus the higher the number of models retained.
lim_pvalue_final: Similar to lim_pvalue but used to select the best model among all models at the very end of the procedure. This value may be higher than lim_pvalue to be more discriminant. This allows selecting the best model among all models that have been fitted during the different cross-validations.
Y.max: Factor by which the maximum value of the observations is multiplied. The result is used as a maximum predicted value for uncertainty calculation with the Tweedie model, as calculations for very high values may take very long. It also caps the maximum prediction for species distribution mapping when the model has a high uncertainty at sparse positions, which would otherwise lead to impossibly high values.
seed: Numeric. Seed for random number generation (allows reproducibility between simulations)
seqthd: This allows choosing the best threshold value to predict presence-absence from the probability of presence. An imbalance between 0 and 1 in the data may shift the best threshold away from 0.5. The best threshold is calculated across all cross-validations. The threshold chosen is the one closest to specificity = sensitivity.
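
The threshold rule described for seqthd can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper name best_threshold, the toy data and the threshold grid are assumptions made for illustration only. For each candidate threshold it computes sensitivity and specificity, then keeps the threshold where the two are closest.

```r
# Hypothetical helper (not part of SDMSelect): choose the threshold for which
# sensitivity and specificity are closest to each other.
best_threshold <- function(obs, prob, seqthd = seq(0.01, 0.99, by = 0.01)) {
  stats <- vapply(seqthd, function(thd) {
    pred <- as.integer(prob >= thd)
    c(
      sensitivity = sum(pred == 1 & obs == 1) / sum(obs == 1),
      specificity = sum(pred == 0 & obs == 0) / sum(obs == 0)
    )
  }, c(sensitivity = 0, specificity = 0))
  # Absolute gap between the two criteria for each threshold
  gap <- abs(stats["sensitivity", ] - stats["specificity", ])
  seqthd[which.min(gap)]
}

# Toy, unbalanced presence-absence data (7 absences, 3 presences)
obs  <- c(0, 0, 0, 0, 0, 0, 1, 1, 0, 1)
prob <- c(0.05, 0.12, 0.18, 0.22, 0.27, 0.33, 0.38, 0.42, 0.47, 0.61)

thd <- best_threshold(obs, prob, seqthd = seq(0.1, 0.9, by = 0.1))
print(thd)
```

With unbalanced data such as these, the retained threshold falls below 0.5, which is the situation the option description refers to.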

datatype options

PA: Presence-Absence data (=binomial distribution)
Cont: Continuous data. Positive and/or negative. (=gaussian distribution)
ContPosNull: Continuous but positive (or null, i.e. zero) data like Biomass/Density data. For data with zero values, modeltypes allowing only positive values can also be tested (LogNormal, Gamma). In that case, a box-cox transformation will be applied: the model is fitted on 'log(X+1)', but cross-validation compares 'pred(Y)-1', which is on the scale of the data. This allows comparison with the Gaussian distribution. (=gaussian distribution; modeltypes are Gaussian, Gamma, LogNormal and Tweedie)
Count: model of Count data (=Poisson model)
PosCount: Poisson model on positive data (Difference with Count in goodness of fit)
TweedGLM: Tweedie distribution (only GLM)
KrigeGLM: Co-kriging model with gaussian distribution. Experimental: the variogram equation has to be tested separately before any run with this script. Verify that modelselect_opt is correctly set up: default values are not safe at all!
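
The 'log(X+1)' handling mentioned for ContPosNull can be sketched in base R as follows. This is an illustration under assumptions, not the package code: the simulated data, the use of lm and the naive back-transformation exp(prediction) - 1 (one reading of 'pred(Y)-1', without any bias correction) are all hypothetical choices.

```r
set.seed(42)

# Simulated positive-or-zero response (hypothetical biomass) and one covariate
x <- runif(50)
biomass <- pmax(0, rnorm(50, mean = 5 * x, sd = 1))

# Fit on the transformed scale so that zero values are allowed
fit <- lm(log(biomass + 1) ~ x)

# Back-transform predictions to the scale of the data before comparing
pred_log <- predict(fit)
pred_data_scale <- exp(pred_log) - 1

# Error computed on the data scale, comparable with a Gaussian model
# fitted directly on biomass
rmse <- sqrt(mean((biomass - pred_data_scale)^2))
print(rmse)
```

Comparing errors on the data scale is what makes models fitted on different scales comparable within the same cross-validation.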

modeltypes are types of model tested: GLM, GLM with natural splines (GLMns), GAM. If there is no "GLM" in the name of the modeltype, then GAM is fitted with library mgcv.
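
The naming rule above can be checked on a vector of modeltype names with a short base R sketch. The names "PA" and "PAGLM" come from the examples on this page; "PAGLMns" is a hypothetical name used here only to illustrate the GLMns case.

```r
# "PAGLMns" is a hypothetical name, used for illustration only
modeltypes <- c("PA", "PAGLM", "PAGLMns")

# A name containing "GLM" is a GLM (possibly with natural splines),
# any other name is fitted as a GAM
model_family <- ifelse(grepl("GLM", modeltypes, fixed = TRUE), "GLM", "GAM")
names(model_family) <- modeltypes
print(model_family)
```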

A list with all options

## Not run: 
# Reset options
modelselect_opt(RESET = TRUE)
# List options
modelselect_opt()
# Show one option
modelselect_opt("modeltypes")
# Modify datatype
modelselect_opt$datatype <- "PA"
# Modify modeltypes tested (options are set by name through `...`)
modelselect_opt(modeltypes = c("PA", "PAGLM"))

## End(Not run)

statnmap/SDMSelect documentation built on April 1, 2021, 2:01 p.m.
