In luismurao/ntbox: From Getting Biodiversity Data to Evaluating Species Distribution Models in a Friendly GUI Environment

ellipsoid_selection

R Documentation

ellipsoid_selection: Performs variable selection for ellipsoid models

Description

Performs variable selection for ellipsoid models according to omission rates in the environmental space.

Usage

ellipsoid_selection(
  env_train,
  env_test = NULL,
  env_vars,
  nvarstest,
  level = 0.95,
  mve = TRUE,
  env_bg = NULL,
  omr_criteria,
  parallel = F,
  ncores = NULL,
  comp_each = 100,
  proc = FALSE,
  sub_sample = FALSE,
  sub_sample_size = 10000,
  proc_iter = 100,
  rseed = TRUE
)

Arguments

`env_train`	A data frame with the environmental training data.
`env_test`	A data frame with the environmental testing data. Default NULL. If provided, the selection process will also report the p-value of a binomial test.
`env_vars`	A vector with the names of environmental variables to be used in the selection process.
`nvarstest`	A vector indicating the number of variables used to fit the ellipsoids during model selection. Models with different numbers of variables can be tested together (e.g. nvarstest = c(3,6)).
`level`	Proportion of points to be included in the ellipsoids. This parameter corresponds to the complement of the error (E) proposed by Peterson et al. (2008), i.e. level = 1 - E.
`mve`	A logical value. If TRUE, a minimum volume ellipsoid will be computed using the function `cov.rob` of the MASS package. If FALSE, the covariance matrix of the input data will be used.
`env_bg`	Environmental data used to compute the approximate prevalence of the model. The data should be a sample of the environmental layers of the calibration area. Default NULL.
`omr_criteria`	Omission rate criterion: the maximum omission rate allowed during the selection process. Default NULL; see Details.
`parallel`	Logical. If TRUE, the computations will be run in parallel. Default FALSE.
`ncores`	The number of cores that will be used for the parallel process. By default, ntbox will use the total number of available cores minus one.
`comp_each`	Number of models to run in each job of the parallel computation. Default 100.
`proc`	Logical. If TRUE, a partial ROC test will be run. Default FALSE.
`sub_sample`	Logical. Indicates whether the partial ROC test should run on a subsample of size sub_sample_size. Recommended for large rasters. Default FALSE.
`sub_sample_size`	Numeric. Size of the sample used to compute partial ROC values. Default 10000.
`proc_iter`	Numeric. The total number of iterations for the partial ROC bootstrap. Default 100.
`rseed`	Logical. Whether to set a random seed for the partial ROC bootstrap. Default TRUE.
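The effect of `level` and `mve` on the fitted ellipsoid can be sketched in base R as follows. This is an illustration under stated assumptions, not ntbox's code: the helper `fit_ellipsoid()` and the toy variables `bio1` and `bio12` are hypothetical, and the sketch assumes that the minimum volume ellipsoid keeps `floor(level * n)` training points via `MASS::cov.rob(method = "mve")`, while `mve = FALSE` falls back to the sample mean and `stats::cov()`.

```r
# Hypothetical helper (not part of ntbox): centroid and covariance of an ellipsoid
fit_ellipsoid <- function(x, level = 0.95, mve = TRUE) {
  x <- as.matrix(x)
  if (mve) {
    # Assumption: the minimum volume ellipsoid retains floor(level * n) points
    rob <- MASS::cov.rob(x, quantile.used = floor(level * nrow(x)),
                         method = "mve")
    list(centroid = rob$center, covariance = rob$cov)
  } else {
    # Classical estimates computed from all training points
    list(centroid = colMeans(x), covariance = stats::cov(x))
  }
}

set.seed(1)
toy <- data.frame(bio1 = rnorm(200, 20, 3), bio12 = rnorm(200, 800, 120))
fit_mve <- fit_ellipsoid(toy, level = 0.95, mve = TRUE)
fit_cov <- fit_ellipsoid(toy, level = 0.95, mve = FALSE)
```

Because the MVE estimate ignores the most extreme points, its centroid and covariance are usually less influenced by outlying occurrences than the classical estimates.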

Details

Model selection occurs in environmental space (E-space). For each variable combination, the omission rate (omr) in E-space is computed using the function inEllipsoid. The results are ordered by omr and, if the user specifies the environmental background "env_bg", an estimated prevalence is computed and the results are also ordered by "bg_prevalence".
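The omission rate and the background prevalence can be illustrated with a small sketch. It assumes, without checking the source of inEllipsoid, that a point lies inside the ellipsoid when its squared Mahalanobis distance to the centroid does not exceed the chi-square quantile at `level`, with as many degrees of freedom as variables. The function `in_ellipsoid()` and the simulated data are hypothetical.

```r
# Hypothetical sketch (assumed criterion): a point is inside the ellipsoid when
# its squared Mahalanobis distance is at most the chi-square quantile at `level`
in_ellipsoid <- function(x, centroid, covariance, level = 0.95) {
  d2 <- stats::mahalanobis(as.matrix(x), center = centroid, cov = covariance)
  d2 <= stats::qchisq(level, df = ncol(covariance))
}

set.seed(42)
train <- data.frame(bio1 = rnorm(150, 20, 3), bio12 = rnorm(150, 800, 120))
test  <- data.frame(bio1 = rnorm(50, 21, 3),  bio12 = rnorm(50, 820, 120))
bg    <- data.frame(bio1 = runif(1000, 0, 35), bio12 = runif(1000, 0, 2500))

centroid   <- colMeans(train)
covariance <- stats::cov(train)

# Omission rate: share of test points falling outside the ellipsoid
om_rate <- mean(!in_ellipsoid(test, centroid, covariance))
# Approximate prevalence: share of background points falling inside it
bg_prevalence <- mean(in_ellipsoid(bg, centroid, covariance))
```

The same function applied to the training points gives the training omission rate.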

The number of variables used to construct candidate models is set with the parameter "nvarstest". Model selection runs in parallel if the user specifies more than one set of combinations and the total number of models to be tested is greater than 500. If "omr_criteria" is given and "bg_prevalence" is available, the models that meet the "omr_criteria" are ranked by their "bg_prevalence". For more details and examples, see the ellipsoid_omr help page.
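The size of the candidate set follows from "env_vars" and "nvarstest": each entry k of "nvarstest" contributes choose(length(env_vars), k) models. The sketch below enumerates them with `utils::combn()`, applies one literal reading of the 500-model rule above, and splits the work into jobs of "comp_each" models. The variable names are illustrative, and the chunking scheme is an assumption about how the jobs might be formed.

```r
# Illustrative variable names (not taken from any ntbox dataset)
env_vars  <- c("bio1", "bio4", "bio12", "bio15", "bio18", "bio19")
nvarstest <- c(3, 5)

# One comma-separated string per candidate model
combos <- unlist(lapply(nvarstest, function(k) {
  # simplify = FALSE returns each combination as a character vector
  lapply(utils::combn(env_vars, k, simplify = FALSE), paste, collapse = ",")
}))
n_models <- length(combos)  # choose(6, 3) + choose(6, 5) = 20 + 6 = 26

# One literal reading of the rule in Details: parallel only when several
# combination sizes are requested and more than 500 models are to be tested
run_parallel <- length(nvarstest) > 1 && n_models > 500

# Assumed chunking: consecutive blocks of comp_each models per job
comp_each <- 100
jobs <- split(combos, ceiling(seq_along(combos) / comp_each))
```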

Value

A data.frame with 5 columns: i) "fitted_vars", the names of the variables that were fitted; ii) "om_rate", the omission rate of the model; iii) "bg_prevalence", the approximate prevalence of the model (see Details); iv) the rank of the model by omission rate; v) the rank of the model by prevalence, computed when omr_criteria is passed.
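The ranking described under Details and Value can be reproduced on a toy result table. The numbers below are made up for illustration and are not ntbox output; the column names `rank_omr` and `rank_prev` are hypothetical, and the sketch assumes that, among models meeting "omr_criteria", a lower "bg_prevalence" is preferred.

```r
# Made-up values for illustration only (not ntbox output)
res <- data.frame(
  fitted_vars   = c("bio1,bio12,bio15", "bio1,bio4,bio12", "bio4,bio15,bio18"),
  om_rate       = c(0.05, 0.02, 0.10),
  bg_prevalence = c(0.30, 0.45, 0.20),
  stringsAsFactors = FALSE
)
omr_criteria <- 0.07

# Rank every model by omission rate (hypothetical column name)
res$rank_omr <- rank(res$om_rate, ties.method = "min")

# Rank only the models that meet omr_criteria by prevalence (assumed: lower is better)
ok <- res$om_rate <= omr_criteria
res$rank_prev <- NA_integer_
res$rank_prev[ok] <- rank(res$bg_prevalence[ok], ties.method = "min")

# Best model, split back into variable names as in the Examples section
best <- res[ok, ][which.min(res$bg_prevalence[ok]), ]
best_vars <- strsplit(best$fitted_vars, ",")[[1]]
```

Here the model with the lowest omission rate is not selected, because another model also meets the criterion with a lower prevalence.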

Author(s)

Luis Osorio-Olvera luismurao@gmail.com

References

Peterson, A.T. et al. (2008) Rethinking receiver operating characteristic analysis applications in ecological niche modeling. Ecol. Modell., 213, 63–72.

Examples

## Not run: 
# Bioclimatic layers path
wcpath <- list.files(system.file("extdata/bios",
                                package = "ntbox"),
                    pattern = "\\.tif$", full.names = TRUE)
# Bioclimatic layers
wc <- raster::stack(wcpath)
# Occurrence data for the giant hummingbird (Patagona gigas)
pg <- utils::read.csv(system.file("extdata/p_gigas.csv",
                                  package = "ntbox"))
# Split occs in train and test
pgL <- base::split(pg,pg$type)
pg_train <- pgL$train
pg_test <- pgL$test
# Environmental data for training and testing
pg_etrain <- raster::extract(wc,pg_train[,c("longitude",
                                            "latitude")],
                             df=TRUE)
pg_etrain <- pg_etrain[,-1]
pg_etest <- raster::extract(wc,pg_test[,c("longitude",
                                          "latitude")],
                            df=TRUE)
pg_etest <- pg_etest[,-1]

# Non-correlated variables
env_varsL <- ntbox::correlation_finder(cor(pg_etrain),
                                       threshold = 0.8,
                                       verbose = FALSE)
env_vars <- env_varsL$descriptors
# Number of variables to fit ellipsoids (3, 5, 6)
nvarstest <- c(3,5,6)
# Level
level <- 0.95
# Environmental background to compute the approximate
# prevalence of the prediction
env_bg <- raster::sampleRandom(wc,10000)

# Selection process

e_selct <- ntbox::ellipsoid_selection(env_train = pg_etrain,
                                      env_test = pg_etest,
                                      env_vars = env_vars,
                                      level = level,
                                      nvarstest = nvarstest,
                                      env_bg = env_bg,
                                      omr_criteria=0.07)

# Best ellipsoid model for "omr_criteria" and prevalence
bestvarcomb <- stringr::str_split(e_selct$fitted_vars,",")[[1]]

# Ellipsoid model projection

best_mod <- ntbox::cov_center(pg_etrain[,bestvarcomb],
                              mve = TRUE,
                              level = 0.99,
                              vars = seq_along(bestvarcomb))


# Projection model in geographic space

mProj <- ntbox::ellipsoidfit(wc[[bestvarcomb]],
                             centroid = best_mod$centroid,
                             covar = best_mod$covariance,
                             level = 0.99,size = 3)

raster::plot(mProj$suitRaster)
points(pg[,c("longitude","latitude")],pch=20,cex=0.5)

pg_proc <- ntbox::pROC(continuous_mod = mProj$suitRaster,
                       test_data = pg_test[,c("longitude","latitude")],
                       n_iter = 1000,
                       E_percent = 5,
                       boost_percent = 50,
                       parallel = FALSE)
print(pg_proc$pROC_summary)

## End(Not run)
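The geographic projection in the example relies on `ellipsoidfit()`. A minimal sketch of a comparable suitability calculation on a matrix of environmental values follows. It assumes that suitability is exp(-0.5 * d2), where d2 is the squared Mahalanobis distance to the centroid, and that cells outside the ellipsoid defined by `level` receive a suitability of 0; `ellipsoid_suitability()` is a hypothetical helper that has not been checked against ntbox's implementation.

```r
# Hypothetical helper: ellipsoid suitability for each row (cell) of `env`
ellipsoid_suitability <- function(env, centroid, covariance, level = 0.99) {
  d2 <- stats::mahalanobis(as.matrix(env), center = centroid, cov = covariance)
  suit <- exp(-0.5 * d2)  # 1 at the centroid, decaying with distance
  # Assumption: cells beyond the chi-square cut-off fall outside the niche
  suit[d2 > stats::qchisq(level, df = ncol(covariance))] <- 0
  suit
}

centroid   <- c(bio1 = 20, bio12 = 800)
covariance <- matrix(c(9, 0, 0, 14400), nrow = 2,
                     dimnames = list(names(centroid), names(centroid)))
cells <- rbind(c(20, 800), c(23, 920), c(35, 2000))
colnames(cells) <- names(centroid)
suit <- ellipsoid_suitability(cells, centroid, covariance)
```

For a raster stack, the same function could be applied to the cell values returned by `raster::getValues()`.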

luismurao/ntbox documentation built on June 14, 2025, 9:57 p.m.