View source: R/ellipsoid_selection.R

ellipsoid_selection    R Documentation

ellipsoid_selection: Performs variable selection for ellipsoid models

Description

Performs variable selection for ellipsoid models according to omission rates in the environmental space.

Usage

ellipsoid_selection(
  env_train,
  env_test = NULL,
  env_vars,
  nvarstest,
  level = 0.95,
  mve = TRUE,
  env_bg = NULL,
  omr_criteria,
  parallel = F,
  ncores = NULL,
  comp_each = 100,
  proc = FALSE,
  sub_sample = FALSE,
  sub_sample_size = 10000,
  proc_iter = 100,
  rseed = TRUE
)

Arguments

env_train

A data frame with the environmental training data.

env_test

A data frame with the environmental testing data. The default is NULL. If given, the selection process will also show the p-value of a binomial test.

env_vars

A vector with the names of environmental variables to be used in the selection process.

nvarstest

A vector indicating the number of variables used to fit the ellipsoids during model selection. Models with different numbers of variables can be tested in the same run (e.g. nvarstest=c(3,6)).

level

Proportion of points to be included in the ellipsoids. This parameter is equivalent to the error (E) proposed by Peterson et al. (2008).

mve

A logical value. If TRUE, a minimum volume ellipsoid will be computed using the function cov.rob of the MASS package. If FALSE, the covariance matrix of the input data will be used.

env_bg

Environmental data to compute the approximated prevalence of the model. The data should be a sample of the environmental layers of the calibration area.

omr_criteria

Omission rate criteria. The maximum omission rate allowed in the selection process. Default NULL; see Details.

parallel

Logical. If TRUE, the computations will be run in parallel. Default FALSE.

ncores

The number of cores that will be used for the parallel process. By default ntbox will use the total number of available cores minus one.

comp_each

Number of models to run in each job of the parallel computation. Default 100.

proc

Logical. If TRUE, a partial ROC test will be run.

sub_sample

Logical. Indicates whether the pROC test should run using a subsample of size sub_sample_size. Recommended for big rasters.

sub_sample_size

Numeric. Size of the sample to be used for computing pROC values.

proc_iter

Numeric. The total number of iterations for the partial ROC bootstrap.

rseed

Logical. Whether or not to set a random seed for the partial ROC bootstrap. Default TRUE.
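The two options controlled by mve can be illustrated with a minimal sketch. It uses simulated data with hypothetical variable names, and it assumes that the number of points passed to cov.rob is derived from level; this page does not confirm how ntbox sets that value internally.

```r
library(MASS)

set.seed(1)
# Toy environmental data: 200 points, 3 hypothetical variables
env <- data.frame(bio1  = rnorm(200, 15, 3),
                  bio12 = rnorm(200, 800, 120),
                  bio15 = rnorm(200, 40, 8))
level <- 0.95

# mve = TRUE analogue: robust centre and covariance from a
# minimum volume ellipsoid (quantile.used is an assumption here)
n_used  <- floor(nrow(env) * level)
mve_fit <- MASS::cov.rob(env, quantile.used = n_used, method = "mve")

# mve = FALSE analogue: ordinary centroid and covariance matrix
classic_center <- colMeans(env)
classic_cov    <- stats::cov(env)

print(mve_fit$center)
print(classic_center)
```

With well-behaved data the two centroids are close; they diverge when the training data contain outliers in environmental space.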

Details

Model selection occurs in environmental space (E-space). For each variable combination, the omission rate (omr) in E-space is computed using the function inEllipsoid. The results are ordered by omr; if the user specified the environmental background "env_bg", an estimated prevalence is computed and the results are also ordered by "bg_prevalence".
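The omission rate in E-space can be sketched in base R. This is an illustration only: it assumes that a point is inside the ellipsoid when its squared Mahalanobis distance to the centroid is at most the chi-square quantile for level, which may differ from what inEllipsoid does internally. The data and the helper in_ellipsoid are hypothetical.

```r
set.seed(42)
# Simulated training and testing points in a 2-variable E-space
train <- matrix(rnorm(300 * 2), ncol = 2,
                dimnames = list(NULL, c("v1", "v2")))
test  <- matrix(rnorm(100 * 2), ncol = 2,
                dimnames = list(NULL, c("v1", "v2")))
level <- 0.95

# Ellipsoid parameters estimated from the training data
centroid <- colMeans(train)
covar    <- stats::cov(train)

# Assumed membership rule: squared Mahalanobis distance <= chi-square cutoff
cutoff <- stats::qchisq(level, df = ncol(train))
in_ellipsoid <- function(x) {
  stats::mahalanobis(x, center = centroid, cov = covar) <= cutoff
}

# Omission rate = proportion of points falling outside the ellipsoid
omr_train <- 1 - mean(in_ellipsoid(train))
omr_test  <- 1 - mean(in_ellipsoid(test))
print(c(train = omr_train, test = omr_test))
```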

The number of variables used to construct candidate models can be specified by the user in the parameter "nvarstest". Model selection will be run in parallel if the user specified more than one set of combinations and the total number of models to be tested is greater than 500. If "omr_criteria" and "bg_prevalence" are given, the results will be shown weighting those models that met the "omr_criteria" by the value of "bg_prevalence". For more details and examples, see the help page of ellipsoid_omr.
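The number of candidate models implied by "nvarstest" can be counted before running the selection. The sketch below uses hypothetical variable names and a toy results table; the ordering step assumes that, among models meeting "omr_criteria", lower "bg_prevalence" is preferred, which this page does not state explicitly.

```r
# Hypothetical set of seven candidate variables
env_vars  <- paste0("bio", c(1, 4, 7, 12, 15, 18, 19))
nvarstest <- c(3, 5, 6)

# Number of candidate models: one per combination of each size
n_models <- sum(choose(length(env_vars), nvarstest))

# The combinations of size 3 themselves
combos <- utils::combn(env_vars, 3, simplify = FALSE)

# Toy results table (made-up values, not ntbox output)
res <- data.frame(fitted_vars   = c("a,b,c", "a,b,d", "a,c,d"),
                  om_rate       = c(0.05, 0.10, 0.06),
                  bg_prevalence = c(0.30, 0.10, 0.20))
omr_criteria <- 0.07

# Models meeting the criterion first, then by ascending prevalence (assumed)
met    <- res$om_rate <= omr_criteria
ranked <- res[order(!met, res$bg_prevalence), ]
print(n_models)
print(ranked)
```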

Value

A data.frame with 5 columns: i) "fitted_vars", the names of the variables that were fitted; ii) "om_rate", the omission rate of the model; iii) "bg_prevalence", the approximated prevalence of the model (see the Details section); iv) the rank of the model by omission rate; v) the rank by prevalence for models that pass omr_criteria, if omr_criteria is given.

Author(s)

Luis Osorio-Olvera luismurao@gmail.com

References

Peterson, A.T. et al. (2008) Rethinking receiver operating characteristic analysis applications in ecological niche modeling. Ecol. Modell., 213, 63–72.

Examples

## Not run: 
# Bioclimatic layers path
wcpath <- list.files(system.file("extdata/bios",
                                package = "ntbox"),
                    pattern = "\\.tif$",full.names = TRUE)
# Bioclimatic layers
wc <- raster::stack(wcpath)
# Occurrence data for the giant hummingbird (Patagona gigas)
pg <- utils::read.csv(system.file("extdata/p_gigas.csv",
                                  package = "ntbox"))
# Split occs in train and test
pgL <- base::split(pg,pg$type)
pg_train <- pgL$train
pg_test <- pgL$test
# Environmental data for training and testing
pg_etrain <- raster::extract(wc,pg_train[,c("longitude",
                                            "latitude")],
                             df=TRUE)
pg_etrain <- pg_etrain[,-1]
pg_etest <- raster::extract(wc,pg_test[,c("longitude",
                                          "latitude")],
                            df=TRUE)
pg_etest <- pg_etest[,-1]

# Non-correlated variables
env_varsL <- ntbox::correlation_finder(cor(pg_etrain),
                                       threshold = 0.8,
                                       verbose = FALSE)
env_vars <- env_varsL$descriptors
# Number of variables to fit ellipsoids (3,5,6 )
nvarstest <- c(3,5,6)
# Level
level <- 0.95
# Environmental background to compute the approximated
# prevalence in the prediction
env_bg <- raster::sampleRandom(wc,10000)

# Selection process

e_selct <- ntbox::ellipsoid_selection(env_train = pg_etrain,
                                      env_test = pg_etest,
                                      env_vars = env_vars,
                                      level = level,
                                      nvarstest = nvarstest,
                                      env_bg = env_bg,
                                      omr_criteria=0.07)

# Best ellipsoid model for "omr_criteria" and prevalence
bestvarcomb <- stringr::str_split(e_selct$fitted_vars,",")[[1]]

# Ellipsoid model projection

best_mod <- ntbox::cov_center(pg_etrain[,bestvarcomb],
                              mve = TRUE,
                              level = 0.99,
                              vars = seq_along(bestvarcomb))


# Projection model in geographic space

mProj <- ntbox::ellipsoidfit(wc[[bestvarcomb]],
                             centroid = best_mod$centroid,
                             covar = best_mod$covariance,
                             level = 0.99,size = 3)

raster::plot(mProj$suitRaster)
points(pg[,c("longitude","latitude")],pch=20,cex=0.5)

# Partial ROC test of the projected model with the testing points
pg_proc <- ntbox::pROC(continuous_mod = mProj$suitRaster,
                       test_data = pg_test[,c("longitude","latitude")],
                       n_iter = 1000,
                       E_percent = 5,
                       boost_percent = 50,parallel = FALSE)
print(pg_proc$pROC_summary)

## End(Not run)

luismurao/ntbox documentation built on Nov. 22, 2024, 4 a.m.