View source: R/ellipsoid_selection.R
ellipsoid_selection | R Documentation |
Performs variable selection for ellipsoid models according to omission rates in the environmental space.
ellipsoid_selection(
env_train,
env_test = NULL,
env_vars,
nvarstest,
level = 0.95,
mve = TRUE,
env_bg = NULL,
omr_criteria,
parallel = F,
ncores = NULL,
comp_each = 100,
proc = FALSE,
sub_sample = FALSE,
sub_sample_size = 10000,
proc_iter = 100,
rseed = TRUE
)
env_train |
A data frame with the environmental training data. |
env_test |
A data frame with the environmental testing data. The default is NULL. If given, the selection process will show the p-value of a binomial test. |
env_vars |
A vector with the names of environmental variables to be used in the selection process. |
nvarstest |
A vector indicating the number of variables used to fit the ellipsoids during model selection. Models with different numbers of variables can be tested in the same run (e.g. nvarstest=c(3,6)). |
level |
Proportion of points to be included in the ellipsoids. This parameter is equivalent to the error (E) proposed by Peterson et al. (2008). |
mve |
A logical value. If TRUE a minimum volume ellipsoid will be computed using
the function |
env_bg |
Environmental data to compute the approximated prevalence of the model. The data should be a sample of the environmental layers of the calibration area. |
omr_criteria |
Omission rate criteria. Value of the omission rate allowed for the selection process. Default NULL; see the details section. |
parallel |
Logical. If TRUE, the computations will be run in parallel. Default FALSE. |
ncores |
The number of cores that will be used for the parallel process. By default ntbox will use the total number of available cores minus one. |
comp_each |
Number of models to run in each job of the parallel computation. Default 100. |
proc |
Logical. If TRUE, a partial ROC test will be run. |
sub_sample |
Logical. Indicates whether the pROC test should run using a subsample of size sub_sample_size. Recommended for big rasters. |
sub_sample_size |
Numeric. Size of the sample to be used for computing pROC values. |
proc_iter |
Numeric. The total number of iterations for the partial ROC bootstrap. |
rseed |
Logical. Whether or not to set a random seed for the partial ROC bootstrap. Default TRUE. |
Model selection occurs in environmental space (E-space). For each variable combination, the omission rate (omr) in E-space is computed using the function inEllipsoid. The results are ordered by omr; if the user specified the environmental background "env_bg", an estimated prevalence is computed and the results are also ordered by "bg_prevalence".
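The omission rate in E-space can be illustrated with a minimal base R sketch. It assumes that a point counts as inside the ellipsoid when its squared Mahalanobis distance to the centroid does not exceed the chi-square quantile for the chosen level. That rule, the simulated data, and the variable names bio1 and bio12 are assumptions made for illustration; this is not the code of inEllipsoid.

```r
# Sketch only: omission rate of test points against an ellipsoid fitted to
# training points. Data are simulated and variable names are made up.
set.seed(1)
train <- data.frame(bio1 = rnorm(200, 15, 2), bio12 = rnorm(200, 800, 100))
test  <- data.frame(bio1 = rnorm(50, 15, 2),  bio12 = rnorm(50, 800, 100))

level <- 0.95
centroid <- colMeans(train)   # ellipsoid centre
covar <- stats::cov(train)    # ellipsoid shape (plain covariance, no MVE)

# Squared Mahalanobis distance of each test point to the centroid
d2 <- stats::mahalanobis(as.matrix(test), center = centroid, cov = covar)

# Assumption: inside when the squared distance does not exceed the
# chi-square quantile for the chosen level
inside <- d2 <= stats::qchisq(level, df = ncol(train))

# Omission rate: proportion of test points left outside the ellipsoid
om_rate <- mean(!inside)
print(om_rate)
```

A lower om_rate means fewer test points fall outside the fitted ellipsoid.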
The number of variables used to construct candidate models can be specified by the user in the parameter "nvarstest". Model selection will be run in parallel if the user specified more than one set of combinations and the total number of models to be tested is greater than 500.
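To get a feel for how many candidate models a given "nvarstest" implies, the combinations can be counted with base R before running the selection. The variable names below are hypothetical and only serve the illustration.

```r
# Hypothetical variable names, for illustration only
env_vars <- c("bio1", "bio4", "bio5", "bio12", "bio15", "bio17", "bio18")
nvarstest <- c(3, 5, 6)

# Number of candidate models for each model size
n_models <- choose(length(env_vars), nvarstest)
names(n_models) <- nvarstest
print(n_models)          # 35, 21 and 7 models for sizes 3, 5 and 6

# Total number of candidate models across all sizes
total_models <- sum(n_models)
print(total_models)      # 63

# Each column is one candidate combination of 3 variables
combs3 <- utils::combn(env_vars, 3)
print(ncol(combs3))      # 35
```

With these seven variables the total (63) stays below the 500-model threshold mentioned above.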
If "omr_criteria" and "bg_prevalence" are given, the results are shown weighting the models that met the "omr_criteria" by their "bg_prevalence" value.
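One way to picture this ordering is the base R sketch below, run on a made-up results table. It assumes that models meeting the omission rate criterion are listed first and that, among them, a lower background prevalence is preferred. The table values and the exact ordering rule are assumptions for illustration, not the package's internal code.

```r
# Made-up results table that mimics the documented columns
res <- data.frame(
  fitted_vars   = c("bio1,bio4,bio12", "bio1,bio5,bio15",
                    "bio4,bio5,bio12", "bio1,bio12,bio15"),
  om_rate       = c(0.05, 0.10, 0.02, 0.06),
  bg_prevalence = c(0.30, 0.10, 0.45, 0.20),
  stringsAsFactors = FALSE
)
omr_criteria <- 0.07

# Flag the models whose omission rate meets the criterion
meets <- res$om_rate <= omr_criteria

# Assumption: models meeting the criterion first, then by increasing
# prevalence, then by increasing omission rate
ranked <- res[order(!meets, res$bg_prevalence, res$om_rate), ]

# Variables of the top-ranked model, split into a character vector
best_vars <- strsplit(ranked$fitted_vars[1], ",")[[1]]
print(best_vars)
```

Here the second model has the lowest prevalence but is ranked last because its omission rate (0.10) exceeds the criterion.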
For more details and examples, see the help for ellipsoid_omr.
A data.frame with 5 columns: i) "fitted_vars", the names of the variables that were fitted; ii) "om_rate", the omission rate of the model; iii) "bg_prevalence", the approximated prevalence of the model (see the details section); iv) the rank of the model by omission rate; v) the rank by prevalence, if a value of omr_criteria is passed.
Luis Osorio-Olvera luismurao@gmail.com
Peterson, A.T. et al. (2008) Rethinking receiver operating characteristic analysis applications in ecological niche modeling. Ecol. Modell., 213, 63–72.
## Not run:
# Bioclimatic layers path
wcpath <- list.files(system.file("extdata/bios",
package = "ntbox"),
pattern = "\\.tif$", full.names = TRUE)
# Bioclimatic layers
wc <- raster::stack(wcpath)
# Occurrence data for the giant hummingbird (Patagona gigas)
pg <- utils::read.csv(system.file("extdata/p_gigas.csv",
package = "ntbox"))
# Split occs in train and test
pgL <- base::split(pg,pg$type)
pg_train <- pgL$train
pg_test <- pgL$test
# Environmental data for training and testing
pg_etrain <- raster::extract(wc,pg_train[,c("longitude",
"latitude")],
df=TRUE)
pg_etrain <- pg_etrain[,-1]
pg_etest <- raster::extract(wc,pg_test[,c("longitude",
"latitude")],
df=TRUE)
pg_etest <- pg_etest[,-1]
# Non-correlated variables
env_varsL <- ntbox::correlation_finder(cor(pg_etrain),
threshold = 0.8,
verbose = FALSE)
env_vars <- env_varsL$descriptors
# Number of variables to fit ellipsoids (3,5,6 )
nvarstest <- c(3,5,6)
# Level
level <- 0.95
# Environmental background to compute the approximated
# prevalence in the prediction
env_bg <- raster::sampleRandom(wc,10000)
# Selection process
e_selct <- ntbox::ellipsoid_selection(env_train = pg_etrain,
env_test = pg_etest,
env_vars = env_vars,
level = level,
nvarstest = nvarstest,
env_bg = env_bg,
omr_criteria=0.07)
# Best ellipsoid model for "omr_criteria" and prevalence
bestvarcomb <- stringr::str_split(e_selct$fitted_vars,",")[[1]]
# Ellipsoid model projection
best_mod <- ntbox::cov_center(pg_etrain[,bestvarcomb],
mve = TRUE,
level = 0.99,
vars = seq_along(bestvarcomb))
# Projection model in geographic space
mProj <- ntbox::ellipsoidfit(wc[[bestvarcomb]],
centroid = best_mod$centroid,
covar = best_mod$covariance,
level = 0.99,size = 3)
raster::plot(mProj$suitRaster)
points(pg[,c("longitude","latitude")],pch=20,cex=0.5)
pg_proc <- ntbox::pROC(continuous_mod = mProj$suitRaster,
test_data = pg_test[,c("longitude","latitude")],
n_iter = 1000,
E_percent = 5,
boost_percent = 50,parallel = FALSE)
print(pg_proc$pROC_summary)
## End(Not run)