sperrorest | R Documentation |
sperrorest is a flexible interface for multiple types of parallelized spatial and non-spatial cross-validation and bootstrap error estimation and parallelized permutation-based assessment of spatial variable importance.
sperrorest(
formula,
data,
coords = c("x", "y"),
model_fun,
model_args = list(),
pred_fun = NULL,
pred_args = list(),
smp_fun = partition_cv,
smp_args = list(),
train_fun = NULL,
train_param = NULL,
test_fun = NULL,
test_param = NULL,
err_fun = err_default,
imp_variables = NULL,
imp_permutations = 1000,
imp_sample_from = c("test", "train", "all"),
importance = !is.null(imp_variables),
distance = FALSE,
do_gc = 1,
progress = "all",
benchmark = FALSE,
mode_rep = c("future", "sequential", "loop"),
mode_fold = c("sequential", "future", "loop"),
verbose = 0
)
formula |
A formula specifying the variables used by the |
data |
a |
coords |
vector of length 2 defining the variables in |
model_fun |
Function that fits a predictive model, such as |
model_args |
Arguments to be passed to |
pred_fun |
Prediction function for a fitted model object created by
|
pred_args |
(optional) Arguments to |
smp_fun |
A function for sampling training and test sets from |
smp_args |
(optional) Arguments to be passed to |
train_fun |
(optional) A function for resampling or subsampling the training sample in order to achieve, e.g., uniform sample sizes on all training sets, or maintaining a certain ratio of positives and negatives in training sets. E.g. resample_uniform or resample_strat_uniform. |
train_param |
(optional) Arguments to be passed to |
test_fun |
(optional) Like |
test_param |
(optional) Arguments to be passed to |
err_fun |
A function that calculates selected error measures from the
known responses in |
imp_variables |
(optional; used if |
imp_permutations |
(optional; used if |
imp_sample_from |
(default: |
importance |
logical (default: |
distance |
logical (default: |
do_gc |
numeric (default: 1): defines frequency of memory garbage
collection by calling gc; if |
progress |
character (default: |
benchmark |
(optional) logical (default: |
mode_rep, mode_fold |
character (default: |
verbose |
Controls the amount of information printed while processing. Defaults to 0 (no output). |
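As a minimal sketch of how the importance-related arguments fit together, the call below enables permutation-based variable importance for two predictors. It assumes the sperrorest and rpart packages are installed and uses the ecuador data set shipped with sperrorest; the helper name pred_prob and the small values for imp_permutations, repetition and nfold are illustrative choices to keep the run short, not recommendations.

```r
library(sperrorest)
library(rpart)

data(ecuador)
fo <- slides ~ dem + slope + hcurv + vcurv + log.carea + cslope

# Hypothetical helper: return the predicted probability of the second class
pred_prob <- function(object, newdata) predict(object, newdata)[, 2]

imp_res <- sperrorest(
  formula = fo, data = ecuador,
  model_fun = rpart,
  model_args = list(control = rpart.control(cp = 0.005)),
  pred_fun = pred_prob,
  smp_fun = partition_cv,
  smp_args = list(repetition = 1, nfold = 3),
  importance = TRUE,                  # switch on variable importance
  imp_variables = c("dem", "slope"),  # predictors to permute
  imp_permutations = 10,              # far fewer than the default of 1000
  progress = FALSE
)

# Fold-level importances are stored in the importance component
summary(imp_res$importance)
```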
Custom predict functions passed to pred_fun that consist of multiple child functions must be defined in one function.
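One way to read this requirement is sketched below: any helper a custom predict function relies on is defined inside that function, so the whole logic is passed around as a single object. The names pred_wrapped and second_column are hypothetical, and the kyphosis data set from rpart is used only to keep the sketch self-contained.

```r
library(rpart)

# Hypothetical custom predict function with its helper defined inside it
pred_wrapped <- function(object, newdata) {
  # Child function kept local instead of being defined at top level
  second_column <- function(p) p[, 2]
  second_column(predict(object, newdata))
}

# Quick check on a small classification tree
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
p <- pred_wrapped(fit, kyphosis)
head(p)
```

A function written this way can then be supplied as pred_fun = pred_wrapped.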
A list (object of class sperrorest) with (up to) six components:
error_rep: sperrorestreperror object containing predictive performances at the repetition level
error_fold: sperroresterror object containing predictive performances at the fold level
represampling: represampling object
importance: sperrorestimportance object containing permutation-based variable importances at the fold level
benchmark: sperrorestbenchmark object containing information on the system the code is running on, starting and finishing times, number of available CPU cores and runtime performance
package_version: sperrorestpackageversion object containing information about the sperrorest package version
Running in parallel is supported via package future. Have a look at vignette("future-1-overview", package = "future").
In short: Choose a backend and specify the number of workers, then call sperrorest() as usual. Example:
future::plan(future.callr::callr, workers = 2)
sperrorest()
Parallelization at the repetition level is recommended when using repeated cross-validation. If the 'granularity' of parallelized function calls is too fine, the overall runtime will be very poor, since the overhead for passing arguments and handling environments becomes too large. Use fold-level parallelization only when the processing time of individual folds is very large and the number of repetitions is small or equal to 1.
Note that nested calls to future are not possible. Therefore, a sequential sperrorest call should be used for hyperparameter tuning in a nested cross-validation.
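The following sketch shows repetition-level parallelization end to end. It assumes the future, sperrorest and rpart packages are installed and uses the multisession backend from future instead of future.callr, purely to avoid an extra dependency; the worker count, the helper name pred_prob and the resampling settings are illustrative.

```r
library(sperrorest)
library(rpart)

data(ecuador)
fo <- slides ~ dem + slope + hcurv + vcurv + log.carea + cslope

# Hypothetical helper: return the predicted probability of the second class
pred_prob <- function(object, newdata) predict(object, newdata)[, 2]

# Choose a backend and the number of workers
future::plan(future::multisession, workers = 2)

par_res <- sperrorest(
  formula = fo, data = ecuador,
  model_fun = rpart,
  model_args = list(control = rpart.control(cp = 0.005)),
  pred_fun = pred_prob,
  smp_fun = partition_cv,
  smp_args = list(repetition = 1:2, nfold = 3),
  mode_rep = "future",       # parallelize over repetitions
  mode_fold = "sequential",  # keep folds sequential within each repetition
  progress = FALSE
)

# Return to sequential processing afterwards
future::plan(future::sequential)

summary(par_res$error_rep)
```

Setting mode_rep = "sequential" instead would give the sequential call suggested for nested cross-validation.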
Brenning, A. 2012. Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: the R package 'sperrorest'. 2012 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 23-27 July 2012, p. 5372-5375. https://ieeexplore.ieee.org/document/6352393
Brenning, A. 2005. Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazards and Earth System Sciences, 5(6), 853-862. doi:10.5194/nhess-5-853-2005
Brenning, A., S. Long & P. Fieguth. 2012. Detecting rock glacier flow structures using Gabor filters and IKONOS imagery. Remote Sensing of Environment, 125, 227-237. doi:10.1016/j.rse.2012.07.005
Russ, G. & A. Brenning. 2010a. Data mining in precision agriculture: Management of spatial information. In 13th International Conference on Information Processing and Management of Uncertainty, IPMU 2010; Dortmund; 28 June - 2 July 2010. Lecture Notes in Computer Science, 6178 LNAI: 350-359.
Russ, G. & A. Brenning. 2010b. Spatial variable importance assessment for yield prediction in Precision Agriculture. In Advances in Intelligent Data Analysis IX, Proceedings, 9th International Symposium, IDA 2010, Tucson, AZ, USA, 19-21 May 2010. Lecture Notes in Computer Science, 6065 LNCS: 184-195.
## ------------------------------------------------------------
## Classification tree example using non-spatial partitioning
## ------------------------------------------------------------
library(sperrorest)
library(rpart)
data(ecuador) # Muenchow et al. (2012), see ?ecuador
fo <- slides ~ dem + slope + hcurv + vcurv + log.carea + cslope
# Prediction function returning the predicted probability of the second class
mypred_part <- function(object, newdata) predict(object, newdata)[, 2]
ctrl <- rpart.control(cp = 0.005) # show the effects of overfitting
fit <- rpart(fo, data = ecuador, control = ctrl)
### Non-spatial cross-validation:
nsp_res <- sperrorest(
data = ecuador, formula = fo,
model_fun = rpart,
model_args = list(control = ctrl),
pred_fun = mypred_part,
progress = TRUE,
smp_fun = partition_cv,
smp_args = list(repetition = 1:2, nfold = 3)
)
summary(nsp_res$error_rep)
summary(nsp_res$error_fold)
summary(nsp_res$represampling)
# plot(nsp_res$represampling, ecuador)
### Spatial cross-validation:
sp_res <- sperrorest(
data = ecuador, formula = fo,
model_fun = rpart,
model_args = list(control = ctrl),
pred_fun = mypred_part,
progress = TRUE,
smp_fun = partition_kmeans,
smp_args = list(repetition = 1:2, nfold = 3)
)
summary(sp_res$error_rep)
summary(sp_res$error_fold)
summary(sp_res$represampling)
# plot(sp_res$represampling, ecuador)
smry <- data.frame(
nonspat_training = unlist(summary(nsp_res$error_rep,
level = 1
)$train_auroc),
nonspat_test = unlist(summary(nsp_res$error_rep,
level = 1
)$test_auroc),
spatial_training = unlist(summary(sp_res$error_rep,
level = 1
)$train_auroc),
spatial_test = unlist(summary(sp_res$error_rep,
level = 1
)$test_auroc)
)
boxplot(smry,
col = c("red", "red", "red", "green"),
main = "Training vs. test, nonspatial vs. spatial",
ylab = "Area under the ROC curve"
)