Evaluates the performance of a modeling procedure using resampling. Parameter tuning and pre-processing are carried out within each fold, so that information leaking from the test subsets does not bias the results.
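The idea of keeping pre-processing inside the resampling loop can be sketched in base R. This is a hand-rolled illustration, not emil's implementation: the fold assignment, the centering and scaling step, and the nearest-centroid classifier are all assumptions chosen to keep the example self-contained.

```r
# Hand-rolled 4-fold cross-validation (illustration only, not emil code).
set.seed(1)
x <- as.matrix(iris[-5])
y <- iris$Species
nfold <- 4
fold_id <- sample(rep(seq_len(nfold), length.out = nrow(x)))

fold_error <- sapply(seq_len(nfold), function(k) {
  test <- fold_id == k
  # Estimate the pre-processing parameters on the fitting subset only ...
  mu <- colMeans(x[!test, ])
  sigma <- apply(x[!test, ], 2, sd)
  # ... and apply them unchanged to both subsets, so nothing leaks from the test set.
  x_fit <- scale(x[!test, ], center = mu, scale = sigma)
  x_test <- scale(x[test, ], center = mu, scale = sigma)
  y_fit <- y[!test]
  # Stand-in classifier: nearest class centroid (one column per class).
  centroids <- sapply(levels(y), function(cl) {
    colMeans(x_fit[y_fit == cl, , drop = FALSE])
  })
  # Squared distance from every test observation to every centroid.
  dist2 <- sapply(levels(y), function(cl) {
    rowSums(sweep(x_test, 2, centroids[, cl])^2)
  })
  pred <- levels(y)[apply(dist2, 1, which.min)]
  # Misclassification rate on the held-out fold.
  mean(pred != as.character(y[test]))
})

mean(fold_error)  # cross-validated error estimate
```

Scaling the whole dataset before splitting would let the test observations influence `mu` and `sigma`, which is the kind of leakage the function is designed to avoid.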
procedure
Modeling procedure, or list of modeling procedures, as produced by
x
Dataset, observations as rows and descriptors as columns.
y
Response vector.
resample
The test subsets used for parameter tuning. Leave blank to randomly generate a resampling scheme of the same kind as is used by
pre_process
Function that performs pre-processing and splits dataset into fitting and test subsets.
.save
What parts of the modeling results to return to the user. If
.cores
Number of CPU-cores to use for parallel computation. The current implementation is based on
.checkpoint_dir
Directory to save intermediate results to, after every completed fold. The directory will be created if it doesn't exist, but not recursively.
.return_error
If
.verbose
Whether to print an activity log.
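Because the checkpoint directory is not created recursively, its parent directories have to exist before the call. The sketch below uses only base R's `dir.create` to show the difference; the directory names are hypothetical and nothing here calls emil itself.

```r
# Hypothetical location under the session's temporary directory.
base_dir <- file.path(tempdir(), "emil_checkpoint_demo")
unlink(base_dir, recursive = TRUE)  # start from a clean slate
checkpoint_dir <- file.path(base_dir, "run1", "checkpoints")

# Non-recursive creation fails while the parent directories are missing.
ok_nested <- suppressWarnings(dir.create(checkpoint_dir, recursive = FALSE))

# Create the parents first; after that a non-recursive create succeeds.
ok_parents <- dir.create(dirname(checkpoint_dir), recursive = TRUE)
ok_checkpoint <- dir.create(checkpoint_dir, recursive = FALSE)

c(nested = ok_nested, parents = ok_parents, checkpoint = ok_checkpoint)
```

In practice only the parent (`dirname(checkpoint_dir)`) would need to be prepared by hand before passing the path as `.checkpoint_dir`.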
A list tree where the top level corresponds to folds (in case of multiple folds), the next level corresponds to the modeling procedures (in case of multiple procedures), and the final level is specified by the .save parameter. It typically contains a subset of the following elements:
error
Performance estimate of the fitted model. See error_fun for more information.
fit
Fitted model.
prediction
Predictions given by the model.
importance
Feature importance scores.
tune
Results from the parameter tuning. See tune for details.
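The nesting of the return value can be illustrated with a mock list of the same shape. The fold names, procedure names, and error values below are made up for the illustration and are not output from emil.

```r
# Mock result: folds at the top level, procedures below, saved elements last.
# All names and numbers are placeholders, not real results.
mock_result <- list(
  fold1 = list(Linear = list(error = 0.05), Quadratic = list(error = 0.03)),
  fold2 = list(Linear = list(error = 0.00), Quadratic = list(error = 0.08))
)

# Collect the error of every procedure in every fold.
# Rows are procedures, columns are folds.
errors <- sapply(mock_result, function(fold) {
  sapply(fold, function(procedure) procedure$error)
})

rowMeans(errors)  # average error per procedure across folds
```

With a single procedure or a single fold the corresponding level is absent, so the extraction code would need one level of `sapply` less.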
Christofer Bäcklin
Hastie T, Tibshirani R, Friedman J (2001). The Elements of Statistical Learning. 1st edition. Springer-Verlag. doi:10.1007/978-0-387-21606-5.
Varma S, Simon R (2006). Bias in Error Estimation When Using Cross-Validation for Model Selection. BMC Bioinformatics, 7(91). doi:10.1186/1471-2105-7-91.
Lawless JF, Yuan Y (2010). Estimation of Prediction Error for Survival Models. Statistics in Medicine, 29(2), 262–272. doi:10.1002/sim.3758.
x <- iris[-5]
y <- iris$Species
cv <- resample("crossvalidation", y, nfold = 4, nrepeat = 4)
result <- evaluate("lda", x, y, resample=cv)
# Multiple procedures fitted and tested simultaneously.
# This is useful when the dataset is large and the splitting takes a long time.
# If you name the elements of the list emil will also name the elements of the
# results object in the same way.
result <- evaluate(c(Linear = "lda", Quadratic = "qda"), x, y, resample=cv)
# Multicore parallelization (on a single computer)
result <- evaluate("lda", x, y, resample=cv, .cores=2)
# Parallelization using a cluster (not limited to a single computer)
# PSOCK clusters are supported on Windows too.
library(parallel)
cl <- makePSOCKcluster(2)
clusterEvalQ(cl, library(emil))
clusterExport(cl, c("x", "y"))
result <- parLapply(cl, cv, function(fold)
    evaluate("lda", x, y, resample=fold))
stopCluster(cl)