evaluate: Evaluate a modeling procedure


View source: R/modeling.r

Description

Evaluates the performance of a modeling procedure with resampling. Tuning and pre-processing are carried out within each fold, so that information leakage does not bias the results.

Usage

evaluate(procedure, x, y, resample, pre_process = pre_split,
  .save = c(model = TRUE, prediction = TRUE, error = TRUE, importance = FALSE),
  .cores = 1, .checkpoint_dir = NULL, .return_error = .cores > 1,
  .verbose = getOption("emil_verbose", TRUE))

Arguments

procedure

Modeling procedure, or list of modeling procedures, as produced by modeling_procedure.

x

Dataset, observations as rows and descriptors as columns.

y

Response vector.

resample

Resampling scheme defining the test subsets used to assess performance, as produced by resample.

pre_process

Function that performs pre-processing and splits dataset into fitting and test subsets.

.save

Which parts of the modeling results to return to the user. If importance is FALSE, the variable importance calculation is skipped.

.cores

Number of CPU cores to use for parallel computation. The current implementation is based on mcMap, which unfortunately does not work on Windows systems. Users can, however, re-implement it fairly easily by setting up a PSOCK cluster and calling parLapply, as in the last example in the Examples section. This solution might be included in future versions of the package, after further investigation.

.checkpoint_dir

Directory in which to save intermediate results after every completed fold. The directory is created if it does not exist, but not recursively, so its parent directory must already exist.

.return_error

If FALSE, the entire modeling is aborted upon an error. If TRUE, only the modeling of the failing fold is aborted, and the error message is returned in place of that fold's results.

.verbose

Whether to print an activity log.
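
The two modes of .return_error can be illustrated with a minimal base R sketch. It does not use emil and is not the package's implementation: fit_fold and run_folds are hypothetical helpers, and the error values are made up. It only shows the general pattern of either letting an error abort the whole run or keeping the error message in place of the failing fold's results.

```r
# Hypothetical per-fold worker (not part of emil): fails on fold 2 to
# simulate a modeling error. The returned error value is made up.
fit_fold <- function(fold) {
  if (fold == 2) stop("model did not converge")
  list(error = 0.1 * fold)
}

# Run every fold. With return_error = FALSE an error propagates and aborts
# the whole run; with return_error = TRUE the error message is stored in
# place of the failing fold's results and the remaining folds still run.
run_folds <- function(folds, return_error) {
  lapply(folds, function(fold) {
    if (!return_error) return(fit_fold(fold))
    tryCatch(fit_fold(fold), error = function(e) conditionMessage(e))
  })
}

results <- run_folds(1:3, return_error = TRUE)

# Flag the folds that returned an error message instead of results.
failed <- vapply(results, is.character, logical(1))
```

Assuming evaluate behaves analogously, a similar check on its return value would identify the folds that need to be rerun.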

Value

A list tree where the top level corresponds to folds (in case of multiple folds), the next level corresponds to the modeling procedures (in case of multiple procedures), and the final level is specified by the .save parameter. It typically contains a subset of the following elements:

error

Performance estimate of the fitted model. See error_fun for more information.

fit

Fitted model.

prediction

Predictions given by the model.

importance

Feature importance scores.

tune

Results from the parameter tuning. See tune for details.
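
As a rough illustration of how such a list tree can be traversed, the sketch below builds a mock object by hand rather than calling evaluate. The fold names, procedure names, and error values are invented for illustration, and the sketch assumes that each error element is a single number.

```r
# Mock of the documented list tree: folds at the top level, named procedures
# below, and the saved elements at the bottom. The numbers are made up and
# are not output from emil.
mock_result <- list(
  fold1 = list(Linear = list(error = 0.02), Quadratic = list(error = 0.04)),
  fold2 = list(Linear = list(error = 0.05), Quadratic = list(error = 0.03))
)

# Collect the error of every procedure in every fold.
# Rows are procedures, columns are folds.
errors <- sapply(mock_result, function(fold) {
  sapply(fold, function(procedure) procedure$error)
})

# Average error per procedure across folds.
mean_error <- rowMeans(errors)
```

With a single procedure the middle level is absent, so the elements would be reached directly from each fold.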

Author(s)

Christofer Bäcklin

References

Hastie T, Tibshirani R, Friedman J (2001). The Elements of Statistical Learning. 1st edition. Springer-Verlag. doi:10.1007/978-0-387-21606-5.

Varma S, Simon R (2006). Bias in Error Estimation When Using Cross-Validation for Model Selection. BMC Bioinformatics, 7(91). doi:10.1186/1471-2105-7-91.

Lawless JF, Yuan Y (2010). Estimation of Prediction Error for Survival Models. Statistics in Medicine, 29(2), 262–272. doi:10.1002/sim.3758.

See Also

emil, modeling_procedure

Examples

x <- iris[-5]
y <- iris$Species
cv <- resample("crossvalidation", y, nfold = 4, nrepeat = 4)
result <- evaluate("lda", x, y, resample=cv)

# Multiple procedures fitted and tested simultaneously. 
# This is useful when the dataset is large and the splitting takes a long time.
# If you name the elements of the list emil will also name the elements of the
# results object in the same way.
result <- evaluate(c(Linear = "lda", Quadratic = "qda"), x, y, resample=cv)

# Multicore parallelization (on a single computer)
result <- evaluate("lda", x, y, resample=cv, .cores=2)

# Parallelization using a cluster (not limited to a single computer)
# PSOCK clusters are supported on Windows too.
library(parallel)
cl <- makePSOCKcluster(2)
clusterEvalQ(cl, library(emil))
clusterExport(cl, c("x", "y"))
result <- parLapply(cl, cv, function(fold)
    evaluate("lda", x, y, resample=fold))
stopCluster(cl)
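
Because .checkpoint_dir is documented as being created non-recursively, it may help to create the parent directory before starting a long run. The following is a minimal base R sketch; the directory names are hypothetical, and the evaluate call is left as a comment so that the block runs without emil installed.

```r
# Hypothetical checkpoint location nested two levels below the session's
# temporary directory.
checkpoint_dir <- file.path(tempdir(), "emil_runs", "iris_lda")

# Create the parent directory up front. The checkpoint directory itself is
# left for evaluate to create.
parent <- dirname(checkpoint_dir)
if (!dir.exists(parent)) dir.create(parent, recursive = TRUE)

# The prepared path could then be passed on, for example:
# result <- evaluate("lda", x, y, resample = cv, .checkpoint_dir = checkpoint_dir)
```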

emil documentation built on Aug. 1, 2018, 1:03 a.m.
