resamples: Collation and Visualization of Resampling Results (package caret: Classification and Regression Training)

Description

These functions provide methods for collecting, analyzing and visualizing a set of resampling results from a common data set.

Usage

```
resamples(x, ...)

## Default S3 method:
resamples(x, modelNames = names(x), ...)

## S3 method for class 'resamples'
sort(x, decreasing = FALSE, metric = x$metric[1], FUN = mean, ...)

## S3 method for class 'resamples'
summary(object, metric = object$metrics, ...)

## S3 method for class 'resamples'
as.matrix(x, metric = x$metric[1], ...)

## S3 method for class 'resamples'
as.data.frame(x, row.names = NULL, optional = FALSE,
              metric = x$metric[1], ...)

modelCor(x, metric = x$metric[1], ...)

## S3 method for class 'resamples'
print(x, ...)
```

Arguments

- `x`: for `resamples`, a list of two or more objects of class `train`, `sbf` or `rfe` with a common set of resampling indices in the `control` object. For the other functions (`sort`, `as.matrix`, `as.data.frame`, `modelCor`, `print`), an object generated by `resamples`.
- `...`: only used for `sort` and `modelCor`; captures arguments to pass to `sort` or `FUN`.
- `modelNames`: an optional set of names to give to the resampling results.
- `decreasing`: logical. Should the sort be increasing or decreasing?
- `metric`: a character string naming the performance measure used for sorting or for computing the between-model correlations.
- `FUN`: a function that takes a vector as its first argument and returns a scalar; it is applied to each model's performance measure.
- `object`: an object generated by `resamples`.
- `row.names`, `optional`: not currently used, but included for consistency with `as.data.frame`.
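To illustrate how `metric`, `FUN` and `decreasing` work together when sorting, here is a minimal base-R sketch; it is not caret's source. It assumes the resampled values sit in a wide data frame whose columns are named `Model~Metric` next to a `Resample` column. The numbers are made up, and `sort_models` is a hypothetical helper.

```r
# Toy resampling results (made-up numbers); columns assumed to be "Model~Metric"
toy_values <- data.frame(
  Resample = c("Resample1", "Resample2", "Resample3"),
  "CART~RMSE" = c(0.62, 0.58, 0.65),
  "CondInfTree~RMSE" = c(0.60, 0.61, 0.59),
  "MARS~RMSE" = c(0.55, 0.57, 0.54),
  check.names = FALSE
)

# Hypothetical helper mimicking the idea behind sort.resamples:
# reduce each model's column for one metric to a scalar with FUN, then order models.
sort_models <- function(values, metric, decreasing = FALSE, FUN = mean, ...) {
  cols <- grep(paste0("~", metric, "$"), names(values), value = TRUE)
  stat <- vapply(values[cols], FUN, numeric(1), ...)  # one scalar per model
  names(stat) <- sub("~.*$", "", cols)                # "CART~RMSE" -> "CART"
  names(sort(stat, decreasing = decreasing))
}

sort_models(toy_values, "RMSE")                       # lowest mean RMSE first
sort_models(toy_values, "RMSE", FUN = median, decreasing = TRUE)
```

For RMSE lower is better, so the default `decreasing = FALSE` puts the best model first; for a metric such as R-squared, `decreasing = TRUE` would be the natural choice.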

Details

The ideas and methods here are based on Hothorn et al. (2005) and Eugster et al. (2008).

The results from `train` can have more than one performance metric per resample. Each metric in the input object is saved.

`resamples` checks that the resampling results match; that is, that the indices in `trainObject$control$index` are the same for every model. Also, the `returnResamp` argument of `trainControl` should be set to `"final"` for each model.
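The index check can be pictured with a small base-R sketch. The `fit_*` objects are hypothetical stand-ins that mock only the `$control$index` element, and `same_resamples` is an illustrative helper, not the check caret actually runs.

```r
# Hypothetical stand-ins for fitted models: only $control$index is mocked
idx <- list(Resample1 = 1:8, Resample2 = c(1:4, 7:10))
fit_a <- list(control = list(index = idx))
fit_b <- list(control = list(index = idx))
fit_c <- list(control = list(index = list(Resample1 = 2:9, Resample2 = 1:8)))

# TRUE only when every model was resampled on identical training indices
same_resamples <- function(fits) {
  ref <- fits[[1]]$control$index
  all(vapply(fits[-1], function(f) identical(f$control$index, ref), logical(1)))
}

same_resamples(list(fit_a, fit_b))  # TRUE
same_resamples(list(fit_a, fit_c))  # FALSE
```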

The `summary` method computes summary statistics across the resamples for each model/metric combination.
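As a rough picture of per-model/metric summaries, the base-R sketch below computes a few statistics for each column of a made-up results table. It assumes caret's `Model~Metric` column naming; `summarize_resamples` is a hypothetical helper, and caret's own `summary` formats and reports its results differently.

```r
# Toy results (made-up numbers); columns assumed to be "Model~Metric"
toy_values <- data.frame(
  Resample = paste0("Resample", 1:4),
  "CART~RMSE" = c(0.62, 0.58, 0.65, 0.60),
  "CART~Rsquared" = c(0.41, 0.45, 0.38, 0.43),
  "MARS~RMSE" = c(0.55, 0.57, 0.54, 0.58),
  "MARS~Rsquared" = c(0.52, 0.49, 0.55, 0.50),
  check.names = FALSE
)

# One row of statistics per model/metric column, computed across resamples
summarize_resamples <- function(values) {
  cols <- setdiff(names(values), "Resample")
  stats <- t(vapply(values[cols], function(v)
    c(Min = min(v), Median = median(v), Mean = mean(v), Max = max(v)),
    numeric(4)))
  data.frame(
    Model = sub("~.*$", "", cols),   # text before "~"
    Metric = sub("^.*~", "", cols),  # text after "~"
    stats,
    row.names = NULL
  )
}

summarize_resamples(toy_values)
```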

Value

For `resamples`: an object of class `"resamples"` with the following elements:

- `call`: the call
- `values`: a data frame of results where rows correspond to resampled data sets and columns indicate the model and metric
- `models`: a character vector of model labels
- `metrics`: a character vector of performance metrics
- `methods`: a character vector of the `train` `method` argument values for each model
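The wide layout of `values` is easiest to see with a toy example. The base-R sketch below assumes a `Resample` column plus one column per model/metric pair named `Model~Metric` (an assumption about the layout, with made-up numbers), and pulls one metric out into a resamples-by-models matrix, a per-metric view that the `metric` argument of `as.matrix` and `as.data.frame` appears to select. `metric_matrix` is a hypothetical helper.

```r
# Toy `values`-style data frame (made-up numbers)
toy_values <- data.frame(
  Resample = paste0("Resample", 1:3),
  "CART~RMSE" = c(0.62, 0.58, 0.65),
  "CART~Rsquared" = c(0.41, 0.45, 0.38),
  "MARS~RMSE" = c(0.55, 0.57, 0.54),
  "MARS~Rsquared" = c(0.52, 0.49, 0.55),
  check.names = FALSE
)

# Rows = resamples, columns = models, for a single metric
metric_matrix <- function(values, metric) {
  cols <- grep(paste0("~", metric, "$"), names(values), value = TRUE)
  out <- as.matrix(values[cols])
  dimnames(out) <- list(values$Resample, sub("~.*$", "", cols))
  out
}

metric_matrix(toy_values, "Rsquared")
```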

`sort.resamples` returns a character vector of model names in sorted order. `modelCor` returns a correlation matrix.
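Because every model is evaluated on the same resamples, their per-resample scores can be correlated column by column. A minimal base-R sketch, assuming Pearson correlation via `cor()` on a resamples-by-models matrix of one metric; the RMSE values are made up, and this is not claimed to be caret's exact computation.

```r
# Made-up RMSE values: rows are shared resamples, columns are models
rmse <- cbind(
  CART        = c(0.62, 0.58, 0.65, 0.60),
  CondInfTree = c(0.60, 0.57, 0.63, 0.59),
  MARS        = c(0.55, 0.57, 0.54, 0.58)
)
rownames(rmse) <- paste0("Resample", 1:4)

# Pearson correlation between models across the shared resamples
model_cor <- cor(rmse)
round(model_cor, 2)
```

A high off-diagonal value means two models tend to do well or poorly on the same resamples, which is only meaningful because the resampling indices are shared.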

Author(s)

Max Kuhn

References

Hothorn et al. The design and analysis of benchmark experiments. Journal of Computational and Graphical Statistics (2005) vol. 14 (3) pp. 675-699

Eugster et al. Exploratory and inferential analysis of benchmark experiments. Ludwig-Maximilians-Universität München, Department of Statistics, Tech. Rep. (2008) vol. 30

See Also

`train`, `trainControl`, `diff.resamples`, `xyplot.resamples`, `densityplot.resamples`, `bwplot.resamples`, `splom.resamples`
Examples

```
library(caret)  # provides the BloodBrain data and train()/resamples()

data(BloodBrain)
set.seed(1)

## tmp <- createDataPartition(logBBB,
##                            p = .8,
##                            times = 100)

## rpartFit <- train(bbbDescr, logBBB,
##                   "rpart",
##                   tuneLength = 16,
##                   trControl = trainControl(
##                     method = "LGOCV", index = tmp))

## ctreeFit <- train(bbbDescr, logBBB,
##                   "ctree",
##                   trControl = trainControl(
##                     method = "LGOCV", index = tmp))

## earthFit <- train(bbbDescr, logBBB,
##                   "earth",
##                   tuneLength = 20,
##                   trControl = trainControl(
##                     method = "LGOCV", index = tmp))

## or load pre-calculated results using:
## load(url("http://caret.r-forge.r-project.org/exampleModels.RData"))

## resamps <- resamples(list(CART = rpartFit,
##                           CondInfTree = ctreeFit,
##                           MARS = earthFit))

## resamps
## summary(resamps)
```