outercv (R Documentation)

This is a convenience function designed to use a single loop of cross-validation to quickly evaluate performance of specific models (random forest, naive Bayes, lm, glm) with fixed hyperparameters and no tuning. If tuning of parameters on data is required, full nested CV with inner CV is needed to tune model hyperparameters (see nestcv.train).

```r
outercv(y, ...)

## Default S3 method:
outercv(
  y,
  x,
  model,
  filterFUN = NULL,
  filter_options = NULL,
  weights = NULL,
  balance = NULL,
  balance_options = NULL,
  outer_method = c("cv", "LOOCV"),
  n_outer_folds = 10,
  outer_folds = NULL,
  cv.cores = 1,
  predict_type = "prob",
  na.option = "pass",
  returnList = FALSE,
  ...
)

## S3 method for class 'formula'
outercv(
  formula,
  data,
  model,
  outer_method = c("cv", "LOOCV"),
  n_outer_folds = 10,
  outer_folds = NULL,
  cv.cores = 1,
  predict_type = "prob",
  ...,
  na.action = na.fail
)
```

`y`: Response vector.

`...`: Optional arguments passed to the function specified by `model`.

`x`: Matrix or data frame of predictors.

`model`: Model function to be fitted.

`filterFUN`: Filter function, e.g. `ttest_filter` or `relieff_filter`. Any function can be provided and is passed `y` and `x`.

`filter_options`: List of additional arguments passed to the filter function specified by `filterFUN`.

`weights`: Weights applied to each sample for models which can use weights.

`balance`: Specifies method for dealing with imbalanced class data. Current options are `"randomsample"` or `"smote"`.

`balance_options`: List of additional arguments passed to the balancing function.

`outer_method`: String of either `"cv"` or `"LOOCV"` specifying whether to do k-fold CV or leave-one-out CV.

`n_outer_folds`: Number of outer CV folds.

`outer_folds`: Optional list containing indices of test folds for outer CV. If supplied, `n_outer_folds` is ignored.

`cv.cores`: Number of cores for parallel processing of the outer loops.

`predict_type`: Only used with binary classification. Calculation of ROC AUC requires predicted class probabilities from fitted models. Most model functions use syntax of the form `predict(..., type = "prob")`, but some models require a different `type`, which can be set via `predict_type`.

`na.option`: Character value specifying how `NA` values are dealt with.

`returnList`: Logical, whether to return a list of results after the main outer CV loop without concatenating results. Useful for debugging.

`formula`: A formula describing the model to be fitted.

`data`: A matrix or data frame containing variables in the model.

`na.action`: Formula S3 method only: a function specifying the action to be taken if `NA`s are found. The default action is for the procedure to fail. An alternative is `na.omit`, which leads to rejection of cases with missing values on any required variable.

Some predictive model functions do not have an x & y interface. If the function specified by `model` requires a formula, `x` and `y` will be merged into a data frame and `model()` called with a formula equivalent to `y ~ .`.
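For illustration, this merge can be sketched in plain base R (hypothetical toy data; `lm()` stands in for a formula-only model):

```r
## Minimal base-R sketch of merging an x/y pair for a formula-only model.
## Toy data, illustrative only.
set.seed(1)
x <- data.frame(marker1 = rnorm(20), marker2 = rnorm(20))
y <- 2 * x$marker1 + rnorm(20)

dat <- data.frame(y = y, x)    # merge response and predictors into one data frame
fit <- lm(y ~ ., data = dat)   # call the model with a formula equivalent to y ~ .
coef(fit)
```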

The S3 formula method for `outercv` is not really recommended with large data sets - it is envisaged to be primarily used to compare performance of more basic models, e.g. `lm()` specified by formulae, for example incorporating interactions. NOTE: filtering is not available if `outercv` is called with a formula - use the `x-y` interface instead.

An alternative method of tuning a single model with fixed parameters is to use nestcv.train with `tuneGrid` set as a single row of a data.frame. The parameters which are needed for a specific model can be identified using `caret::modelLookup()`.
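As a sketch of that alternative (assuming `y` and `x` objects as in the examples below, and that random forest's single caret tuning parameter is `mtry`, as reported by `caret::modelLookup("rf")`):

```r
library(nestedcv)

## A one-row tuneGrid fixes the hyperparameter, so no inner-CV tuning occurs
fixed_grid <- data.frame(mtry = 2)
fit <- nestcv.train(y, x, method = "rf", tuneGrid = fixed_grid)
summary(fit)
```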

Case weights can be passed to model functions which accept these; however, `outercv` assumes that these are passed to the model via an argument named `weights`.

Note that in the case of `model = lm`, although additional arguments, e.g. `subset`, `weights`, `offset`, are passed into the model function via `...`, the scoping is known to go awry. Avoid using these arguments with `model = lm`.

`NA` handling differs between the default S3 method and the formula S3 method. The `na.option` argument takes a character string, while the more typical `na.action` argument takes a function.
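The contrast might look like this in practice (a sketch reusing the `y`, `x`, and `dt` objects from the examples below; the specific `na.option` value is an assumption):

```r
## Default S3 method: na.option is a character string
cvfit1 <- outercv(y, x, randomForest, na.option = "omit")

## Formula S3 method: na.action is a function
cvfit2 <- outercv(outcome ~ ., data = dt, model = lm, na.action = na.omit)
```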

An object with S3 class `"outercv"` with the following components:

`call`: The matched call.

`output`: Predictions on the left-out outer folds.

`outer_result`: List of results from each outer fold, containing predictions on left-out outer folds, the model result, and the number of filtered predictors at each fold.

`dimx`: Vector of the number of observations and number of predictors.

`outer_folds`: List of indices of outer test folds.

`final_fit`: Final fitted model on the whole data.

`final_vars`: Column names of filtered predictors entering the final model.

`summary_vars`: Summary statistics of filtered predictors.

`roc`: ROC AUC for binary classification, where available.

`summary`: Overall performance summary: accuracy and balanced accuracy for classification, ROC AUC for binary classification, RMSE for regression.

```r
## Classification example

## sigmoid function
sigmoid <- function(x) {1 / (1 + exp(-x))}

# load iris dataset and simulate a binary outcome
data(iris)
dt <- iris[, 1:4]
colnames(dt) <- c("marker1", "marker2", "marker3", "marker4")
dt <- as.data.frame(apply(dt, 2, scale))
x <- dt
y2 <- sigmoid(0.5 * dt$marker1 + 2 * dt$marker2) > runif(nrow(dt))
y2 <- factor(y2)

## Random forest
library(randomForest)
cvfit <- outercv(y2, x, randomForest)
summary(cvfit)
plot(cvfit$roc)

## Mixture discriminant analysis (MDA)
if (requireNamespace("mda", quietly = TRUE)) {
  library(mda)
  cvfit <- outercv(y2, x, mda, predict_type = "posterior")
  summary(cvfit)
}

## Example with continuous outcome
y <- -3 + 0.5 * dt$marker1 + 2 * dt$marker2 + rnorm(nrow(dt), 0, 2)
dt$outcome <- y

## simple linear model - formula interface
cvfit <- outercv(outcome ~ ., data = dt, model = lm)
summary(cvfit)

## random forest for regression
cvfit <- outercv(y, x, randomForest)
summary(cvfit)

## example with lm_filter() to reduce input predictors
cvfit <- outercv(y, x, randomForest,
                 filterFUN = lm_filter,
                 filter_options = list(nfilter = 2))
summary(cvfit)
```
