#' Compare models with k-fold cross validation
#'
#' @param ... one or more models on which to perform the cross validation
#' @param k the k in k-fold. cross-validation will use k-1/k of the data for training.
#' @param ntrials how many random partitions to make. Each partition will be one case in the
#' output of the function
#' @param error_type The kind of output to produce from each cross-validation. See \code{\link{mod_error}} for details.
#' @param blockwise When TRUE carries out the trials in a blockwise mode, suitable for time series.
#'
#' @details The purpose of cross-validation is to provide "new" data on which to test a model's
#' performance. In k-fold cross-validation, the data set used to train the model is broken into
#' new training and testing data. This is accomplished simply by using most of the data for training while
#' reserving the remaining data for evaluating the model: testing. Rather than training a single model, k models
#' are trained, each with its own particular testing set. The testing sets in the k models are arranged to cover the
#' whole of the data set. On each of the k testing sets, a performance output is calculated. Which output is
#' most appropriate depends on the kind of model: regression model or classifier. The most basic measure is the mean square error: the
#' difference between the actual response variable in the testing data and the output of the model
#' when presented with inputs from the testing data. This is appropriate in many regression models.
#'
#' The blockwise mode is intended for time-series like data, where successive rows may be correlated. Rather than
#' picking each k block as a random sample of rows, adjacent rows will be used for model evaluation.
#'
#'
#' @export
mod_cv <- function(..., k = 10, ntrials = 5,
error_type = c("default", "mse", "sse", "mad", "LL", "mLL", "dev", "class_error"),
blockwise = FALSE) {
error_type = match.arg(error_type)
# Get just the names of the models
full_names <- as.character(lapply(lazyeval::lazy_dots(...), FUN = function(x) x$expr))
# Now for the models themselves
models <- list(...)
# model can be a list. If so, repeat over all the models.
result = NULL
for (counter in 1:length(models)) {
this_mod <- models[[counter]]
truth <- response_values(this_mod)
if (blockwise) {
pred_error_results <- kfold_block(this_mod, k = k, type = error_type)
error_type <- names(pred_error_results)[1] # get the actual name, not "default"
} else {
pred_error_results <- numeric(ntrials)
for (this_trial in 1:ntrials) {
# get the model outputs for each test group against
# the rest of the data
tmp <- kfold_trial(this_mod, k = k, type = error_type)
if (this_trial == 1) error_type <- names(tmp)[1]
pred_error_results[this_trial] <- tmp
}
}
from_this_mod <- data.frame(pred_error_results, model = full_names[counter],
stringsAsFactors = FALSE)
names(from_this_mod)[1] <- error_type
result <- rbind(result, from_this_mod)
}
result
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.