mod_cv: Compare models with k-fold cross validation


Description

Compare models with k-fold cross validation

Usage

mod_cv(..., k = 10, ntrials = 5, error_type = c("default", "mse",
  "sse", "mad", "LL", "mLL", "dev", "class_error"), blockwise = FALSE)

Arguments

...

one or more models on which to perform the cross validation

k

The k in k-fold. Each of the k fits uses (k-1)/k of the data for training and holds out the remaining 1/k for testing.

ntrials

How many random partitions to make. Each partition will be one case in the output of the function.

error_type

The kind of output to produce from each cross-validation. See mod_error for details.

blockwise

When TRUE, carries out the trials in a blockwise mode, suitable for time series.

Details

The purpose of cross-validation is to provide "new" data on which to test a model's performance. In k-fold cross-validation, the data set used to train the model is split into new training and testing data: most of the data is used for training, and the remainder is reserved for evaluating the model (testing). Rather than training a single model, k models are trained, each with its own testing set. The k testing sets are arranged so that together they cover the whole data set.

On each of the k testing sets, a performance output is calculated. Which output is most appropriate depends on the kind of model: regression model or classifier. The most basic measure is the mean square error: the average of the squared differences between the actual response variable in the testing data and the output of the model when presented with inputs from the testing data. This is appropriate in many regression models.
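
The procedure described above can be sketched in base R. This is only an illustration of k-fold cross-validation with mean square error, not the code mod_cv runs: the helper name kfold_mse is hypothetical, the model is assumed to be a linear model fit with lm, and the built-in mtcars data is used as a stand-in.

```r
# Hand-rolled k-fold cross-validation for a linear model.
# kfold_mse is a hypothetical helper for illustration; it is not part of mosaicModel.
kfold_mse <- function(formula, data, k = 10) {
  n <- nrow(data)
  # Randomly assign every row to one of k folds of nearly equal size.
  fold <- sample(rep(seq_len(k), length.out = n))
  # Name of the response variable (left-hand side of the formula).
  response <- all.vars(formula)[1]
  fold_mse <- numeric(k)
  for (i in seq_len(k)) {
    testing  <- data[fold == i, , drop = FALSE]  # held-out 1/k of the rows
    training <- data[fold != i, , drop = FALSE]  # remaining (k-1)/k of the rows
    fit <- lm(formula, data = training)
    predicted <- predict(fit, newdata = testing)
    # Mean square error on the held-out rows.
    fold_mse[i] <- mean((testing[[response]] - predicted)^2)
  }
  fold_mse
}

set.seed(101)
errors <- kfold_mse(mpg ~ hp + wt, data = mtcars, k = 4)
mean(errors)  # one summary of out-of-sample error for this partition
```

Each row lands in exactly one testing set, so every observation is predicted once by a model that was not trained on it.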

The blockwise mode is intended for time-series-like data, where successive rows may be correlated. Rather than picking each of the k blocks as a random sample of rows, adjacent rows will be used for model evaluation.
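
The difference between the two ways of forming testing sets can be seen in a small base R sketch. It assumes 12 rows and k = 3, and the variable names are illustrative; how mod_cv builds its blocks internally may differ.

```r
n <- 12  # number of rows (assumed for illustration)
k <- 3   # number of folds

# Blockwise: consecutive rows share a fold label, so each testing set
# is a contiguous run of rows.
blockwise_fold <- sort(rep(seq_len(k), length.out = n))
blockwise_fold
# 1 1 1 1 2 2 2 2 3 3 3 3

# Random: the same fold labels, shuffled across the rows.
set.seed(101)
random_fold <- sample(blockwise_fold)

# Row indices held out in each blockwise testing set.
split(seq_len(n), blockwise_fold)
```

With blockwise folds, a held-out row's immediate neighbours are mostly held out with it, so the training data contains fewer rows that are strongly correlated with the testing rows.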


ProjectMOSAIC/mosaicModel documentation built on May 13, 2019, 1:35 a.m.