cv_pred_error: Compare models with k-fold cross-validation
In dtkaplan/gghelper: Functions for Teaching Statistical Modeling

Compare models with k-fold cross-validation

cv_pred_error(..., k = 10, ntrials = 5, output = c("mse", "likelihood", "error_rate", "class"))

`...`	one or more models on which to perform the cross-validation
`k`	the k in k-fold. Cross-validation will use (k-1)/k of the data for training and the remaining 1/k for testing.
`ntrials`	how many random partitions to make. Each partition will be one case in the output of the function
`output`	The kind of output to produce from each cross-validation. See details.

The purpose of cross-validation is to provide "new" data on which to test a model's performance. In k-fold cross-validation, the data set originally used to train the model is split into new training and testing sets: most of the data is used for training, and the remainder is reserved for testing, that is, for evaluating the model. Rather than training a single model, k models are trained, each with its own testing set. The k testing sets are arranged so that together they cover the whole data set.

On each of the k testing sets, a performance output is calculated. Which output is most appropriate depends on the kind of model: regression model or classifier. The most basic measure is the mean square error: the average of the squared differences between the actual values of the response variable in the testing data and the model's outputs when presented with the inputs from the testing data. This is appropriate for many regression models.
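The procedure described above can be sketched by hand in base R. This is only an illustration of one random k-fold partition and is not the code used by cv_pred_error(). The data set (mtcars), the model formula, and the choice to average the per-fold mean square errors are all assumptions made for the example.

```r
# Minimal k-fold cross-validation sketch in base R (illustrative only;
# not the implementation behind cv_pred_error()).
set.seed(101)                       # make the random partition reproducible
k <- 10
n <- nrow(mtcars)                   # mtcars is an assumed example data set

# Assign every row to exactly one of k folds, in random order, so that the
# k testing sets together cover the whole data set.
fold <- sample(rep(seq_len(k), length.out = n))

fold_mse <- numeric(k)
for (i in seq_len(k)) {
  training <- mtcars[fold != i, ]   # roughly (k-1)/k of the rows
  testing  <- mtcars[fold == i, ]   # the remaining 1/k of the rows

  # Train on the training rows only (formula chosen for illustration).
  mod  <- lm(mpg ~ hp + wt, data = training)
  pred <- predict(mod, newdata = testing)

  # Mean square error on the held-out rows.
  fold_mse[i] <- mean((testing$mpg - pred)^2)
}

# One summary number for this partition (averaging folds is an assumption).
cv_mse <- mean(fold_mse)
```

Repeating the whole block with a different random partition would correspond to one additional trial in the sense of `ntrials`.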

For classification models, two different outputs are appropriate. The first is the error rate: the frequency with which the classifier produces an incorrect output when presented with inputs from the testing data. This is a rather coarse measure. A more graded measure is the likelihood: the probability of the response values in the testing data given the model. (The "class" output is exactly the same as "error_rate", and is provided for compatibility with other software under development.)
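To make the two classifier measures concrete, the sketch below computes both from a handful of made-up values. The `actual` and `prob1` vectors are hypothetical, invented purely for illustration, and the 0.5 cutoff for turning a probability into a class is an assumption. Whether cv_pred_error() reports the likelihood itself or its logarithm is not stated here, so both are shown.

```r
# Hypothetical testing-set values, invented for illustration only.
actual <- c(1, 0, 1, 1, 0)             # observed classes (0 or 1)
prob1  <- c(0.9, 0.2, 0.4, 0.7, 0.6)   # model's predicted P(class == 1)

# Error rate: turn probabilities into hard classes (assumed 0.5 cutoff),
# then count how often the predicted class differs from the actual class.
predicted_class <- as.integer(prob1 > 0.5)
error_rate <- mean(predicted_class != actual)

# Likelihood: the probability the model assigned to what actually happened,
# multiplied across the testing cases (treated as independent).
p_observed <- ifelse(actual == 1, prob1, 1 - prob1)
likelihood <- prod(p_observed)

# The log-likelihood is the same information on a more stable numeric scale.
log_likelihood <- sum(log(p_observed))
```

Cases 3 and 5 are both misclassified, so they count equally toward the error rate, whereas the likelihood also reflects how confident the model was in each case.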

dtkaplan/gghelper documentation built on May 15, 2019, 5 p.m.