cv.lr: Cross-Validation of Logistic Regression Model

View source: R/LogisticRegression.R

cv.lr R Documentation

Cross-Validation of Logistic Regression Model

Description

Implements cross-validation for an "lr" object, calculating the error across a number of subsets of the input data set.

Usage

cv.lr(
  lrfit,
  metric = "mse",
  leave_out = nrow(lrfit$data)/10,
  verbose = TRUE,
  seed = 1
)

Arguments

lrfit

an object of class "lr", the output of lr.

metric

which metric to calculate, one of "mse", "auc", "log" or "all". See 'Details'.

leave_out

number of points to leave out for cross-validation.

verbose

logical; whether to print information about number of iterations completed.

seed

optional; number to be passed to set.seed before shuffling the data set.

Details

Cross-validation in which k points are held out of each fold, where k is the value of the leave_out argument. This can be used to judge the out-of-sample predictive power of the model by splitting the original data set into two partitions: the model is fitted on the (usually larger) partition, and its predictions are tested on the (usually smaller) held-out partition. The positions of the k held-out points are selected uniformly at random.
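
For illustration, a single fold could be formed along these lines (a sketch, not the package's internal code; dat stands for the data frame stored in lrfit$data and leave_out for the number of held-out points):

n <- nrow(dat)                   # total number of observations
idx <- sample(n)                 # shuffle the row indices uniformly at random
test_idx <- idx[1:leave_out]     # indices of the points held out in this fold
train <- dat[-test_idx, ]        # partition used to fit the model
test  <- dat[test_idx, ]         # partition used to test the model's predictions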

The available error metrics are mean squared error, AUC, and log score, selected by setting the metric argument to one of "mse", "auc", "log" or "all". See roc.lr for details on AUC. If metric is "all", then a vector containing all three metrics is output.
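
As a rough sketch of how the per-fold errors might be computed (assuming p holds the model's predicted probabilities and y the observed 0/1 responses for the held-out points; this is not necessarily the package's exact convention):

mse <- mean((y - p)^2)                                  # mean squared error of predicted probabilities
logscore <- -mean(y * log(p) + (1 - y) * log(1 - p))    # one common sign convention for the log score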

Note that the output from metric = "auc" has non-deterministic elements due to the shuffling of the data set. To mitigate this, pass a number to the seed argument.

Value

An error value, or a vector of error values if metric = "all", consisting of the average of the chosen metric across the cross-validation folds.
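
Examples

A usage sketch; mydata, y and x are placeholder names, and the formula interface to lr is assumed here rather than documented:

fit <- lr(y ~ x, data = mydata)       # fit a logistic regression model
cv.lr(fit, metric = "mse", seed = 1)  # default: leave out nrow(mydata)/10 points per fold
cv.lr(fit, metric = "all", leave_out = 20, verbose = FALSE)  # all three metrics, 20 points per fold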

