cv.lr: Cross-Validation of Logistic Regression Model

View source: R/LogisticRegression.R

cv.lr R Documentation

Cross-Validation of Logistic Regression Model

Description

Implements cross-validation for an "lr" object, calculating the error across a number of subsets of the input data set.

Usage

cv.lr(
  lrfit,
  metric = "mse",
  leave_out = nrow(lrfit$data)/10,
  verbose = TRUE,
  seed = 1
)

Arguments

lrfit

an object of class "lr", the output of lr.

metric

which metric to calculate, one of "mse", "auc", "log" or "all". See 'Details'.

leave_out

number of points to leave out for cross-validation.

verbose

logical; whether to print information about number of iterations completed.

seed

optional; number to be passed to set.seed before shuffling the data set.

Details

Cross-validation in which k points are held out of each fold, where k is the value of the leave_out argument. This can be used to judge the out-of-sample predictive power of the model by splitting the original data set into two partitions: the model is fitted on the (usually larger) partition, and its predictions are tested on the (usually smaller) held-out partition. The positions of the k held-out points are selected uniformly at random.
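
For illustration, a single fold could be formed along these lines (a sketch, not the package's internal code; dat stands for the data frame stored in lrfit$data and leave_out for the number of held-out points):

n <- nrow(dat)                   # total number of observations
idx <- sample(n)                 # shuffle the row indices uniformly at random
test_idx <- idx[1:leave_out]     # indices of the points held out in this fold
train <- dat[-test_idx, ]        # partition used to fit the model
test  <- dat[test_idx, ]         # partition used to test the model's predictions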

The available error metrics are mean squared error, AUC, and log score, selected by setting the metric argument to one of "mse", "auc", "log" or "all". See roc.lr for details on AUC. If metric is "all", then a vector containing all three metrics is output.
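
As a rough sketch of how the per-fold errors might be computed (assuming p holds the model's predicted probabilities and y the observed 0/1 responses for the held-out points; this is not necessarily the package's exact convention):

mse <- mean((y - p)^2)                                  # mean squared error of predicted probabilities
logscore <- -mean(y * log(p) + (1 - y) * log(1 - p))    # one common sign convention for the log score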

Note that the output from metric = "auc" has non-deterministic elements due to the shuffling of the data set. To mitigate this, pass a number to the seed argument.

Value

An error value, or a vector of error values if metric = "all", consisting of the average of the chosen metric across the cross-validation folds.
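
Examples

A usage sketch; mydata, y and x are placeholder names, and the formula interface to lr is assumed here rather than documented:

fit <- lr(y ~ x, data = mydata)       # fit a logistic regression model
cv.lr(fit, metric = "mse", seed = 1)  # default: leave out nrow(mydata)/10 points per fold
cv.lr(fit, metric = "all", leave_out = 20, verbose = FALSE)  # all three metrics, 20 points per fold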

