cross_validation: Perform Cross-Validation for Model Estimation

cross_validation    R Documentation

Description

This function uses cross-validation to estimate the predictive risk of a Generalized Linear Model (GLM) at each value in a sequence of tuning parameters (tau_seq). The data are split into folds. Each fold is held out in turn for testing while the model is trained on the remaining folds.

Usage

cross_validation(
  formula,
  cat_init,
  tau_seq,
  discrepancy_method,
  cross_validation_fold_num,
  ...
)

Arguments

formula

A formula specifying the GLM. It must include at least the response variable.

cat_init

A list returned by cat_glm_initialization, which supplies the observed and synthetic data used for model fitting.

tau_seq

A sequence of tuning parameter values (tau) over which cross-validation will be performed. Each value of tau is used to weight the synthetic data during model fitting.

discrepancy_method

A function used to calculate the discrepancy (error) between the model predictions and the observed response values.

cross_validation_fold_num

The number of folds to use in cross-validation. The data are randomly split into this many subsets. Each subset is held out once as the test set while the model is trained on the others.

...

Additional arguments passed to internal functions.
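For illustration, here are two hypothetical discrepancy functions with the signature f(y_obs, y_pred) and an example tau grid. The function names, that signature, and the clipping constant eps are assumptions made for this example, not part of the catalytic API. The package may expect discrepancy_method in a different form.

```r
# Hypothetical discrepancy functions of the assumed form f(y_obs, y_pred).

# Mean squared error, suited to continuous responses.
mse_discrepancy <- function(y_obs, y_pred) {
  mean((y_obs - y_pred)^2)
}

# Mean logistic deviance for 0/1 responses and predicted probabilities.
# Probabilities are clipped away from 0 and 1 so that log() stays finite.
logistic_deviance <- function(y_obs, y_pred, eps = 1e-12) {
  p <- pmin(pmax(y_pred, eps), 1 - eps)
  -2 * mean(y_obs * log(p) + (1 - y_obs) * log(1 - p))
}

# An example tuning grid. tau = 0 gives the synthetic data no weight.
tau_seq <- seq(0, 5, by = 0.5)
```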

Details

  1. Randomization of the Data: The data are randomly shuffled and partitioned into cross_validation_fold_num subsets (folds), so that the model is evaluated across different splits of the dataset.

  2. Model Training and Prediction: For each fold, the held-out subset serves as the test set and the remaining subsets form the training set. A GLM is fitted to the training set once for each value of tau in tau_seq and then evaluated on the test set. The training data consist of both the observed and the synthetic data, with the synthetic data weighted by tau.

  3. Risk Estimation: After each fit, discrepancy_method computes the prediction error on the test set for that combination of fold and tau. These errors are accumulated (summed across folds) for each tau.

  4. Average Risk Estimate: After all folds are completed, the accumulated error for each tau is divided by cross_validation_fold_num. The result is the final risk estimate for that value of tau.
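The four steps above can be sketched in base R as follows. This is a minimal, hypothetical illustration and not the package's implementation. Several details are assumptions made for this example: the function name cv_risk_sketch, the separate obs_data and syn_data arguments, keeping the synthetic rows in every training set, the per-row synthetic weight tau / nrow(syn_data), and the Gaussian default family.

```r
# Hypothetical sketch of k-fold cross-validation over a tau grid.
# Assumptions (not taken from the catalytic package):
#   * obs_data and syn_data are data frames with the same columns;
#   * synthetic rows always stay in the training set, each with weight
#     tau / nrow(syn_data); observed rows have weight 1;
#   * discrepancy(y_obs, y_pred) returns a single numeric error;
#   * the response is the first variable in the formula.
cv_risk_sketch <- function(formula, obs_data, syn_data, tau_seq,
                           discrepancy, fold_num, family = gaussian()) {
  n <- nrow(obs_data)
  m <- nrow(syn_data)
  response <- all.vars(formula)[1]

  # Step 1: randomly assign each observed row to one of fold_num folds.
  fold_id <- sample(rep(seq_len(fold_num), length.out = n))

  risk <- numeric(length(tau_seq))
  for (k in seq_len(fold_num)) {
    test <- obs_data[fold_id == k, , drop = FALSE]
    train_obs <- obs_data[fold_id != k, , drop = FALSE]
    train <- rbind(train_obs, syn_data)

    for (j in seq_along(tau_seq)) {
      # Step 2: observed rows get weight 1, synthetic rows tau / m.
      w <- c(rep(1, nrow(train_obs)), rep(tau_seq[j] / m, m))
      # do.call() places the evaluated weights in the call, so glm()
      # does not have to look up `w` in the formula's environment.
      fit <- do.call("glm", list(formula = formula, family = family,
                                 data = train, weights = w))
      pred <- predict(fit, newdata = test, type = "response")
      # Step 3: accumulate the test-fold error for this tau.
      risk[j] <- risk[j] + discrepancy(test[[response]], pred)
    }
  }
  # Step 4: average the accumulated errors over the folds.
  risk / fold_num
}
```

For example, suppose obs holds the observed rows, syn holds synthetic rows with the same columns, and mse is a mean-squared-error function. Then cv_risk_sketch(y ~ x, obs, syn, tau_seq = c(0, 1, 10), discrepancy = mse, fold_num = 5) returns three averaged risk estimates, one per tau. Folds are assigned with sample(), so call set.seed() first if you need reproducible results.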

Value

A numeric vector of averaged risk estimates, one for each value of tau in tau_seq.


catalytic documentation built on April 4, 2025, 5:51 a.m.