cross_validation: Perform Cross-Validation for Model Estimation
In catalytic: Tools for Applying Catalytic Priors in Statistical Modeling

Description

This function uses cross-validation to estimate the prediction risk for each value in a sequence of tuning parameters (tau_seq). It splits the dataset into multiple folds and, for each value of tau, fits a Generalized Linear Model (GLM) on the training folds and evaluates it on the held-out fold.

Usage

cross_validation(
  formula,
  cat_init,
  tau_seq,
  discrepancy_method,
  cross_validation_fold_num,
  ...
)

Arguments

`formula`	A formula specifying the GLM. It must include at least the response variable.
`cat_init`	A list generated from `cat_glm_initialization`.
`tau_seq`	A sequence of tuning parameter values (`tau`) over which cross-validation will be performed. Each value of `tau` is used to weight the synthetic data during model fitting.
`discrepancy_method`	A function used to calculate the discrepancy (error) between model predictions and actual values.
`cross_validation_fold_num`	The number of folds to use in cross-validation. The dataset will be randomly split into this number of subsets, and the model will be trained and tested on different combinations of these subsets.
`...`	Additional arguments passed on to internal functions.
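To make the role of `tau` concrete, the following minimal sketch fits a single weighted GLM on observed rows stacked with synthetic rows. It uses base R only and does not call the package. The simulated data, the way the synthetic rows are generated, the helper name `fit_with_tau`, and the convention that each of the `M` synthetic rows receives weight `tau / M` (so the synthetic data carry a total weight of `tau`) are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical observed data: binary response y, one covariate x.
n <- 40
obs <- data.frame(x = rnorm(n))
obs$y <- rbinom(n, 1, plogis(0.5 + obs$x))

# Hypothetical synthetic data: covariate resampled from the observed rows,
# response drawn from an intercept-only guess (independent of x).
M <- 200
syn <- data.frame(x = sample(obs$x, M, replace = TRUE))
syn$y <- rbinom(M, 1, mean(obs$y))

# Illustrative helper (not part of the package): one weighted fit for a given tau.
fit_with_tau <- function(tau) {
  stacked <- rbind(obs, syn)
  # Observed rows get weight 1; the M synthetic rows share a total weight of tau.
  stacked$w <- c(rep(1, n), rep(tau / M, M))
  # quasibinomial yields the same coefficients as binomial but does not warn
  # about non-integer weights.
  glm(y ~ x, family = quasibinomial(), data = stacked, weights = w)
}

coef_small <- coef(fit_with_tau(0.5))  # synthetic data barely matter
coef_large <- coef(fit_with_tau(50))   # synthetic data dominate more
```

A small `tau` leaves the fit close to what the observed data alone would give, while a large `tau` pulls the coefficients toward the model implied by the synthetic data. Cross-validation over `tau_seq` is a way to choose between these regimes.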

Details

Randomization of the Data: The observations are randomly partitioned into cross_validation_fold_num subsets (folds), so that the model is evaluated across different splits of the dataset.
Model Training and Prediction: For each fold, a GLM is fitted on the training set once for every value of tau in tau_seq, and each fitted model is evaluated on the held-out test set. The training data consists of both the observed and synthetic data, with synthetic data weighted by tau.
Risk Estimation: After fitting the model, the discrepancy_method is used to calculate the prediction error for each combination of fold and tau. These errors are accumulated for each tau.
Average Risk Estimate: After completing all folds, the accumulated prediction errors are averaged over the number of folds to provide a final risk estimate for each value of tau.
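The four steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the function `cv_risk`, its argument names, the squared-error discrepancy, the simulated data, and the `tau / M` weight per synthetic row are all hypothetical.

```r
set.seed(42)

# Hypothetical observed and synthetic data with the same columns (x, y).
n <- 60
obs <- data.frame(x = rnorm(n))
obs$y <- rbinom(n, 1, plogis(-0.3 + 0.8 * obs$x))
M <- 120
syn <- data.frame(x = sample(obs$x, M, replace = TRUE))
syn$y <- rbinom(M, 1, 0.5)

# Hypothetical discrepancy: mean squared error between observed 0/1
# responses and predicted probabilities.
squared_error <- function(y, y_hat) mean((y - y_hat)^2)

cv_risk <- function(formula, obs, syn, tau_seq, discrepancy, fold_num) {
  n <- nrow(obs)
  M <- nrow(syn)
  # Step 1: randomly assign each observed row to one of fold_num folds.
  fold_id <- sample(rep(seq_len(fold_num), length.out = n))
  risk <- numeric(length(tau_seq))
  for (k in seq_len(fold_num)) {
    train <- obs[fold_id != k, , drop = FALSE]
    test <- obs[fold_id == k, , drop = FALSE]
    y_test <- model.response(model.frame(formula, data = test))
    for (j in seq_along(tau_seq)) {
      # Step 2: fit on training rows plus synthetic rows weighted by tau.
      stacked <- rbind(train, syn)
      stacked$w <- c(rep(1, nrow(train)), rep(tau_seq[j] / M, M))
      fit <- glm(formula, family = quasibinomial(), data = stacked, weights = w)
      # Step 3: accumulate the held-out discrepancy for this tau.
      y_hat <- predict(fit, newdata = test, type = "response")
      risk[j] <- risk[j] + discrepancy(y_test, y_hat)
    }
  }
  # Step 4: average the accumulated errors over the folds.
  risk / fold_num
}

tau_seq <- c(0.5, 1, 2, 4, 8)
risk <- cv_risk(y ~ x, obs, syn, tau_seq, squared_error, fold_num = 5)
best_tau <- tau_seq[which.min(risk)]
```

The result is one risk estimate per entry of `tau_seq`. Picking the value with the smallest estimate, as in the last line, is one common way to use such output.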