cat_glm_tune: Catalytic Generalized Linear Models (GLMs) Fitting Function...

cat_glm_tune    R Documentation

Catalytic Generalized Linear Models (GLMs) Fitting Function by Tuning tau from a Sequence of tau Values

Description

This function tunes a catalytic Generalized Linear Model (GLM) by applying the specified risk estimation method to select the optimal value of the tuning parameter tau from a sequence of candidate values. The resulting cat_glm_tune object encapsulates the fitted model, including the estimated coefficients and family information, to facilitate further analysis.

Usage

cat_glm_tune(
  formula,
  cat_init,
  risk_estimate_method = c("parametric_bootstrap", "cross_validation",
    "mallowian_estimate", "steinian_estimate"),
  discrepancy_method = c("mean_square_error", "mean_classification_error",
    "logistic_deviance"),
  tau_seq = NULL,
  tau_0 = NULL,
  parametric_bootstrap_iteration_times = 100,
  cross_validation_fold_num = 5
)

Arguments

formula

A formula specifying the GLM. It should at least include the predictor side (e.g. ~ .), as in the example below, where the response is not repeated in the formula.

cat_init

A list generated from cat_glm_initialization.

risk_estimate_method

Method for risk estimation, chosen from "parametric_bootstrap", "cross_validation", "mallowian_estimate", "steinian_estimate". If not provided, the method is chosen based on the size of the data.

discrepancy_method

Method for discrepancy calculation, chosen from "mean_square_error", "mean_classification_error", "logistic_deviance". If not provided, the method is chosen based on the family.

tau_seq

Vector of numeric values for down-weighting synthetic data. Defaults to a sequence around one fourth of the number of predictors for gaussian and the number of predictors for binomial.

tau_0

Initial tau value used for discrepancy calculation in risk estimation. Defaults to one fourth of the number of predictors for binomial and 1 for gaussian.

parametric_bootstrap_iteration_times

Number of bootstrap iterations for "parametric_bootstrap" risk estimation. Defaults to 100.

cross_validation_fold_num

Number of folds for "cross_validation" risk estimation. Defaults to 5.
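
The role of tau as a down-weighting factor for the synthetic data can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: it assumes the usual catalytic-prior convention in which each of the M synthetic observations receives weight tau / M, so that the synthetic data carry a total weight of tau. The synthetic data generation shown here (resampled predictors, responses drawn from an intercept-only fit) and the names obs, syn, M, and w are illustrative assumptions.

```r
set.seed(1)

# Observed data: a small sample, similar in size to the Examples section
n <- 10
obs <- data.frame(
  X1 = stats::rnorm(n),
  X2 = stats::rnorm(n),
  Y = stats::rnorm(n)
)

# Hypothetical synthetic data (assumption for illustration only):
# predictors resampled from the observed ones, responses drawn from
# a normal distribution matching the observed mean and spread
M <- 100
syn <- data.frame(
  X1 = sample(obs$X1, M, replace = TRUE),
  X2 = sample(obs$X2, M, replace = TRUE),
  Y = stats::rnorm(M, mean = mean(obs$Y), sd = stats::sd(obs$Y))
)

# Observed rows get weight 1; each synthetic row gets weight tau / M
tau <- 2
combined <- rbind(obs, syn)
w <- c(rep(1, n), rep(tau / M, M))

# Weighted GLM fit on the combined data
fit <- stats::glm(Y ~ ., data = combined, family = stats::gaussian(), weights = w)
stats::coef(fit)
```

Under this convention, a larger tau pulls the fit toward the simpler model that generated the synthetic responses, while a tau close to 0 approaches an ordinary fit on the observed data.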

Value

A list containing the values of all the arguments and the following components:

tau

Optimal tau value determined through tuning.

model

Fitted GLM object obtained with the optimal tau value.

coefficients

Estimated coefficients from the model fitted with the optimal tau value.

risk_estimate_list

Collected risk estimates for each tau.
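
To make the relationship between tau_seq, the collected risk estimates, and the selected tau concrete, here is a hedged base R sketch of tuning by cross-validation with mean squared error as the discrepancy. It is an illustration under stated assumptions rather than the code used by cat_glm_tune: the synthetic data generation, the tau / M weighting convention, and the helper cv_risk are all hypothetical, and the package may scale or adjust its estimates differently.

```r
set.seed(1)

# Observed data and hypothetical synthetic data (illustrative assumptions)
n <- 20
M <- 100
obs <- data.frame(X1 = stats::rnorm(n), X2 = stats::rnorm(n))
obs$Y <- 1 + 0.5 * obs$X1 + stats::rnorm(n)
syn <- data.frame(
  X1 = sample(obs$X1, M, replace = TRUE),
  X2 = sample(obs$X2, M, replace = TRUE),
  Y = stats::rnorm(M, mean = mean(obs$Y), sd = stats::sd(obs$Y))
)

tau_seq <- c(0.5, 1, 2, 4) # candidate tau values
fold_num <- 5              # number of cross-validation folds
fold_id <- sample(rep(seq_len(fold_num), length.out = n))

# Hypothetical helper: cross-validated mean squared error for one tau
cv_risk <- function(tau) {
  errs <- vapply(seq_len(fold_num), function(k) {
    train <- obs[fold_id != k, ]
    test <- obs[fold_id == k, ]
    combined <- rbind(train, syn)
    # Observed rows get weight 1; each synthetic row gets tau / M
    w <- c(rep(1, nrow(train)), rep(tau / M, M))
    fit <- stats::glm(Y ~ ., data = combined,
                      family = stats::gaussian(), weights = w)
    mean((test$Y - stats::predict(fit, newdata = test))^2)
  }, numeric(1))
  mean(errs)
}

# One risk estimate per candidate tau; the minimizer is the tuned value
risk_estimates <- vapply(tau_seq, cv_risk, numeric(1))
names(risk_estimates) <- tau_seq
best_tau <- tau_seq[which.min(risk_estimates)]

risk_estimates
best_tau
```

In this sketch, risk_estimates plays the role of risk_estimate_list and best_tau the role of tau in the returned object.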

Examples

gaussian_data <- data.frame(
  X1 = stats::rnorm(10),
  X2 = stats::rnorm(10),
  Y = stats::rnorm(10)
)

cat_init <- cat_glm_initialization(
  formula = Y ~ 1, # formula for simple model
  data = gaussian_data,
  syn_size = 100, # Synthetic data size
  custom_variance = NULL, # User customized variance value
  gaussian_known_variance = TRUE, # Indicating whether the data variance is known
  x_degree = c(1, 1), # Degrees for polynomial expansion of predictors
  resample_only = FALSE, # Whether to perform resampling only
  na_replace = stats::na.omit # How to handle NA values in data
)

cat_model <- cat_glm_tune(
  formula = ~.,
  cat_init = cat_init, # Only accepts an object generated by `cat_glm_initialization`
  risk_estimate_method = "parametric_bootstrap",
  discrepancy_method = "mean_square_error",
  tau_seq = c(1, 2), # Weight for synthetic data
  tau_0 = 2,
  parametric_bootstrap_iteration_times = 20, # Number of bootstrap iterations
  cross_validation_fold_num = 5 # Number of folds
)
cat_model

catalytic documentation built on April 4, 2025, 5:51 a.m.