validate_glm_input        R Documentation

Validate Inputs for Catalytic Generalized Linear Models (GLMs)

Description

This function validates the input parameters for initializing a catalytic Generalized Linear Model (GLM). It checks that the model formula, the GLM family, and the additional parameters meet the expected criteria before any further analysis is run.

Usage

validate_glm_input(
  formula,
  cat_init,
  tau = NULL,
  tau_seq = NULL,
  tau_0 = NULL,
  parametric_bootstrap_iteration_times = NULL,
  cross_validation_fold_num = NULL,
  risk_estimate_method = NULL,
  discrepancy_method = NULL,
  binomial_joint_theta = FALSE,
  binomial_joint_alpha = FALSE,
  binomial_tau_lower = NULL,
  tau_alpha = NULL,
  tau_gamma = NULL,
  gibbs_iter = NULL,
  gibbs_warmup = NULL,
  coefs_iter = NULL,
  gaussian_variance_alpha = NULL,
  gaussian_variance_beta = NULL
)

Arguments

formula

A formula object specifying the GLM to be fitted. The left-hand side of the formula must contain at least the response variable.

cat_init

An object of class cat_initialization generated by cat_glm_initialization. It contains model initialization details, such as the response variable name and the GLM family.

tau

A positive numeric value for the tau parameter in the model. It represents a regularization or scaling factor and must be greater than zero.

tau_seq

A numeric vector specifying a sequence of tau values. This is used for parameter tuning and must contain only positive values.

tau_0

A positive numeric value for the initial tau parameter, which must be greater than zero.

parametric_bootstrap_iteration_times

An integer specifying the number of iterations for the parametric bootstrap method. It must be greater than zero.

cross_validation_fold_num

An integer for the number of folds in cross-validation. It must be greater than 1 and less than or equal to the number of observations.

risk_estimate_method

A character string specifying the method for estimating risk, such as "parametric_bootstrap" or other options, depending on the family of the GLM.

discrepancy_method

A character string specifying the method for calculating discrepancy. The valid options depend on the GLM family and risk estimation method.

binomial_joint_theta

Logical; if TRUE, uses joint theta (theta = 1/tau) in Binomial models.

binomial_joint_alpha

Logical; if TRUE, uses joint alpha (adaptive tau_alpha) in Binomial models.

binomial_tau_lower

A positive numeric value specifying the lower bound for tau in binomial GLMs. It must be greater than zero.

tau_alpha

A positive numeric value for the tau alpha parameter.

tau_gamma

A positive numeric value for the tau gamma parameter.

gibbs_iter

An integer for the number of Gibbs iterations in the sampling process. It must be greater than zero.

gibbs_warmup

An integer for the number of warm-up iterations in the Gibbs sampling. It must be positive and less than the total number of iterations.

coefs_iter

An integer specifying the number of iterations for the coefficient update in the Gibbs sampling. It must be positive.

gaussian_variance_alpha

The shape parameter for the inverse-gamma prior on variance if the variance is unknown in Gaussian models. It must be positive.

gaussian_variance_beta

The scale parameter for the inverse-gamma prior on variance if the variance is unknown in Gaussian models. It must be positive.
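
The range constraints stated above for cross_validation_fold_num and gibbs_warmup can be sketched in base R as follows. This is only an illustration of the documented rules: check_ranges and its n_obs argument are hypothetical names, not part of the catalytic package, and the package's own checks may differ in detail.

```r
# Hypothetical helper (not catalytic code): enforces the documented ranges.
check_ranges <- function(n_obs,
                         cross_validation_fold_num = NULL,
                         gibbs_iter = NULL,
                         gibbs_warmup = NULL) {
  # Fold count must be greater than 1 and at most the number of observations.
  if (!is.null(cross_validation_fold_num) &&
      (cross_validation_fold_num <= 1 || cross_validation_fold_num > n_obs)) {
    stop("`cross_validation_fold_num` must be > 1 and <= the number of observations.",
         call. = FALSE)
  }
  # Warm-up must be positive and strictly less than the total iterations.
  if (!is.null(gibbs_iter) && !is.null(gibbs_warmup) &&
      (gibbs_warmup <= 0 || gibbs_warmup >= gibbs_iter)) {
    stop("`gibbs_warmup` must be positive and less than `gibbs_iter`.",
         call. = FALSE)
  }
  invisible(NULL)
}

# Passes silently: 5 folds on the 32-row built-in mtcars data, 200 of 1000 warm-up draws.
check_ranges(n_obs = nrow(mtcars), cross_validation_fold_num = 5,
             gibbs_iter = 1000, gibbs_warmup = 200)
```

Arguments left as NULL are skipped, which mirrors the NULL defaults shown in Usage.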

Details

This function performs several checks to ensure the validity of the input parameters:

  • Ensures that tau, tau_0, parametric_bootstrap_iteration_times, binomial_tau_lower, tau_alpha, tau_gamma, gibbs_iter, gibbs_warmup, and coefs_iter are positive values.

  • Verifies that cat_init is an object generated by cat_glm_initialization.

  • Checks that the formula response name matches the response name used in the cat_init object.

  • Verifies that risk_estimate_method and discrepancy_method are compatible with the GLM family and that no invalid combinations are used.

  • Warns if the dataset size is too large for the specified risk estimation method.

If any of these conditions are not met, the function raises an error or warning to guide the user.
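
To make the first and third checks in the list above concrete, here is a minimal base-R sketch of a positivity check and a response-name check. The helper names check_positive and check_response_name are hypothetical, and the code is an assumption about how such checks could be written, not the package's actual implementation.

```r
# Hypothetical helpers (not catalytic code).

# Stop if any supplied (non-NULL) named argument is not strictly positive.
check_positive <- function(...) {
  args <- list(...)
  for (name in names(args)) {
    value <- args[[name]]
    if (is.null(value)) next  # NULL means "not supplied", so skip it
    if (!is.numeric(value) || any(is.na(value)) || any(value <= 0)) {
      stop(sprintf("`%s` must be positive.", name), call. = FALSE)
    }
  }
  invisible(NULL)
}

# Stop if the formula's left-hand side does not include the expected response.
check_response_name <- function(formula, expected_response) {
  if (!inherits(formula, "formula") || length(formula) < 3) {
    stop("`formula` must be a two-sided formula.", call. = FALSE)
  }
  lhs_vars <- all.vars(formula[[2]])  # variable names on the left-hand side
  if (!(expected_response %in% lhs_vars)) {
    stop(sprintf("Response `%s` not found on the left-hand side of `formula`.",
                 expected_response), call. = FALSE)
  }
  invisible(NULL)
}

# Both calls pass silently.
check_positive(tau = 1, tau_0 = NULL, gibbs_iter = 1000)
check_response_name(mpg ~ wt + hp, expected_response = "mpg")
```

Membership is tested with %in% rather than equality because the left-hand side only needs to contain the response variable.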

Value

Returns nothing if all checks pass; otherwise, raises an error or warning.
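
Because the outcome is signalled through conditions rather than a return value, calling code typically wraps the validator in tryCatch. The sketch below shows that pattern with validate_stub, a hypothetical stand-in defined here for illustration; its size threshold of 1000 is an arbitrary assumption and does not come from the package.

```r
# Hypothetical stand-in for a validator (not catalytic code).
validate_stub <- function(tau = NULL, n_obs = 0) {
  if (!is.null(tau) && tau <= 0) {
    stop("`tau` must be positive.", call. = FALSE)
  }
  if (n_obs > 1000) {  # arbitrary threshold, chosen only for this sketch
    warning("Dataset is large; this may be slow.", call. = FALSE)
  }
  invisible(NULL)
}

# Convert the three possible outcomes into a short status string.
run_validation <- function(...) {
  tryCatch(
    {
      validate_stub(...)
      "ok"
    },
    warning = function(w) paste("warning:", conditionMessage(w)),
    error = function(e) paste("error:", conditionMessage(e))
  )
}

run_validation(tau = 1)                # all checks pass
run_validation(tau = -1)               # error path
run_validation(tau = 1, n_obs = 5000)  # warning path
```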


catalytic documentation built on April 4, 2025, 5:51 a.m.