In catalytic: Tools for Applying Catalytic Priors in Statistical Modeling

validate_glm_input

R Documentation

Validate Inputs for Catalytic Generalized Linear Models (GLMs)

Description

This function validates the input parameters for initializing a catalytic Generalized Linear Model (GLM). It checks that the provided model formula, family, and additional parameters meet the expected criteria and are suitable for further analysis.

Usage

validate_glm_input(
  formula,
  cat_init,
  tau = NULL,
  tau_seq = NULL,
  tau_0 = NULL,
  parametric_bootstrap_iteration_times = NULL,
  cross_validation_fold_num = NULL,
  risk_estimate_method = NULL,
  discrepancy_method = NULL,
  binomial_joint_theta = FALSE,
  binomial_joint_alpha = FALSE,
  binomial_tau_lower = NULL,
  tau_alpha = NULL,
  tau_gamma = NULL,
  gibbs_iter = NULL,
  gibbs_warmup = NULL,
  coefs_iter = NULL,
  gaussian_variance_alpha = NULL,
  gaussian_variance_beta = NULL
)

Arguments

`formula`	A formula object specifying the GLM to be fitted. The left-hand side of the formula should at least contain the response variable.
`cat_init`	An object of class `cat_initialization` generated by `cat_glm_initialization`. It contains model initialization details, such as the response variable name and the GLM family.
`tau`	A positive numeric value for the tau parameter in the model. It represents a regularization or scaling factor and must be greater than zero.
`tau_seq`	A numeric vector specifying a sequence of tau values. This is used for parameter tuning and must contain positive values.
`tau_0`	A positive numeric value for the initial tau parameter, which must be greater than zero.
`parametric_bootstrap_iteration_times`	An integer specifying the number of iterations for the parametric bootstrap method. It must be greater than zero.
`cross_validation_fold_num`	An integer for the number of folds in cross-validation. It must be greater than 1 and less than or equal to the number of observations.
`risk_estimate_method`	A character string specifying the method for estimating risk, such as "parametric_bootstrap" or other options, depending on the family of the GLM.
`discrepancy_method`	A character string specifying the method for calculating discrepancy. The valid options depend on the GLM family and risk estimation method.
`binomial_joint_theta`	Logical; if TRUE, uses joint theta (theta = 1/tau) in Binomial models.
`binomial_joint_alpha`	Logical; if TRUE, uses joint alpha (adaptive tau_alpha) in Binomial models.
`binomial_tau_lower`	A positive numeric value specifying the lower bound for tau in binomial GLMs. It must be greater than zero.
`tau_alpha`	A positive numeric value for the tau alpha parameter.
`tau_gamma`	A positive numeric value for the tau gamma parameter.
`gibbs_iter`	An integer for the number of Gibbs iterations in the sampling process. It must be greater than zero.
`gibbs_warmup`	An integer for the number of warm-up iterations in the Gibbs sampling. It must be positive and less than the total number of iterations.
`coefs_iter`	An integer specifying the number of iterations for the coefficient update in the Gibbs sampling. It must be positive.
`gaussian_variance_alpha`	The shape parameter for the inverse-gamma prior on variance if the variance is unknown in Gaussian models. It must be positive.
`gaussian_variance_beta`	The scale parameter for the inverse-gamma prior on variance if the variance is unknown in Gaussian models. It must be positive.
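
The range constraints described above for `cross_validation_fold_num` and `gibbs_warmup` could be sketched in base R roughly as follows. The helper names `check_fold_num` and `check_warmup`, the `n_obs` argument, and the error messages are hypothetical and are not part of the catalytic package; the sketch only illustrates the documented bounds, assuming that NULL means the argument was not supplied.

```r
# Hypothetical helpers illustrating the documented bounds; not package code.

# The number of folds must be greater than 1 and at most the number of
# observations. NULL is treated as "not supplied" and is skipped.
check_fold_num <- function(fold_num, n_obs) {
  if (is.null(fold_num)) return(invisible(TRUE))
  if (!is.numeric(fold_num) || length(fold_num) != 1 || is.na(fold_num) ||
      fold_num <= 1 || fold_num > n_obs) {
    stop("cross_validation_fold_num must be > 1 and <= the number of observations")
  }
  invisible(TRUE)
}

# The warm-up length must be positive and strictly less than the total
# number of Gibbs iterations. The check is skipped if either value is NULL.
check_warmup <- function(gibbs_warmup, gibbs_iter) {
  if (is.null(gibbs_warmup) || is.null(gibbs_iter)) return(invisible(TRUE))
  if (gibbs_warmup <= 0 || gibbs_warmup >= gibbs_iter) {
    stop("gibbs_warmup must be positive and less than gibbs_iter")
  }
  invisible(TRUE)
}

# Example: 5 folds on 100 observations and 100 warm-up draws out of 1000 pass.
check_fold_num(5, n_obs = 100)
check_warmup(100, gibbs_iter = 1000)
```

Under these assumptions, a call such as `check_fold_num(1, n_obs = 100)` or `check_warmup(1000, gibbs_iter = 1000)` stops with an error.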

Details

This function performs several checks to ensure the validity of the input parameters:

Ensures that tau, tau_0, parametric_bootstrap_iteration_times, binomial_tau_lower, tau_alpha, tau_gamma, gibbs_iter, gibbs_warmup, and coefs_iter are positive values.
Verifies that cat_init is an object generated by cat_glm_initialization.
Checks that the formula response name matches the response name used in the cat_init object.
Verifies that risk_estimate_method and discrepancy_method are compatible with the GLM family and that no invalid combinations are used.
Warns if the dataset size is too large for the specified risk estimation method.

If any of these conditions are not met, the function raises an error or warning to guide the user.
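
Two of the checks listed above, the positivity check and the response-name check, might look something like the following base R sketch. The helpers `check_positive`, `response_name`, and `check_response` are hypothetical, as is passing the expected response name as a plain string; how the actual `cat_init` object stores that name is not shown here.

```r
# Hypothetical sketch of two of the checks; not the package's implementation.

# Stop if any supplied (non-NULL) named argument is not strictly positive.
check_positive <- function(...) {
  args <- list(...)
  for (name in names(args)) {
    value <- args[[name]]
    if (is.null(value)) next  # NULL means the argument was not supplied
    if (!is.numeric(value) || anyNA(value) || any(value <= 0)) {
      stop(sprintf("`%s` must be positive", name))
    }
  }
  invisible(TRUE)
}

# Extract the first variable name on the left-hand side of a formula.
response_name <- function(formula) {
  if (length(formula) < 3) stop("formula has no left-hand side")
  all.vars(formula[[2]])[1]
}

# Stop if the formula's response differs from the expected response name.
check_response <- function(formula, expected_name) {
  found <- response_name(formula)
  if (!identical(found, expected_name)) {
    stop(sprintf("formula response `%s` does not match `%s`", found, expected_name))
  }
  invisible(TRUE)
}

# Example: both checks pass silently for consistent inputs.
check_positive(tau = 1, tau_0 = NULL, gibbs_iter = 100)
check_response(mpg ~ wt + hp, expected_name = "mpg")
```

With this sketch, `check_positive(tau = -1)` or `check_response(mpg ~ wt, expected_name = "cyl")` stops with an error, which mirrors the behavior described above.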