In catalytic: Tools for Applying Catalytic Priors in Statistical Modeling

validate_glm_initialization_input

R Documentation

Validate Inputs for Catalytic Generalized Linear Models (GLMs) Initialization

Description

This function validates the input parameters required for initializing a catalytic Generalized Linear Model (GLM). It checks that the formula, family, data, and additional parameters have the expected structure and are compatible with one another before any further modeling proceeds.

Usage

validate_glm_initialization_input(
  formula,
  family,
  data,
  syn_size,
  custom_variance,
  gaussian_known_variance,
  x_degree
)

Arguments

`formula`	A formula object specifying the `stats::glm` model to be fitted. It must not contain random effects or survival terms.
`family`	A character string or family object specifying the error distribution and link function. The supported families are "binomial" and "gaussian".
`data`	A `data.frame` containing the data to be used in the GLM.
`syn_size`	A positive integer specifying the sample size used for the synthetic data.
`custom_variance`	A positive numeric value for the custom variance used in the model (only applicable for Gaussian family).
`gaussian_known_variance`	A logical indicating whether the variance is known for the Gaussian family.
`x_degree`	A numeric vector specifying the degree of the predictors. Its length should match the number of predictors (excluding the response variable).
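To illustrate how these arguments relate to one another, the following sketch assembles a set of inputs in base R. It is only an illustration: the helper `family_name()` and the `inputs` list are hypothetical names, not part of the catalytic package, and the chosen values (`syn_size = 100L`, `custom_variance = 1`) are arbitrary. The `mtcars` data set ships with R.

```r
# Hypothetical helper (not from the catalytic package): reduce a family given
# as a character string, a family function, or a family object to its name.
family_name <- function(family) {
  if (is.character(family)) return(family)
  if (is.function(family)) family <- family()
  if (inherits(family, "family")) return(family$family)
  stop("`family` must be a character string or a family object")
}

formula <- mpg ~ wt + hp
data <- mtcars[, c("mpg", "wt", "hp")]

# Number of predictors: columns of `data` excluding the response variable.
response <- all.vars(formula)[1]
n_predictors <- length(setdiff(names(data), response))

# Illustrative values only; `x_degree` gets one entry per predictor.
inputs <- list(
  formula = formula,
  family = family_name(gaussian()),
  data = data,
  syn_size = 100L,
  custom_variance = 1,
  gaussian_known_variance = TRUE,
  x_degree = rep(1, n_predictors)
)
```

Here `x_degree` has length 2 because `data` holds two predictor columns besides the response `mpg`.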

Details

This function performs the following checks:

Ensures that syn_size, custom_variance, and x_degree are positive values.
Verifies that the provided formula is suitable for GLMs, ensuring no random effects or survival terms.
Checks that the provided data is a data.frame.
Confirms that the formula does not contain too many terms relative to the number of columns in data.
Ensures that the family is either "binomial" or "gaussian".
Validates that x_degree has the correct length relative to the number of predictors in data.
Warns if syn_size is too small relative to the number of columns in data.
Issues warnings if custom_variance or gaussian_known_variance is used with an incompatible family.

If any of these conditions is not met, the function raises an error or a warning to guide the user.
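The checks listed above can be sketched in base R roughly as follows. This is a minimal illustration, not the package's implementation: `check_glm_inputs()` is a hypothetical name, the error and warning messages are invented, and the thresholds for "too many terms" and "too small" `syn_size` are assumptions chosen only to make the example concrete.

```r
# Hypothetical re-creation of the checks listed above, in base R.
# Thresholds and messages are assumptions, not the catalytic package's own.
check_glm_inputs <- function(formula, family, data, syn_size = NULL,
                             custom_variance = NULL,
                             gaussian_known_variance = FALSE,
                             x_degree = NULL) {
  # Positivity checks (skipped for arguments left as NULL).
  numeric_args <- list(syn_size = syn_size,
                       custom_variance = custom_variance,
                       x_degree = x_degree)
  for (nm in names(numeric_args)) {
    value <- numeric_args[[nm]]
    if (!is.null(value) && (!is.numeric(value) || any(value <= 0))) {
      stop(sprintf("`%s` must be positive", nm))
    }
  }
  if (!inherits(formula, "formula")) stop("`formula` must be a formula")
  # Crude screen for random-effects (`|`) or survival (`Surv(`) syntax.
  if (grepl("\\||Surv\\(", paste(deparse(formula), collapse = ""))) {
    stop("`formula` must not contain random effects or survival terms")
  }
  if (!is.data.frame(data)) stop("`data` must be a data.frame")
  # Assumption: "too many terms" means more variables than columns in `data`.
  if (length(all.vars(formula)) > ncol(data)) {
    stop("`formula` has too many terms for `data`")
  }
  if (inherits(family, "family")) family <- family$family
  if (!is.character(family) || !family %in% c("binomial", "gaussian")) {
    stop("`family` must be \"binomial\" or \"gaussian\"")
  }
  n_predictors <- ncol(data) - 1  # all columns except the response
  if (!is.null(x_degree) && length(x_degree) != n_predictors) {
    stop("`x_degree` must have one entry per predictor")
  }
  # Assumption: warn when syn_size is below the number of columns.
  if (!is.null(syn_size) && syn_size < ncol(data)) {
    warning("`syn_size` is small relative to the number of columns in `data`")
  }
  if (family != "gaussian" &&
      (!is.null(custom_variance) || isTRUE(gaussian_known_variance))) {
    warning("variance arguments are ignored for non-Gaussian families")
  }
  invisible(TRUE)
}

# Passes silently: Gaussian family, two predictors, x_degree of length two.
check_glm_inputs(mpg ~ wt + hp, "gaussian", mtcars[, c("mpg", "wt", "hp")],
                 syn_size = 100, x_degree = c(1, 1))
```

In this sketch, structural problems stop execution, while questionable but usable settings only produce warnings, which mirrors the error-or-warning split described above.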