View source: R/cat_glm_initialization.R
cat_glm_initialization | R Documentation |
This function prepares and initializes a catalytic Generalized Linear Model (GLM) by processing the input data, extracting the necessary variables, generating synthetic datasets, and fitting a model.
cat_glm_initialization(
formula,
family = "gaussian",
data,
syn_size = NULL,
custom_variance = NULL,
gaussian_known_variance = FALSE,
x_degree = NULL,
resample_only = FALSE,
na_replace = stats::na.omit
)
formula |
A formula specifying the GLM. It should include the response and predictor variables. |
family |
The type of GLM family. Defaults to "gaussian". |
data |
A data frame containing the data for modeling. |
syn_size |
An integer specifying the size of the synthetic dataset to be generated. Defaults to NULL, in which case the size is four times the number of predictor columns. |
custom_variance |
A custom variance value to be applied if using a Gaussian model. Defaults to NULL. |
gaussian_known_variance |
A logical value indicating whether the data variance is known. Defaults to FALSE. |
x_degree |
A numeric vector indicating the degree for polynomial expansion of each predictor. Defaults to NULL, in which case degree 1 is used for each predictor. |
resample_only |
A logical value indicating whether to perform resampling only. Defaults to FALSE. |
na_replace |
A function to handle NA values in the data. Defaults to stats::na.omit. |
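The defaults for na_replace and syn_size can be illustrated with base R alone. This is a minimal sketch, not the package's internal code: the toy data frame is hypothetical, and it assumes that "predictor columns" means every column other than the response.

```r
# Hypothetical toy data with one missing predictor value
toy_data <- data.frame(
  X1 = c(1, 2, NA, 4),
  X2 = c(0.5, 1.5, 2.5, 3.5),
  Y  = c(1, 0, 1, 1)
)

# stats::na.omit (the default na_replace) drops every row containing an NA
clean_data <- stats::na.omit(toy_data)

# Assumption: predictors are all columns except the response Y
n_predictors <- ncol(clean_data) - 1

# Documented default rule: four times the number of predictor columns
default_syn_size <- 4 * n_predictors

nrow(clean_data)
default_syn_size
```

Any function that takes a data frame and returns a data frame without missing values could in principle be supplied as na_replace; whether the package imposes further requirements is not stated here.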
A list containing the values of all the input arguments and the following components:
Function Information
function_name
: The name of the function, "cat_glm_initialization".
y_col_name
: The name of the response variable in the dataset.
simple_model
: An object of class stats::glm, representing the fitted model used to generate the synthetic response from the original data.
Observation Data Information
obs_size
: Number of observations in the original dataset.
obs_data
: Data frame of standardized observation data.
obs_x
: Predictor variables for observed data.
obs_y
: Response variable for observed data.
Synthetic Data Information
syn_size
: Number of synthetic observations generated.
syn_data
: Data frame of synthetic predictor and response variables.
syn_x
: Synthetic predictor variables.
syn_y
: Synthetic response variable.
syn_x_resample_inform
: Information about resampling methods for synthetic predictors:
Coordinate: Preserves the original data values as reference coordinates during processing.
Deskewing: Adjusts the data distribution to reduce skewness and enhance symmetry.
Smoothing: Reduces noise in the data to stabilize the dataset and prevent overfitting.
Flattening: Creates a more uniform distribution by modifying low-frequency categories in categorical variables.
Symmetrizing: Balances the data around its mean to improve statistical properties for model fitting.
Whole Data Information
size
: Total number of combined original and synthetic observations.
data
: Data frame combining original and synthetic datasets.
x
: Combined predictor variables from original and synthetic data.
y
: Combined response variable from original and synthetic data.
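How the observed, synthetic, and combined components relate to one another can be sketched in base R. This is a rough illustration under stated assumptions, not the package's implementation: it uses only coordinate-wise resampling (each predictor column drawn independently), takes the synthetic response to be the fitted mean of an intercept-only model, and skips the standardization and the other resampling methods listed above. The variable names mirror the returned components purely for readability.

```r
set.seed(1)

# Observed data, shaped like the example data used on this page
obs_data <- data.frame(
  X1 = stats::rnorm(10),
  X2 = stats::rnorm(10),
  Y  = stats::rnorm(10)
)
obs_x <- obs_data[, c("X1", "X2")]
syn_size <- 100

# Simple model: intercept-only Gaussian GLM, as in formula = Y ~ 1
simple_model <- stats::glm(Y ~ 1, family = stats::gaussian(), data = obs_data)

# Coordinate-wise resampling: sample each predictor column independently
syn_x <- as.data.frame(
  lapply(obs_x, function(col) sample(col, size = syn_size, replace = TRUE))
)

# Assumption: the synthetic response is the fitted mean of the simple model.
# For an intercept-only model every fitted value is identical.
mu_hat <- unname(stats::fitted(simple_model)[1])
syn_y <- rep(mu_hat, syn_size)
syn_data <- cbind(syn_x, Y = syn_y)

# Whole data: observed rows followed by synthetic rows
whole_data <- rbind(obs_data, syn_data)
size <- nrow(whole_data)
```

Drawing each column independently breaks the dependence between predictors, so the synthetic predictors keep the observed marginal values without reproducing the observed rows.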
gaussian_data <- data.frame(
X1 = stats::rnorm(10),
X2 = stats::rnorm(10),
Y = stats::rnorm(10)
)
cat_init <- cat_glm_initialization(
formula = Y ~ 1, # formula for simple model
data = gaussian_data,
syn_size = 100, # Synthetic data size
custom_variance = NULL, # User customized variance value
gaussian_known_variance = TRUE, # Indicating whether the data variance is known
x_degree = c(1, 1), # Degrees for polynomial expansion of predictors
resample_only = FALSE, # Whether to perform resampling only
na_replace = stats::na.omit # How to handle NA values in data
)
cat_init