View source: R/cat_lmm_initialization.R
cat_lmm_initialization R Documentation
This function prepares and initializes a catalytic linear mixed model. It processes the input data, extracts the necessary variables, generates a synthetic dataset, and fits a simple model. Only a single random-effect variance component is considered.
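The restriction to a single random-effect variance can be read as the usual linear mixed model with one variance component shared by the random effects. The display below is a sketch under that assumption, not a model stated by this page. The symbols are ours: i indexes the groups given by group_col, j the observations within a group, x_ij the x_cols predictors, z_ij the z_cols random-effect variables, and tau^2 the single random-effect variance.

```latex
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \mathbf{z}_{ij}^{\top}\mathbf{b}_{i} + \varepsilon_{ij},
\qquad
\mathbf{b}_{i} \sim \mathcal{N}\!\left(\mathbf{0},\, \tau^{2}\mathbf{I}\right),
\qquad
\varepsilon_{ij} \sim \mathcal{N}\!\left(0,\, \sigma^{2}\right)
```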
cat_lmm_initialization(
  formula,
  data,
  x_cols,
  y_col,
  z_cols,
  group_col = NULL,
  syn_size = NULL,
  resample_by_group = FALSE,
  resample_only = FALSE,
  na_replace = mean
)
formula
: A formula specifying the model. It should include the response and predictor variables.
data
: A data frame containing the data for modeling.
x_cols
: A character vector of column names for the fixed effects (predictors).
y_col
: A character string naming the response variable.
z_cols
: A character vector of column names for the random effects.
group_col
: A character string naming the grouping variable (optional). If NULL (the default), it is extracted from the formula.
syn_size
: An integer specifying the size of the synthetic dataset to generate. If NULL (the default), length(x_cols) * 4 is used.
resample_by_group
: A logical indicating whether to resample by group. Defaults to FALSE.
resample_only
: A logical indicating whether to perform resampling only. Defaults to FALSE.
na_replace
: A function used to replace NA values in the data. Defaults to mean.
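The following R sketch illustrates the two defaults described above: filling missing values with na_replace (here mean) and computing syn_size when it is left NULL. It assumes NA replacement is applied column by column to numeric columns, using each column's non-missing values. The helper names replace_na_with and default_syn_size are hypothetical, and the package's own handling (for example, of categorical columns) may differ.

```r
# Hypothetical helper (not part of the package): fill the NA entries of each
# numeric column with na_replace() applied to that column's observed values.
replace_na_with <- function(df, na_replace = mean) {
  for (col in names(df)) {
    values <- df[[col]]
    if (is.numeric(values) && anyNA(values)) {
      values[is.na(values)] <- na_replace(values[!is.na(values)])
      df[[col]] <- values
    }
  }
  df
}

# Hypothetical helper: the documented default used when syn_size is NULL.
default_syn_size <- function(x_cols) {
  length(x_cols) * 4
}

# Usage on a toy data frame with one missing value per column
toy <- data.frame(a = c(1, NA, 3), b = c(NA, 2, 4))
filled <- replace_na_with(toy, na_replace = mean)
filled$a                  # 1 2 3
filled$b                  # 3 2 4
default_syn_size(c("wt")) # 4
```

Passing median instead of mean works the same way, since the helper only calls na_replace on a plain numeric vector.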
A list containing the values of all the input arguments and the following components:
Function Information:
function_name
: A character string representing the name of the function, "cat_lmm_initialization".
simple_model
: An object of class lme4::lmer or stats::lm, representing the simple model fitted to the original data and used to generate the synthetic response.
Observation Data Information:
obs_size
: An integer representing the number of observations in the original dataset.
obs_data
: The original data used for fitting the model, returned as a data frame.
obs_x
: A data frame containing the standardized predictor variables from the original dataset.
obs_y
: A numeric vector of the standardized response variable from the original dataset.
obs_z
: A data frame containing the standardized random effect variables from the original dataset.
obs_group
: A numeric vector representing the grouping variable for the original observations.
Synthetic Data Information:
syn_size
: An integer representing the number of synthetic observations generated.
syn_data
: A data frame containing the synthetic dataset, combining synthetic predictor and response variables.
syn_x
: A data frame containing the synthetic predictor variables.
syn_y
: A numeric vector of the synthetic response variable values.
syn_z
: A data frame containing the synthetic random effect variables.
syn_group
: A numeric vector representing the grouping variable for the synthetic observations.
syn_x_resample_inform
: A data frame containing information about the resampling process for synthetic predictors:
Coordinate: Preserves the original data values as reference coordinates during processing.
Deskewing: Adjusts the data distribution to reduce skewness and enhance symmetry.
Smoothing: Reduces noise in the data to stabilize the dataset and prevent overfitting.
Flattening: Creates a more uniform distribution by modifying low-frequency categories in categorical variables.
Symmetrizing: Balances the data around its mean to improve statistical properties for model fitting.
syn_z_resample_inform
: A data frame containing information about the resampling process for the synthetic random-effect variables. The resampling methods are the same as those for syn_x_resample_inform.
Whole Data Information:
size
: An integer representing the total size of the combined original and synthetic datasets.
data
: A combined data frame of the original and synthetic datasets.
x
: A combined data frame of the original and synthetic predictor variables.
y
: A combined numeric vector of the original and synthetic response variables.
z
: A combined data frame of the original and synthetic random effect variables.
group
: A combined numeric vector representing the grouping variable for both original and synthetic datasets.
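To make the relationship among the obs_*, syn_* and whole-data components concrete, here is a minimal base-R sketch on mtcars. It is not the package's algorithm. It assumes a fixed-effects-only stats::lm as the simple model, draws each synthetic predictor independently from its observed values instead of applying the Coordinate, Deskewing, Smoothing, Flattening and Symmetrizing steps, skips standardization, and takes the synthetic response to be the simple model's prediction. Only the bookkeeping follows the description above: the whole data stacks the observed rows on the synthetic rows, so its size is obs_size plus syn_size.

```r
set.seed(1)  # reproducible resampling

obs_data <- mtcars
x_cols   <- c("wt")
y_col    <- "mpg"
syn_size <- 100

# Assumption: a fixed-effects-only stats::lm stands in for the simple model.
simple_model <- stats::lm(stats::reformulate(x_cols, response = y_col), data = obs_data)

# Assumption: each synthetic predictor is drawn with replacement from its observed values.
syn_x <- as.data.frame(lapply(obs_data[x_cols], function(col) {
  sample(col, size = syn_size, replace = TRUE)
}))

# Assumption: the synthetic response is the simple model's prediction at syn_x.
syn_y <- unname(stats::predict(simple_model, newdata = syn_x))

# Synthetic dataset: predictors plus response, named like the observed columns.
syn_data <- syn_x
syn_data[[y_col]] <- syn_y

# Whole data: observed rows stacked on synthetic rows (rbind matches columns by name).
whole_data <- rbind(obs_data[c(x_cols, y_col)], syn_data)
whole_size <- nrow(obs_data) + syn_size

whole_size        # 132
nrow(whole_data)  # 132
```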
library(catalytic)  # provides cat_lmm_initialization()

data(mtcars)
cat_init <- cat_lmm_initialization(
  formula = mpg ~ wt + (1 | cyl), # Formula for the simple model
  data = mtcars,
  x_cols = c("wt"), # Fixed effects
  y_col = "mpg", # Response variable
  z_cols = c("disp", "hp", "drat", "qsec", "vs", "am", "gear", "carb"), # Random effects
  group_col = "cyl", # Grouping column
  syn_size = 100, # Synthetic data size
  resample_by_group = FALSE, # Whether to resample by group
  resample_only = FALSE, # Whether to perform resampling only
  na_replace = mean # Function used to replace NA values
)
cat_init