View source: R/cat_cox_initialization.R
cat_cox_initialization | R Documentation |
This function prepares and initializes a catalytic Cox proportional hazards model by processing input data, extracting necessary variables, generating synthetic datasets, and fitting a model.
cat_cox_initialization(
  formula,
  data,
  syn_size = NULL,
  hazard_constant = NULL,
  entry_points = NULL,
  x_degree = NULL,
  resample_only = FALSE,
  na_replace = stats::na.omit
)
formula |
A formula specifying the Cox model. Should include response and predictor variables. |
data |
A data frame containing the data for modeling. |
syn_size |
An integer specifying the size of the synthetic dataset to be generated. If NULL (the default), four times the number of predictor columns is used. |
hazard_constant |
A constant hazard rate for generating synthetic time data when a fitted Cox model is not used. If NULL (the default), the rate is calculated inside the function. |
entry_points |
A numeric vector for entry points of each observation. Default is NULL. |
x_degree |
A numeric vector indicating the degree for polynomial expansion of each predictor. If NULL (the default), degree 1 is used for each predictor. |
resample_only |
A logical indicating whether to perform resampling only. Default is FALSE. |
na_replace |
A function to handle NA values in the data. Default is stats::na.omit. |
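The documentation does not say how the function computes hazard_constant when it is left as NULL. As an illustration only, the sketch below uses one common choice, the maximum likelihood estimate under an exponential model (number of events divided by total follow-up time), and then draws synthetic survival times at that rate. The helper estimate_constant_hazard and the toy data are hypothetical and are not part of the package.

```r
# Hypothetical helper (not part of the package): exponential-model MLE
# of a constant hazard, i.e. events per unit of follow-up time.
estimate_constant_hazard <- function(time, status) {
  stopifnot(length(time) == length(status), all(time > 0))
  sum(status) / sum(time)
}

# Toy data: 3 events over 70 units of total follow-up time.
time   <- c(5, 8, 12, 20, 25)
status <- c(1, 0, 1, 1, 0)
rate   <- estimate_constant_hazard(time, status)

# Draw synthetic survival times under that constant hazard.
# Every synthetic observation is treated as an event (status 1),
# matching the documented default for syn_status.
set.seed(1)
syn_time   <- rexp(100, rate = rate)
syn_status <- rep(1, length(syn_time))
```

Passing an explicit value for hazard_constant bypasses whatever internal calculation the function performs, which may be useful when a plausible rate is known from prior studies.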
A list containing the values of all the input arguments and the following components:
Function Information:
function_name
: The name of the function, "cat_cox_initialization".
time_col_name
: The name of the time variable in the dataset.
status_col_name
: The name of the status variable (event indicator) in the dataset.
simple_model
: A constant hazard rate model if the formula has no predictors; otherwise, a fitted Cox model object.
Observation Data Information:
obs_size
: Number of observations in the original dataset.
obs_data
: Data frame of standardized observation data.
obs_x
: Predictor variables for observed data.
obs_time
: Observed survival times.
obs_status
: Event indicator for observed data.
Synthetic Data Information:
syn_size
: Number of synthetic observations generated.
syn_data
: Data frame of synthetic predictor and response variables.
syn_x
: Synthetic predictor variables.
syn_time
: Synthetic survival times.
syn_status
: Event indicator for synthetic data (defaults to 1).
syn_x_resample_inform
: Information about resampling methods for synthetic predictors:
Coordinate: Preserves the original data values as reference coordinates during processing.
Deskewing: Adjusts the data distribution to reduce skewness and enhance symmetry.
Smoothing: Reduces noise in the data to stabilize the dataset and prevent overfitting.
Flattening: Creates a more uniform distribution by modifying low-frequency categories in categorical variables.
Symmetrizing: Balances the data around its mean to improve statistical properties for model fitting.
Whole Data Information:
size
: Total number of combined original and synthetic observations.
data
: Data frame combining original and synthetic datasets.
x
: Combined predictor variables from original and synthetic data.
time
: Combined survival times from original and synthetic data.
status
: Combined event indicators from original and synthetic data.
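The relationship between the observation, synthetic, and whole-data components can be pictured with a small sketch. It assumes the combined data are formed by stacking the observed rows on top of the synthetic rows. The toy data frames and the predictor column age are hypothetical, and the package's internal column layout may differ.

```r
# Toy observed and synthetic data with identical columns (hypothetical values).
obs_data <- data.frame(time = c(5, 8, 12), status = c(1, 0, 1), age = c(60, 72, 55))
syn_data <- data.frame(time = c(7.5, 3.2), status = c(1, 1), age = c(72, 60))

# Stack observed rows first, then synthetic rows.
combined <- rbind(obs_data, syn_data)

# Mirror the documented "Whole Data Information" components.
whole <- list(
  size   = nrow(combined),                  # obs_size + syn_size
  data   = combined,
  x      = combined[, "age", drop = FALSE], # predictors only
  time   = combined$time,
  status = combined$status
)
```

Under this assumption, size equals obs_size plus syn_size, and the first obs_size rows of data correspond to the original observations.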
library(survival)
data("cancer")
# In this dataset, status is coded 1 = censored, 2 = dead; recode to 0/1.
cancer$status[cancer$status == 1] <- 0
cancer$status[cancer$status == 2] <- 1
cat_init <- cat_cox_initialization(
  formula = Surv(time, status) ~ 1, # formula for simple model
  data = cancer,
  syn_size = 100, # Synthetic data size
  hazard_constant = NULL, # Hazard rate value
  entry_points = rep(0, nrow(cancer)), # Entry points of each observation
  x_degree = rep(1, ncol(cancer) - 2), # Degrees for polynomial expansion of predictors
  resample_only = FALSE, # Whether to perform resampling only
  na_replace = stats::na.omit # How to handle NA values in data
)
cat_init