cat_cox_initialization: Initialization for Catalytic Cox proportional hazards model...

View source: R/cat_cox_initialization.R

cat_cox_initialization    R Documentation

Initialization for Catalytic Cox proportional hazards model (COX)

Description

This function prepares and initializes a catalytic Cox proportional hazards model by processing input data, extracting necessary variables, generating synthetic datasets, and fitting a model.

Usage

cat_cox_initialization(
  formula,
  data,
  syn_size = NULL,
  hazard_constant = NULL,
  entry_points = NULL,
  x_degree = NULL,
  resample_only = FALSE,
  na_replace = stats::na.omit
)

Arguments

formula

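A formula specifying the Cox model. The left-hand side is the survival response, for example Surv(time, status); the right-hand side lists the predictor variables, or 1 for a model with no predictors.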

data

A data frame containing the data for modeling.

syn_size

An integer specifying the size of the synthetic dataset to be generated. Default is NULL, in which case four times the number of predictor columns is used.

hazard_constant

A constant hazard rate for generating synthetic time data if not using a fitted Cox model. Default is NULL, in which case the value is calculated inside the function.

entry_points

A numeric vector for entry points of each observation. Default is NULL.

x_degree

A numeric vector indicating the degree for polynomial expansion of each predictor. Default is NULL, in which case degree 1 is used for each predictor.

resample_only

A logical indicating whether to perform resampling only. Default is FALSE.

na_replace

A function to handle NA values in the data. Default is stats::na.omit.
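
The following base R sketch illustrates what the na_replace and x_degree arguments describe. The data frame toy and the helper expand_poly are hypothetical names introduced only for this illustration; how the package expands predictors internally is not shown in this page and may differ.

```r
# Toy data with one missing value (hypothetical, not part of the package)
toy <- data.frame(
  time   = c(5, 8, 12, 3),
  status = c(1, 0, 1, 1),
  age    = c(60, NA, 71, 55)
)

# stats::na.omit (the default for na_replace) drops every row with an NA
toy_complete <- stats::na.omit(toy)

# Hypothetical helper: raw polynomial expansion of one predictor.
# Returns a matrix with columns x^1, x^2, ..., x^degree.
expand_poly <- function(x, degree) {
  out <- outer(x, seq_len(degree), "^")
  colnames(out) <- paste0("deg_", seq_len(degree))
  out
}

# A degree of 2 for age yields the columns age and age^2
age_poly <- expand_poly(toy_complete$age, degree = 2)
age_poly
```

With x_degree left at 1 for a predictor, an expansion of this kind returns the predictor unchanged.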

Value

A list containing the values of all the input arguments and the following components:

  • Function Information:

    • function_name: The name of the function, "cat_cox_initialization".

    • time_col_name: The name of the time variable in the dataset.

    • status_col_name: The name of the status variable (event indicator) in the dataset.

    • simple_model: The initial model. If the formula has no predictors, this is a constant hazard rate model; otherwise, it is a fitted Cox model object.

  • Observation Data Information:

    • obs_size: Number of observations in the original dataset.

    • obs_data: Data frame of standardized observation data.

    • obs_x: Predictor variables for observed data.

    • obs_time: Observed survival times.

    • obs_status: Event indicator for observed data.

  • Synthetic Data Information:

    • syn_size: Number of synthetic observations generated.

    • syn_data: Data frame of synthetic predictor and response variables.

    • syn_x: Synthetic predictor variables.

    • syn_time: Synthetic survival times.

    • syn_status: Event indicator for synthetic data (defaults to 1).

    • syn_x_resample_inform: Information about resampling methods for synthetic predictors:

      • Coordinate: Preserves the original data values as reference coordinates during processing.

      • Deskewing: Adjusts the data distribution to reduce skewness and enhance symmetry.

      • Smoothing: Reduces noise in the data to stabilize the dataset and prevent overfitting.

      • Flattening: Creates a more uniform distribution by modifying low-frequency categories in categorical variables.

      • Symmetrizing: Balances the data around its mean to improve statistical properties for model fitting.

  • Whole Data Information:

    • size: Total number of combined original and synthetic observations.

    • data: Data frame combining original and synthetic datasets.

    • x: Combined predictor variables from original and synthetic data.

    • time: Combined survival times from original and synthetic data.

    • status: Combined event indicators from original and synthetic data.
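
As a rough illustration of how the synthetic and combined components relate when the formula has no predictors, the sketch below draws synthetic times under a constant hazard and stacks them with the observed data. It is written in base R under stated assumptions: a constant hazard implies exponentially distributed survival times, and the hazard is estimated here as events divided by total follow-up time (the exponential maximum likelihood estimate). Whether the package computes hazard_constant in exactly this way is an assumption, and the small vectors are made up for the example.

```r
set.seed(1)

# Made-up observed data (illustration only)
obs_time   <- c(5, 8, 12, 3, 20)
obs_status <- c(1, 0, 1, 1, 0)

# Assumed estimate of the constant hazard: events / total follow-up time
hazard_constant <- sum(obs_status) / sum(obs_time)

# Under a constant hazard, survival times follow an exponential distribution
syn_size   <- 4
syn_time   <- stats::rexp(syn_size, rate = hazard_constant)
syn_status <- rep(1, syn_size)  # synthetic events default to 1

# Combined ("whole") data: observed rows followed by synthetic rows
time   <- c(obs_time, syn_time)
status <- c(obs_status, syn_status)
size   <- length(time)  # equals obs_size + syn_size
```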

Examples

library(catalytic)
library(survival)
data("cancer")
# Recode status from the survival package's 1/2 (censored/dead) coding to 0/1
cancer$status[cancer$status == 1] <- 0
cancer$status[cancer$status == 2] <- 1

cat_init <- cat_cox_initialization(
  formula = Surv(time, status) ~ 1, # formula for simple model
  data = cancer,
  syn_size = 100, # Synthetic data size
  hazard_constant = NULL, # Hazard rate value
  entry_points = rep(0, nrow(cancer)), # Entry points of each observation
  x_degree = rep(1, ncol(cancer) - 2), # Degrees for polynomial expansion of predictors
  resample_only = FALSE, # Whether to perform resampling only
  na_replace = stats::na.omit # How to handle NA values in data
)
cat_init
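
A variant with predictors may also be useful. The sketch below relies only on the signature and the component names documented above; the choice of age and sex as predictors is an assumption made for illustration, and the remaining arguments are left at their documented defaults. It has not been checked against a specific package version.

```r
library(catalytic)
library(survival)

data("cancer")
# Recode status from 1/2 (censored/dead) to 0/1
cancer$status <- as.integer(cancer$status == 2)

cat_init_x <- cat_cox_initialization(
  formula = Surv(time, status) ~ age + sex, # predictors chosen for illustration
  data = cancer,
  syn_size = 100
)

# Inspect components listed in the Value section
cat_init_x$obs_size       # number of observed rows
cat_init_x$syn_size       # number of synthetic rows
cat_init_x$size           # combined size
head(cat_init_x$syn_data) # first rows of the synthetic dataset
```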

catalytic documentation built on April 4, 2025, 5:51 a.m.