In catalytic: Tools for Applying Catalytic Priors in Statistical Modeling

cat_cox_initialization

R Documentation

Initialization for Catalytic Cox proportional hazards model (COX)

Description

This function prepares and initializes a catalytic Cox proportional hazards model by processing input data, extracting necessary variables, generating synthetic datasets, and fitting a model.

Usage

cat_cox_initialization(
  formula,
  data,
  syn_size = NULL,
  hazard_constant = NULL,
  entry_points = NULL,
  x_degree = NULL,
  resample_only = FALSE,
  na_replace = stats::na.omit
)

Arguments

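`formula`	A formula specifying the Cox model, with a survival response on the left-hand side (for example `Surv(time, status)`) and the predictor variables, or `1` for a model without predictors, on the right-hand side.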
`data`	A data frame containing the data for modeling.
`syn_size`	An integer specifying the size of the synthetic dataset to be generated. If NULL (the default), four times the number of predictor columns is used.
`hazard_constant`	A constant hazard rate used to generate synthetic time data when a fitted Cox model is not used. If NULL (the default), the value is calculated inside the function.
`entry_points`	A numeric vector for entry points of each observation. Default is NULL.
`x_degree`	A numeric vector giving the degree of polynomial expansion for each predictor. If NULL (the default), degree 1 is used for every predictor.
`resample_only`	A logical indicating whether to perform resampling only. Default is FALSE.
`na_replace`	A function to handle NA values in the data. Default is `stats::na.omit`.
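
The role of `hazard_constant` can be illustrated with a small base R sketch. It assumes that, under a constant hazard, survival times are exponential, so the rate can be estimated as the number of events divided by the total follow-up time. This is a standard estimate shown only for illustration; the documentation does not state how the function computes its default, and the variable names below are hypothetical.

```r
# Toy right-censored data: follow-up time and event indicator
# (1 = event, 0 = censored). Values are made up for illustration.
time   <- c(5, 8, 12, 3, 9)
status <- c(1, 0, 1, 1, 0)

# Exponential maximum-likelihood estimate of a constant hazard:
# number of events divided by total follow-up time.
# Assumption: the package may compute its default differently.
hazard_hat <- sum(status) / sum(time)

# Draw synthetic survival times under that constant hazard and
# mark every synthetic observation as an event (status = 1).
set.seed(1)
syn_time   <- rexp(4, rate = hazard_hat)
syn_status <- rep(1, length(syn_time))
```

Passing an explicit `hazard_constant` would replace the estimated rate with a user-chosen one.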

Value

A list containing the values of all the input arguments and the following components:

Function Information:
- function_name: The name of the function, "cat_cox_initialization".
- time_col_name: The name of the time variable in the dataset.
- status_col_name: The name of the status variable (event indicator) in the dataset.
- simple_model: The simple model: a constant hazard rate model if the formula has no predictors; otherwise, a fitted Cox model object.
Observation Data Information:
- obs_size: Number of observations in the original dataset.
- obs_data: Data frame of standardized observation data.
- obs_x: Predictor variables for observed data.
- obs_time: Observed survival times.
- obs_status: Event indicator for observed data.
Synthetic Data Information:
- syn_size: Number of synthetic observations generated.
- syn_data: Data frame of synthetic predictor and response variables.
- syn_x: Synthetic predictor variables.
- syn_time: Synthetic survival times.
- syn_status: Event indicator for synthetic data (defaults to 1).
- syn_x_resample_inform: Information about resampling methods for synthetic predictors:
  - Coordinate: Preserves the original data values as reference coordinates during processing.
  - Deskewing: Adjusts the data distribution to reduce skewness and enhance symmetry.
  - Smoothing: Reduces noise in the data to stabilize the dataset and prevent overfitting.
  - Flattening: Creates a more uniform distribution by modifying low-frequency categories in categorical variables.
  - Symmetrizing: Balances the data around its mean to improve statistical properties for model fitting.
Whole Data Information:
- size: Total number of combined original and synthetic observations.
- data: Data frame combining original and synthetic datasets.
- x: Combined predictor variables from original and synthetic data.
- time: Combined survival times from original and synthetic data.
- status: Combined event indicators from original and synthetic data.
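
The relationship between the observation, synthetic, and whole-data components can be sketched in base R. The data frames below are hypothetical stand-ins built by hand, not output of `cat_cox_initialization`, and the resampling step is a plain `sample()` call rather than the package's own resampling methods.

```r
# Hypothetical observed data (column names chosen for illustration)
obs_data <- data.frame(
  time   = c(5, 8, 12),
  status = c(1, 0, 1),
  age    = c(61, 54, 70)
)

# Hypothetical synthetic data: predictor values resampled from the
# observed column, times drawn from an assumed constant hazard of 0.1,
# and the event indicator fixed at 1.
set.seed(42)
syn_data <- data.frame(
  time   = rexp(4, rate = 0.1),
  status = 1,
  age    = sample(obs_data$age, 4, replace = TRUE)
)

# Whole data: observed rows followed by synthetic rows
whole <- rbind(obs_data, syn_data)
size  <- nrow(whole)  # equals nrow(obs_data) + nrow(syn_data)
```

In this sketch `size` plays the role of the `size` component, and `whole` that of `data`.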

Examples

library(survival)
library(catalytic)
data("cancer")
# Recode status from 1 = censored, 2 = dead to 0 = censored, 1 = event
cancer$status[cancer$status == 1] <- 0
cancer$status[cancer$status == 2] <- 1

cat_init <- cat_cox_initialization(
  formula = Surv(time, status) ~ 1, # formula for simple model
  data = cancer,
  syn_size = 100, # Synthetic data size
  hazard_constant = NULL, # Hazard rate value
  entry_points = rep(0, nrow(cancer)), # Entry points of each observation
  x_degree = rep(1, ncol(cancer) - 2), # Degrees for polynomial expansion of predictors
  resample_only = FALSE, # Whether to perform resampling only
  na_replace = stats::na.omit # How to handle NA values in data
)
cat_init

catalytic documentation built on April 4, 2025, 5:51 a.m.