load_and_prepare_data_pro: Load and Prepare Data for Prognostic Models

View source: R/prognosis.R

load_and_prepare_data_pro    R Documentation

Description

Loads a CSV file of patient data, separates the feature, outcome, and time columns, and returns them in a format suitable for survival analysis models. Performs basic data cleaning, such as NA removal and column type conversion.
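The steps described above can be sketched in a few lines of base R. This is a hypothetical illustration, not the package's implementation: the helper name sketch_load_and_prepare, the use of complete.cases() for NA removal, and the coercion with as.numeric() are all assumptions. Time-unit conversion is left out here.

```r
library(survival)

# Hypothetical re-implementation, for illustration only.
sketch_load_and_prepare <- function(data_path, outcome_col_name, time_col_name) {
  df <- read.csv(data_path, stringsAsFactors = FALSE)

  # Assumed NA handling: drop every row with at least one missing value.
  df <- df[complete.cases(df), , drop = FALSE]

  # The first column is treated as the sample ID.
  sample_ids <- df[[1]]

  # Assumed type conversion: coerce outcome and time to numeric.
  outcome <- as.numeric(df[[outcome_col_name]])
  time <- as.numeric(df[[time_col_name]])

  # Features are all columns except ID, outcome, and time.
  feature_cols <- setdiff(names(df)[-1], c(outcome_col_name, time_col_name))

  list(
    X = df[, feature_cols, drop = FALSE],
    Y_surv = Surv(time, outcome),
    sample_ids = sample_ids,
    outcome_numeric = outcome,
    time_numeric = time
  )
}

# Small input with one incomplete row (P2), which the sketch drops.
csv <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    ID = c("P1", "P2", "P3"),
    Age = c(60, NA, 72),
    Status = c(1, 0, 1),
    Days = c(120, 300, 450)
  ),
  csv,
  row.names = FALSE
)
res <- sketch_load_and_prepare(csv, "Status", "Days")
unlink(csv)
str(res$X)
```

The actual function may clean the data differently, for example by imputing values or by dropping only rows with a missing outcome or time, so check the source in R/prognosis.R before relying on this behavior.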

Usage

load_and_prepare_data_pro(
  data_path,
  outcome_col_name,
  time_col_name,
  time_unit = c("day", "month", "year")
)

Arguments

data_path

A character string, the file path to the input CSV data. The first column is assumed to be a sample ID.

outcome_col_name

A character string, the name of the column containing event status (0 for censored, 1 for event).

time_col_name

A character string, the name of the column containing event or censoring time.

time_unit

A character string, the unit in which time_col_name is recorded. One of "day", "month", or "year". Times are converted to days internally.
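As a rough illustration of the conversion to days, the sketch below uses a hypothetical helper, convert_time_to_days. The conversion factors (30.44 days per month, 365.25 days per year) are assumptions and may differ from the ones the package uses. The Usage signature lists the three choices as a vector, which is the usual pattern for match.arg(); if the function follows that pattern, omitting time_unit would select "day".

```r
# Hypothetical helper: convert follow-up times to days.
# The month and year factors are assumptions, not taken from the package.
convert_time_to_days <- function(time, time_unit = c("day", "month", "year")) {
  # match.arg() picks the first choice ("day") when time_unit is not supplied
  # and raises an error for any value outside the three choices.
  time_unit <- match.arg(time_unit)
  days_per_unit <- c(day = 1, month = 30.44, year = 365.25)
  as.numeric(time) * days_per_unit[[time_unit]]
}

convert_time_to_days(c(1, 12), "month")
convert_time_to_days(2, "year")
convert_time_to_days(90)  # treated as days
```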

Value

A list containing:

  • X: A data frame of features (all columns except ID, outcome, and time).

  • Y_surv: A survival::Surv object created from time and outcome.

  • sample_ids: A vector of sample IDs (the first column of the input data).

  • outcome_numeric: A numeric vector of outcome status.

  • time_numeric: A numeric vector of time, converted to days.
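The returned elements are shaped so they can be passed to a survival model directly. The sketch below fits a Cox model with survival::coxph. To stay self-contained it builds a stand-in list with the same element names rather than calling load_and_prepare_data_pro; the simulated values and the feature names FeatureA and FeatureB are made up for illustration.

```r
library(survival)

set.seed(42)
n <- 50

# Stand-in for the list returned by load_and_prepare_data_pro() (simulated values).
prepared <- list(
  X = data.frame(FeatureA = rnorm(n), FeatureB = runif(n, 0, 100)),
  sample_ids = paste0("Patient", seq_len(n)),
  outcome_numeric = rbinom(n, 1, 0.5),   # 0 = censored, 1 = event
  time_numeric = runif(n, 300, 1800)     # follow-up time in days
)
prepared$Y_surv <- Surv(prepared$time_numeric, prepared$outcome_numeric)

# Fit a Cox proportional hazards model on the feature data frame.
y <- prepared$Y_surv
fit <- coxph(y ~ FeatureA + FeatureB, data = prepared$X)
summary(fit)$coefficients
```

Y_surv carries the same information as time_numeric and outcome_numeric together, so the two numeric vectors are mainly useful for models or plots that do not accept a Surv object.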

Examples

# Create a temporary CSV file with dummy survival data
set.seed(123)  # make the random example reproducible
temp_csv_path <- tempfile(fileext = ".csv")
dummy_data <- data.frame(
  ID = paste0("Patient", 1:50),
  FeatureA = rnorm(50),
  FeatureB = runif(50, 0, 100),
  CategoricalFeature = sample(c("A", "B", "C"), 50, replace = TRUE),
  Outcome_Status = sample(c(0, 1), 50, replace = TRUE),  # 0 = censored, 1 = event
  Followup_Time_Months = runif(50, 10, 60)
)
write.csv(dummy_data, temp_csv_path, row.names = FALSE)

# Load and prepare data
prepared_data <- load_and_prepare_data_pro(
  data_path = temp_csv_path,
  outcome_col_name = "Outcome_Status",
  time_col_name = "Followup_Time_Months",
  time_unit = "month"
)

# Check prepared data structure
str(prepared_data$X)
print(prepared_data$Y_surv[1:5])

# Clean up dummy file
unlink(temp_csv_path)

E2E documentation built on Aug. 27, 2025, 1:09 a.m.