View source: R/Coxmos_common_functions.R
getTrainTest | R Documentation |
Splits input data (X and Y) into training and test sets for survival analysis, keeping the proportion of events similar in both sets. Supports a single split or repeated splits for cross-validation, and accepts multiblock data in the X parameter.
getTrainTest(X, Y, p = 0.8, times = 1, seed = 123)
X |
Numeric matrix, data.frame or list of matrices or data.frames. Predictor variables (features). Rows are samples, columns are variables. |
Y |
Numeric matrix or data.frame. Response variables. Object must have two columns named "time" and "event". In the event column, accepted values are 0/1 or FALSE/TRUE for censored and event observations, respectively. |
p |
Numeric (0 < p < 1). Proportion of samples to allocate to the training set (default: 0.8). |
times |
Integer. Number of repeated splits to perform (default: 1). |
seed |
Integer. Random seed for reproducibility (default: 123). |
This function uses caret::createDataPartition() to partition the data while preserving the proportion of events (e.g., deaths) in both training and test sets. It is designed for survival data where Y must contain an event column (binary: 1=event, 0=censored).
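The idea of an event-preserving partition can be illustrated with a minimal base R sketch. This is not the Coxmos implementation and does not call caret; the helper name stratified_split and the toy data are hypothetical, and the sketch assumes X is a single matrix or data.frame rather than a multiblock list. It samples a fraction p of the rows within each event stratum, so both partitions keep roughly the same event proportion.

```r
# Hypothetical helper, not part of Coxmos: split rows while sampling
# separately within the censored (0) and event (1) strata.
stratified_split <- function(X, Y, p = 0.8, seed = 123) {
  set.seed(seed)
  event <- as.integer(Y[, "event"])  # accepts 0/1 or FALSE/TRUE
  idx_by_stratum <- split(seq_len(nrow(Y)), event)
  # Draw a fraction p of the row indices from each stratum
  train_idx <- sort(unlist(lapply(idx_by_stratum, function(idx) {
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  }), use.names = FALSE))
  list(
    X_train = X[train_idx, , drop = FALSE],
    Y_train = Y[train_idx, , drop = FALSE],
    X_test  = X[-train_idx, , drop = FALSE],
    Y_test  = Y[-train_idx, , drop = FALSE]
  )
}

# Toy data (made up for illustration): 100 samples, 5 features, 30 events
set.seed(1)
X_toy <- matrix(rnorm(500), nrow = 100,
                dimnames = list(NULL, paste0("v", 1:5)))
Y_toy <- data.frame(time = rexp(100), event = rep(c(1, 0), times = c(30, 70)))

split_toy <- stratified_split(X_toy, Y_toy, p = 0.8)

# Event proportion in each partition; both match the overall proportion
mean(split_toy$Y_train$event)
mean(split_toy$Y_test$event)
```

Sampling within strata, rather than from all rows at once, is what keeps a rare event class from being under-represented in the smaller test set.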
If times = 1: A list with:
X_train: Training features.
Y_train: Training survival data.
X_test: Test features.
Y_test: Test survival data.
If times > 1: A named list of length times, each element containing the above structure.
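As a sketch of how the returned structure might be consumed, the following defines a small helper that reports the event proportion in the training and test parts of one split. The helper event_rates and the mock split are hypothetical and only mirror the element names documented above. The call to getTrainTest() is guarded so the block also runs when Coxmos is not installed.

```r
# Hypothetical helper (not part of Coxmos): event proportion in each
# partition of one split, i.e. a list with X_train, Y_train, X_test, Y_test.
event_rates <- function(split) {
  c(train = mean(as.numeric(split$Y_train[, "event"])),
    test  = mean(as.numeric(split$Y_test[, "event"])))
}

# Mock split with the documented element names, used only to show the helper
mock_split <- list(
  X_train = matrix(0, nrow = 4, ncol = 2),
  Y_train = data.frame(time = c(5, 8, 2, 9), event = c(1, 0, 0, 1)),
  X_test  = matrix(0, nrow = 2, ncol = 2),
  Y_test  = data.frame(time = c(3, 7), event = c(1, 0))
)
mock_rates <- event_rates(mock_split)

# With Coxmos installed, apply the same helper to each repeated split
if (requireNamespace("Coxmos", quietly = TRUE)) {
  library(Coxmos)
  data(X_proteomic, Y_proteomic)
  lst_repeats <- getTrainTest(X_proteomic, Y_proteomic, p = 0.7, times = 3)
  print(sapply(lst_repeats, event_rates))
}
```

For times = 1 the helper can be applied directly to the returned list, since that list already has the four documented elements.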
Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es
createDataPartition
# Single split (80% training, 20% test)
library(Coxmos)
data(X_proteomic, Y_proteomic)
lst <- getTrainTest(X_proteomic, Y_proteomic, p = 0.8)
# Repeated splits (3x)
lst_repeats <- getTrainTest(X_proteomic, Y_proteomic, p = 0.7, times = 3)