The modelling of survival data may be done using ClassifyR, not only categorical data. Currently, feature selection based on Cox proportional hazards ranking and models built on Cox proportional hazards or random survival forests are available.

To illustrate, clinical variables of the METABRIC breast cancer cohort will be used to predict risk scores for patients.

library(ClassifyR)
set.seed(8400)
data(METABRICclinical) # Contains measurements and follow-up time and recurrence status.
head(clinical)

Cross-validation is very similar to the classification scenario, except that a Surv object or two column names indicating the time and event (in that order) are to be specified.

survCrossValidated <- crossValidate(clinical, c("timeRFS", "eventRFS"))
survCrossValidated

By default, Cox proportional hazards has been used for feature selection as well as modelling.

The distribution of C-index values can be plotted.

performancePlot(survCrossValidated)

The typical C-index is about 0.65, often seen in genomics analysis.

Now, do no explicit feature selection and use a random survival forest. This will take substantially longer than Cox proportional hazards because parameter tuning of the fraction of variables to consider at each split and the number of trees to build are optimised over a grid of default values (see Parameter Tuning Presets article for details).

survForestCrossValidated <- crossValidate(clinical, c("timeRFS", "eventRFS"),
                                          selectionMethod = "none",
                                          classifier = "randomSurvivalForest",
                                          nCores = 20)
resultsList <- list(survCrossValidated, survForestCrossValidated)
performancePlot(resultsList)

The random survival forest performs better than random chance but not better than Cox proportional hazards.

Exercise: Plot the per-sample C-index using samplesMetricMap(resultsList) to identify patients who are predicted well and those who are not.

Finally, note that extreme gradient boosting can also fit survival models. classifier = "XGB" will fit such a model. Remember that classifier can also be a vector of strings, so all three models can be fitted in one command.



DarioS/ClassifyR documentation built on June 11, 2024, 11:25 a.m.