Survcompare_application

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
# save the user's options so they can be restored at the end of the vignette
rf_user <- rfcores_old <- options()$rf.cores
warnings_user <- options()$warn
scipen_user <- options()$scipen
digits_user <- options()$digits
options(rf.cores = 1)
options(warn = -1)
options(scipen = 100, digits = 4)

Package background

The package checks whether there are considerable non-linear and interaction terms in the data, and quantifies their contribution to the models' performance. Using repeated nested cross-validation (that is, a system of random data splits into training and testing sets to evaluate prediction accuracy), the package cross-validates a baseline Cox proportional hazards model (or Cox Lasso) alongside a more flexible alternative (Survival Random Forest, or its ensemble with the Cox model), and statistically tests whether the alternative outperforms the baseline.

Predictive performance is evaluated by averaging different measures across all train-test splits. The following measures are computed: the concordance index (C-index), AUC-ROC, the Brier score, and the calibration slope and alpha.

Descriptions and definitions of these performance metrics can be found, for example, in Steyerberg & Vergouwe (2014).
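For reference, Harrell's concordance index (C-index) is the probability that, among a comparable pair of subjects, the one who experiences the event earlier also has the higher predicted risk (a standard textbook definition, added here for convenience):

$$
C = \Pr\left(\hat{r}_i > \hat{r}_j \mid T_i < T_j,\ \delta_i = 1\right),
$$

where $\hat{r}_i$ denotes the predicted risk score, $T_i$ the observed time, and $\delta_i$ the event indicator of subject $i$.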

References:

Therneau T (2023). A Package for Survival Analysis in R. R package version 3.5-7, https://CRAN.R-project.org/package=survival.

Simon N, Friedman J, Tibshirani R, Hastie T (2011). "Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent." Journal of Statistical Software, 39(5), 1-13. https://doi.org/10.18637/jss.v039.i05.

Ishwaran H, Kogalur U (2023). Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC). R package version 3.2.2, https://cran.r-project.org/package=randomForestSRC.

Steyerberg EW, Vergouwe Y (2014). "Towards better clinical prediction models: seven steps for development and an ABCD for validation." European Heart Journal, 35(29), 1925-1931. https://doi.org/10.1093/eurheartj/ehu207.

What can be inferred from the survcompare results?

There are two possible outcomes: 1) "Survival Random Forest ensemble has NOT outperformed CoxPH", or 2) "Survival Random Forest ensemble has outperformed CoxPH by ... in C-index".

1) If there is no outperformance, the test justifies using the CoxPH model, indicating a negligible advantage of a more flexible model such as Survival Random Forest.

2) In the case of the ensemble's outperformance of the Cox model, a researcher can:

-   Employ a more complex or flexible model.
-   Look for the interaction and non-linear terms that could be added to the CoxPH and re-run the test.
-   Consider using the CoxPH model despite its underperformance, if the difference is not large enough to justify sacrificing the model's interpretability, or is negligible in the context of the task at hand.

Before interpreting the results, ensure that a sufficient number of repetitions (repeat_cv) has been used. The repeat_cv parameter should be at least 5; for heterogeneous data, 20-50 repetitions may be needed. Otherwise, the results may be unreliable and subject to chance.
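For example, a run with more repetitions could look as follows (a sketch; it assumes `repeat_cv` can be passed to `survcompare()` directly, as implied by the note in Example 1 below):

```r
dat <- simulate_linear(200)
predictors <- names(dat)[1:4]
# 20 repetitions of the nested cross-validation for more stable estimates
survcompare(dat, predictors, repeat_cv = 20, randomseed = 100)
```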

Why the CoxPH-SRF ensemble and not just SRF?

First, you can use the package to compare the performances of the CoxPH and SRF themselves.

Second, the ensembles aim to single out the predictive value of non-linearity and other data relationships that could not be captured by the baseline models. In both ensembles, the final model has direct access to the predictions of the baseline CoxPH model, and hence the outperformance can be fully attributed to such complex relationships.

For example, the sequential ensemble of Cox and SRF takes the predictions of the Cox model and adds them to the list of predictors used to train the SRF. This way, we make sure that linearity is captured by the SRF at least as well as by the Cox model, and hence the marginal outperformance of the ensemble over the Cox model can be fully attributed to the qualities of SRF that Cox does not have, that is, to data complexity. A minimal sketch of this idea is shown below.
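The sketch below illustrates the sequential-ensemble concept; it is not the package's internal code, and it assumes the simulated data has `time` and `event` columns, as the package's cross-validation functions expect:

```r
library(survival)
library(randomForestSRC)
library(survcompare)

d <- simulate_linear(200)
preds <- names(d)[1:4]

# Step 1: fit the baseline Cox model and extract its linear predictor
f_cox <- as.formula(paste("Surv(time, event) ~", paste(preds, collapse = " + ")))
cox_fit <- coxph(f_cox, data = d)
d$cox_lp <- predict(cox_fit, type = "lp")

# Step 2: train SRF with the Cox prediction as an additional feature,
# so any gain over Cox reflects relationships Cox cannot capture
f_srf <- as.formula(paste("Surv(time, event) ~",
                          paste(c(preds, "cox_lp"), collapse = " + ")))
srf_fit <- rfsrc(f_srf, data = d, ntree = 300)
```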

Package installation

# For the main version:

# install.packages("devtools")
# devtools::install_github("dianashams/survcompare")

# DeepHit Extension:
# devtools::install_github("dianashams/survcompare", ref = "DeepHit")
library(survcompare)

Examples

Example 1. Linear data

The first example takes simulated data that does not contain any complex terms, and CoxPH is expected to perform as well as the Survival Random Forest.

mydata <- simulate_linear(200)
mypredictors <- names(mydata)[1:4]
survcompare(mydata, mypredictors, randomseed = 100)

NOTE: the same results can be obtained by cross-validating the CoxPH and SRF ensemble separately, and then using the survcompare2() function; outer_cv, inner_cv, repeat_cv, and randomseed should be the same.

# cross-validate CoxPH
cv1 <- survcox_cv(mydata, mypredictors, randomseed = 100)
# cross-validate the Survival Random Forest ensemble
cv2 <- survsrfens_cv(mydata, mypredictors, randomseed = 100)
# compare the models
survcompare2(cv1, cv2)

Example 2. Non-linear data with interaction terms

The second example takes simulated data that contains non-linear and cross-terms, so an outperformance of the tree-based model is expected. We will increase the default number of cross-validation repetitions to get a more robust estimate, choose CoxLasso as the baseline linear model, and SRF as the alternative (the SRF itself, not the ensemble).

mydata2 <- simulate_crossterms(200)
mypredictors2 <- names(mydata2)[1:4]

# cross-validate CoxLasso (the baseline linear model)
cv1 <- survcox_cv(mydata2, mypredictors2, randomseed = 100, repeat_cv = 3, useCoxLasso = TRUE)
# cross-validate Survival Random Forest (the alternative)
cv2 <- survsrf_cv(mydata2, mypredictors2, randomseed = 100, repeat_cv = 3)
# compare the models 
compare_nonl <- survcompare2(cv1, cv2)
compare_nonl

Detailed results can be extracted from the output object, for example, the test performance for each data split (across all cross-validation folds and repetitions).

# Main stats
compare_nonl$main_stats_pooled

# More information, including Brier scores, calibration slopes and alphas, is in
# compare_nonl$results_mean. These are averaged performance metrics on the test sets.
compare_nonl$results_mean

We can also evaluate the stacked ensemble of CoxPH and SRF and inspect the tuned lambda and SRF model parameters:

# cross-validate the stacked ensemble of CoxPH and SRF
cv3 <- survsrfstack_cv(mydata2, mypredictors2, randomseed = 100, repeat_cv = 3)

survcompare2(cv1, cv3)

Display all tuned model parameters. Lambda shows the relative share of the SRF predictions in the linear meta-learner: lambda = 0 means that the meta-learner uses CoxPH predictions only, and lambda = 1 means that only SRF predictions are used. A conceptual sketch of this blending appears after the code below.

# all tuned params
cv3$bestparams

lambdas <- unlist(cv3$bestparams$lambda)
print("Mean lambda in the stacked ensemble of the SRF and CoxPH, indicating the share of SRF predictions in the meta-learner:") 
print(mean(lambdas))

plot(lambdas)
abline(h=mean(lambdas), col = "blue")
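Conceptually, the meta-learner blends the two models' predictions as a convex combination. The snippet below is a simplified illustration of what lambda means, not the package's internal code; `p_cox` and `p_srf` are hypothetical predicted risks from CoxPH and SRF:

```r
# hypothetical predicted risks for three subjects
p_cox <- c(0.20, 0.50, 0.75)   # CoxPH predictions
p_srf <- c(0.25, 0.40, 0.80)   # SRF predictions
lambda <- 0.6                  # tuned share of SRF in the meta-learner

# lambda = 0 reproduces CoxPH; lambda = 1 reproduces SRF
p_ensemble <- lambda * p_srf + (1 - lambda) * p_cox
p_ensemble
```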

Example 3. Applying survcompare to GBSG data

Now, let's apply the package to real-life data. We will use the GBSG data from the survival package (https://rdrr.io/cran/survival/man/gbsg.html).

library(survival)

# prepare the data
mygbsg <- gbsg
mygbsg$time <- gbsg$rfstime / 365
mygbsg$event <- gbsg$status
myfactors <- c("age", "meno", "size", "grade", "nodes", "pgr", "er", "hormon")
mygbsg <- mygbsg[c("time", "event", myfactors)]
sum(is.na(mygbsg[myfactors])) # check for missing data

# run survcompare
cv1 <- survcox_cv(mygbsg, myfactors, randomseed = 42, repeat_cv = 3)
cv2 <- survsrfens_cv(mygbsg, myfactors, randomseed = 42, repeat_cv = 3)
survcompare2(cv1, cv2)

We got the following results (they may differ slightly depending on the random seed):

```
Internally validated test performance of CoxPH and SRF_ensemble
over 3 repeated 3 fold cross-validations (inner k = 3). Mean performance:

Median performance:

SRF_ensemble has outperformed CoxPH by 0.0137 in C-index.
The difference is statistically significant with the p-value 0.012*.
The supplied data may contain non-linear or cross-term dependencies,
better captured by SRF_ensemble.
Mean C-score:
  CoxPH         0.6773(95CI=0.675-0.6801;SD=0.0028)
  SRF_ensemble  0.6911(95CI=0.6902-0.6918;SD=9e-04)
Mean AUCROC:
  CoxPH         0.7224(95CI=0.7176-0.728;SD=0.0055)
  SRF_ensemble  0.7211(95CI=0.7177-0.7241;SD=0.0034)
```

This example illustrates a situation when outperformance of the non-linear model is statistically significant, but not large in absolute terms. The ultimate model choice will depend on how critical such an improvement is for the model stakeholders.


```r
# restore rf.cores, warn, digits and scipen options to their default / user-defined values
options(rf.cores = rfcores_old)
options(warn = warnings_user)
options(digits = digits_user)
options(scipen = scipen_user)
```

