sl.time: Super Learner for Censored Outcomes

Description

This function fits a Super Learner (SL) to predict survival outcomes.

Usage

sl.time(methods, metric, data, times, failures, group, cov.quanti, cov.quali, cv,
  param.tune, pro.time, optim.local.min, ROC.precision, param.weights.fix,
  param.weights.init, keep.predictions, verbose)

Arguments

methods

A character vector with the names of the algorithms included in the SL. At least two algorithms must be included.

metric

The loss function used to estimate the weights of the algorithms in the SL. See details.

data

A data frame in which to look for the variables related to the follow-up time (times), the event status (failures), the optional treatment/exposure (group) and the covariates included in the model (cov.quanti and cov.quali).

times

The name of the variable in data that contains the follow-up times (numeric vector).

failures

The name of the variable in data that contains the event indicators (numeric vector: 0=right censored, 1=event).

group

The name of the variable related to the exposure/treatment. This variable must have only two levels, coded 0 for the untreated/unexposed patients and 1 for the treated/exposed ones. The default value is NULL: no specific exposure/treatment is considered. When a specific exposure/treatment is considered, it is forced into the algorithm, or the related interactions are tested when possible.

cov.quanti

The name(s) of the variable(s) related to the possible quantitative covariates. These variables must be numeric.

cov.quali

The name(s) of the variable(s) related to the possible qualitative covariates. These variables must be numeric with two levels: 0 and 1. Covariates with more than two levels must be recoded in complete disjunctive form, i.e. as a set of 0/1 indicator variables.
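
As an illustration of the complete disjunctive form, the sketch below recodes a three-level factor into 0/1 indicator columns with base R. The data frame d and the covariate centre are hypothetical and are not part of the package data; whether all indicators or all but a reference level should then be passed to cov.quali depends on the algorithms used and is left to the user.

```r
# Hypothetical covariate with three levels (not part of dataDIVAT2)
d <- data.frame(centre = factor(c("A", "B", "C", "A", "B")))

# One 0/1 indicator column per level (no intercept, so no level is dropped)
ind <- model.matrix(~ centre - 1, data = d)
colnames(ind) <- paste0("centre.", levels(d$centre))

# Append the numeric indicators to the data frame
d <- cbind(d, as.data.frame(ind))
```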

cv

The number of splits for cross-validation. The default value is 10.

param.tune

A list with a length equal to the number of algorithms included in methods. If NULL, the tuning parameters are estimated (see details).

pro.time

This optional prognostic time is the maximum delay for which the prognostic capacity is evaluated. It must be expressed in the same unit as the argument times. Not used for the following metrics: "loglik", "ibs", "bll", and "ibll". The default value is the time at which half of the subjects are still at risk.
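
A minimal sketch of what the default could look like, assuming that "half of the subjects are still at risk" is read as the median of the observed follow-up times (a subject is at risk at time t when its follow-up time is at least t). The vector times below is made up for illustration, and the function may compute its default differently.

```r
# Hypothetical follow-up times (same unit as the times argument)
times <- c(2.1, 0.5, 7.3, 4.4, 1.2, 9.8, 3.6, 6.0)

# Assumed reading of the default: the median follow-up time
pro.time <- median(times)

# Proportion of subjects still at risk at pro.time
at.risk <- mean(times >= pro.time)
```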

optim.local.min

An optional logical value. If TRUE, the optimization is performed twice to make the estimation of the weights more robust to local minima. If FALSE (default value), the optimization is performed once.

ROC.precision

The percentiles (between 0 and 1) of the prognostic variable used for computing each point of the time-dependent ROC curve. Only used when metric="auc". 0 (min) and 1 (max) are not allowed. By default, the precision is seq(.01,.99,.01).
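
To make the role of these percentiles concrete, the sketch below turns the default precision into cut-off values of a prognostic variable with base R. The vector score is simulated for illustration only, and the use of quantile() is an assumption about how cut-offs could be derived, not a description of the internal code.

```r
set.seed(1)
score <- rnorm(200)  # hypothetical prognostic variable

# Default precision: 99 percentiles, excluding 0 (min) and 1 (max)
ROC.precision <- seq(.01, .99, .01)

# One cut-off per percentile, i.e. one point of the ROC curve per cut-off
cutoffs <- quantile(score, probs = ROC.precision)
```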

param.weights.fix

A vector with the parameters of the multinomial logistic regression which generates the weights of the algorithms declared in methods. When supplied, the related parameters are not estimated. The default value is NULL: the parameters are estimated by a cv-fold cross-validation. See details.

param.weights.init

A vector with the initial values of the parameters of the multinomial logistic regression which generates the weights of the algorithms declared in methods. The default value is NULL: the initial values are set to 0. See details.
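
The link between these parameters and the weights can be sketched as follows, assuming the usual multinomial logit parameterisation with K - 1 parameters for K methods and the last method as the reference. The helper weights_from_params is hypothetical and is not exported by the package; the exact parameterisation used internally may differ.

```r
# Assumed parameterisation: K methods, K - 1 parameters, last method as reference
weights_from_params <- function(par) {
  expo <- exp(c(par, 0))  # the reference method has its parameter fixed at 0
  expo / sum(expo)        # weights are positive and sum to 1
}

# Three methods with all initial values set to 0: equal weights
w0 <- weights_from_params(c(0, 0))

# Any other parameter values still give weights that sum to 1
w1 <- weights_from_params(c(1, -2))
```

Under this assumption, initial values of 0 correspond to starting the optimization from equal weights for all methods.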

keep.predictions

A logical value specifying whether the predictions of all the methods are saved. If FALSE, only the predictions related to the SL are saved (to save space). The default is TRUE.

verbose

A logical value specifying whether the progress of the fitting process is printed to the console (TRUE). The default is TRUE.

Details

Each element of the list declared in param.tune must have the same name as the corresponding method included in the SL. If param.tune = NULL, the tuning parameters of each algorithm are estimated by cv-fold cross-validation. Otherwise, the user can supply a tuning grid for each method, as explained in the following table.

The following metrics can be used: "brier" for the Brier score at the prognostic time pro.time, "loglik" for the log-likelihood, "ibs" for the integrated Brier score up to the last observed time of event, "ibll" for the integrated binomial log-likelihood up to the last observed time of event, "bll" for the binomial log-likelihood, "ribs" for the restricted integrated Brier score up to the prognostic time pro.time, "ribll" for the restricted integrated binomial log-likelihood up to the prognostic time pro.time, and "auc" for the area under the time-dependent ROC curve up to the prognostic time pro.time.

Methods:

Names             Description             Package          Assumption
"aft.gamma"       Gamma                   flexsurv         AFT
"aft.ggamma"      Generalized Gamma       flexsurv         AFT
"aft.weibull"     Weibull                 flexsurv         AFT
"ph.exponential"  Exponential             flexsurv         PH
"ph.gompertz"     Gompertz                flexsurv         PH
"cox.en"          Elastic Net Cox         glmnet           PH
"cox.lasso"       Lasso Cox               glmnet           PH
"cox.ridge"       Ridge Cox               glmnet           PH
"rf.time"         Survival Random Forest  randomForestSRC  RF
"nn.time"         Neural Network          survivalmodels   PH

Loss Function metric:

  • Brier Score ("bs")

  • Binomial log likelihood ("bll")

  • Integrated Brier score ("ibs")

  • Integrated binomial log likelihood ("ibll")

  • Restricted Integrated Brier Score ("ribs")

  • Restricted Integrated Binomial Log-Likelihood ("ribll")

Value

times

A vector of numeric values with the times of the predictions.

predictions

A list of matrices with the predicted survival of each subject (rows) at each observed time (columns). Each matrix corresponds to one of the included methods, and the last item, entitled "sl", corresponds to the resulting SL. If keep.predictions=FALSE, it is a single matrix with the predictions related to the SL.

data

The data frame used for learning. The first column is entitled times and corresponds to the observed follow-up times. The second column is entitled failures and corresponds to the event indicators. The other columns correspond to the predictors.

predictors

A list with the predictors involved in group, cov.quanti and cov.quali.

ROC.precision

The percentiles (between 0 and 1) of the prognostic variable used for computing each point of the time-dependent ROC curve.

cv

The number of splits for cross-validation.

pro.time

The maximum delay for which the prognostic capacity is evaluated.

models

A list with the estimated models/algorithms included in the SL.

weights

A list composed of two vectors: the regression coefficients of the multinomial logistic regression and the resulting weights.

metric

A list composed of two elements: the loss function used to estimate the weights of the algorithms in the SL and its value.

param.tune

The estimated tuning parameters.

Author(s)

Yohann Foucher <Yohann.Foucher@univ-poitiers.fr>

Camille Sabathe <camille.sabathe@univ-nantes.fr>

References

Polley E and van der Laan M. Super Learner In Prediction. http://biostats.bepress.com/ucbbiostat/paper266. 2010.

Sabathe C and Foucher Y. Super Learner for survival prediction from censored data: Extension of the R package RISCA. Manuscript submitted. 2022.

Examples


library(RISCA)

data(dataDIVAT2)

# The outcome model is based on a Super Learner fitted on the first 150 individuals of the data set
sl1<-sl.time( methods=c("aft.gamma", "ph.gompertz"),  metric="ibs",
  data=dataDIVAT2[1:150,],  times="times", failures="failures", group="ecd",
  cov.quanti=c("age"),  cov.quali=c("hla", "retransplant"), cv=3)
  
# Individual predictions for two hypothetical subjects
pred <- predict(sl1, newdata=data.frame(age=c(52,52), hla=c(0,1),
  retransplant=c(1,1), ecd=c(0,1)))

plot(y=pred$predictions$sl[1,], x=pred$times, xlab="Time (years)", ylab="Predicted survival",
     col=1, type="l", lty=1, lwd=2, ylim=c(0,1))

lines(y=pred$predictions$sl[2,], x=pred$times, col=2, type="l", lty=1, lwd=2)

legend("topright", col=c(1,2), lty=1, lwd=2, c("Subject #1", "Subject #2"))


RISCA documentation built on March 31, 2023, 11:06 p.m.
