survSuperLearner: Super Learner for conditional survival functions

View source: R/SL_functions.R

survSuperLearner    R Documentation


Description

This function estimates conditional survival functions for the event and censoring times from right-censored data.

Usage

survSuperLearner(
  time,
  event,
  X,
  newX,
  new.times,
  event.SL.library,
  cens.SL.library,
  id = NULL,
  verbose = FALSE,
  control = list(),
  cvControl = list(),
  obsWeights = NULL
)

Arguments

time

n x 1 numeric vector of observed right-censored follow-up times; i.e. the minimum of the event and censoring times.

event

n x 1 numeric vector of status indicators: 1 if the event was observed, 0 if the observation was censored.

X

n x p data.frame of observed covariate values on which to train the SuperLearner.

newX

m x p data.frame of new covariate values at which to obtain predictions from the fitted algorithm. Must have the same names and structure as X.

new.times

k x 1 numeric vector of times at which to obtain predicted conditional survival probabilities.

event.SL.library

Library of candidate learners to use to estimate the conditional survival of the event. Should have the same structure as the SL.library argument to the SuperLearner function in the SuperLearner package; see details below. Run survlistWrappers() to see a list of currently available prediction and screening algorithms.

cens.SL.library

Library of candidate learners to use to estimate the conditional survival of censoring.

id

Optional n x 1 vector of observation clusters. If provided, cross-validation folds will respect clustering and id will be passed to every learner, though some learners may not make use of it. Default is every observation in its own cluster; i.e. iid observations.

verbose

TRUE/FALSE indicating whether to print progress in the fitting process to the console.

control

Named list of parameters controlling the fitting process. See survSuperLearner.control for details.

cvControl

Named list of parameters controlling the cross-validation process. See survSuperLearner.cvControl for details.

obsWeights

Optional n x 1 vector of observation weights. If provided, these weights will be passed to each learner, which may or may not make use of them (or make use of them correctly), and will be used in the ensemble step to weight the empirical risk function.
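To illustrate what it means for cross-validation folds to "respect clustering" when id is supplied, the following is a minimal base R sketch. The helper make_cluster_folds is hypothetical and is not part of survSuperLearner; the package's own fold construction may differ. The idea shown is only that whole clusters, rather than individual rows, are assigned to folds.

```r
# Hypothetical helper (not from survSuperLearner): assign whole clusters to folds
make_cluster_folds <- function(id, V = 5) {
  clusters <- unique(id)
  # Randomly assign each cluster (not each row) to one of V folds
  fold_of_cluster <- sample(rep(seq_len(V), length.out = length(clusters)))
  names(fold_of_cluster) <- as.character(clusters)
  # Look up the fold of each row through its cluster label
  fold_of_row <- unname(fold_of_cluster[as.character(id)])
  # Length-V list of row indices, one element per validation fold
  lapply(seq_len(V), function(v) which(fold_of_row == v))
}

set.seed(1)
id <- rep(1:20, each = 3)  # 20 clusters of 3 observations each
folds <- make_cluster_folds(id, V = 5)

# Number of distinct clusters landing in each fold
sapply(folds, function(rows) length(unique(id[rows])))
```

Because rows sharing an id always fall in the same fold, no cluster contributes to both the training and validation sets of a given split.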

Details

The conditional survival function of the event at time t given covariate values x is the probability that the event time exceeds t given X = x. The conditional survival function of censoring at time t given x is likewise the probability that the censoring time exceeds t given X = x. This function finds the optimal weighted combination, i.e. the Super Learner, of candidate learners for both of these functions simultaneously.
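In symbols, the definitions above can be written as follows. The notation is introduced here for illustration only: T is the event time, C the censoring time, S_0 and G_0 the two conditional survival functions (matching the names used in the Examples), \hat S_j the j-th of p candidate estimates, and \alpha_j its weight. The exact form of the combination and any constraints on the weights are assumptions, not taken from the package source.

```latex
\begin{aligned}
S_0(t \mid x) &= P(T > t \mid X = x), \\
G_0(t \mid x) &= P(C > t \mid X = x), \\
\text{observed data: } & \texttt{time} = \min(T, C), \quad \texttt{event} = \mathbf{1}(T \le C), \\
\hat S_{\mathrm{SL}}(t \mid x) &= \sum_{j=1}^{p} \alpha_j \, \hat S_j(t \mid x)
\quad \text{(assumed form of a weighted combination).}
\end{aligned}
```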

Value

survSuperLearner returns a named list with the following elements:

call

The matched call.

event.libraryNames, cens.libraryNames

Parsed learner names.

event.SL.library, cens.SL.library

Libraries used for fitting.

event.SL.predict, cens.SL.predict

m x k matrices of SuperLearner predicted survival values. Rows index observations in newX; columns index times in new.times.

event.coef, cens.coef

Fitted SuperLearner coefficients for the model for the conditional survival functions for the event and censoring times, respectively.

event.library.predict, cens.library.predict

m x k x p arrays of predicted event and censoring survivals on newX and new.times from the candidate learners, where p is the number of candidate learners.

event.Z, cens.Z

n x l x p arrays of cross-validated event and censoring survivals on the training data, where l is the number of elements in control$event.t.grid and control$cens.t.grid, respectively, and p is the number of candidate learners.

event.cvRisk, cens.cvRisk

Cross-validated risks for the candidate conditional event and censoring survival functions.

event.fitLibrary, cens.fitLibrary

Fitted conditional survival functions for all learners in the library on the full data.

varNames

Variable names of the training data.

validRows

List of length V containing the row indices in each fold used for cross-validation.

event.whichScreen, cens.whichScreen

Matrix indicating which variables were included in each screening algorithm in the full training data.

control, cvControl

Parameters used for controlling the fitting and cross-validation processes, respectively.

event.errorsInCVLibrary, cens.errorsInCVLibrary

Logical matrices indicating whether each learning algorithm encountered any errors in each cross-validation fold.

event.errorsInLibrary, cens.errorsInLibrary

Logical vectors indicating whether each learning algorithm encountered any errors on the full data.

times

Timing data.
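As a rough guide to how the returned elements relate to one another, the sketch below uses small synthetic stand-ins for event.library.predict, event.coef and new.times. It assumes the ensemble prediction is the coefficient-weighted sum of the candidate predictions; the package may combine candidates differently, so treat this as an illustration of the array shapes rather than a description of the implementation.

```r
set.seed(1)
m <- 4; k <- 3; p <- 2

# Stand-in for fit$event.library.predict: m x k x p array of candidate predictions
library_predict <- array(runif(m * k * p), dim = c(m, k, p))
# Stand-in for fit$event.coef: one weight per candidate learner
coef <- c(0.7, 0.3)

# Weighted sum over the third (learner) dimension gives an m x k matrix,
# the same shape as fit$event.SL.predict
sl_predict <- apply(library_predict, c(1, 2), function(s) sum(coef * s))
dim(sl_predict)

# Stand-in for new.times; pick the column closest to t = 5
new_times <- c(0, 5, 10)
sl_predict[, which.min(abs(new_times - 5))]
```

Rows correspond to rows of newX and columns to entries of new.times, so a single row is one predicted survival curve and a single column is the predicted survival at one time for every row.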

References

van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super learner. Statistical Applications in Genetics and Molecular Biology, 6(1).

van der Laan, M. J., & Rose, S. (2011). Targeted Learning: Causal inference for observational and experimental data. Springer-Verlag New York.

Examples

library(survSuperLearner)
set.seed(1)  # for reproducibility

n <- 100
X <- data.frame(X1 = rnorm(n), X2 = rbinom(n, size = 1, prob = 0.5))

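# True conditional survival function of the event time (exponential)
S0 <- function(t, x) {
  pexp(t, rate = exp(-2 + x[,1] - x[,2] + .5 * x[,1] * x[,2]), lower.tail = FALSE)
}
# Simulated event times
T <- rexp(n, rate = exp(-2 + X[,1] - X[,2] + .5 * X[,1] * X[,2]))

# True conditional survival function of the censoring time:
# about 10% censored at time 0, and all censoring times capped at 15
G0 <- function(t, x) {
  as.numeric(t < 15) * .9 *
    pexp(t, rate = exp(-2 -.5 * x[,1] - .25 * x[,2] + .5 * x[,1] * x[,2]), lower.tail = FALSE)
}
# Simulated censoring times
C0 <- rbinom(n, 1, .1)
C <- rexp(n, exp(-2 -.5 * X[,1] - .25 * X[,2] + .5 * X[,1] * X[,2]))
C[C0 == 1] <- 0
C[C > 15] <- 15

# Observed follow-up time and event indicator
time <- pmin(T, C)
event <- as.numeric(T <= C)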

event.SL.library <- cens.SL.library <- lapply(
  c("survSL.km", "survSL.coxph", "survSL.expreg", "survSL.weibreg",
    "survSL.loglogreg", "survSL.gam", "survSL.rfsrc"),
  function(alg) {
    c(alg, "survscreen.glmnet", "survscreen.marg", "All")
  }
)

fit <- survSuperLearner(
  time = time,
  event = event,
  X = X,
  newX = X,
  new.times = seq(0, 15, .1),
  event.SL.library = event.SL.library,
  cens.SL.library = cens.SL.library,
  verbose = TRUE
)

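# Learners with non-zero weight in each ensemble
fit$event.coef[which(fit$event.coef > 0)]
fit$cens.coef[which(fit$cens.coef > 0)]

# Predicted versus true survival for the first observation;
# points near the red identity line indicate a good fit
plot(fit$event.SL.predict[1,], S0(t =  seq(0, 15, .1), X[1,]))
abline(0,1,col='red')
plot(fit$cens.SL.predict[1,], G0(t =  seq(0, 15, .1), X[1,]))
abline(0,1,col='red')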
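The plots above compare predicted and true survival visually. A numeric summary of the same comparison can be sketched as below. To keep the snippet runnable without fitting anything, truth and pred are synthetic stand-ins (two exponential survival curves with arbitrary rates); in practice they would be replaced by quantities such as S0(t = times, X[1,]) and fit$event.SL.predict[1,].

```r
times <- seq(0, 15, 0.1)

# Stand-in for a true survival curve, e.g. S0(t = times, X[1, ])
truth <- pexp(times, rate = 0.2, lower.tail = FALSE)
# Stand-in for a predicted curve, e.g. fit$event.SL.predict[1, ]
pred <- pexp(times, rate = 0.25, lower.tail = FALSE)

# Largest and average absolute discrepancy over the time grid
max_abs_err <- max(abs(pred - truth))
mean_abs_err <- mean(abs(pred - truth))
c(max = max_abs_err, mean = mean_abs_err)
```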

tedwestling/survSuperLearner documentation built on Dec. 12, 2024, 4:16 p.m.