breslow: Survival probabilities using Breslow's estimator

View source: R/breslow.R

breslowR Documentation

Survival probabilities using Breslow's estimator

Description

Helper function to compose a survival distribution (or cumulative hazard) from the relative risk predictions (linear predictors, lp) of a proportional hazards model (e.g. a Cox-type model).

Usage

breslow(times, status, lp_train, lp_test, eval_times = NULL, type = "surv")

Arguments

times

(numeric())
Vector of times (train set).

status

(numeric())
Vector of status indicators (train set). For each observation in the train set, this should be 0 (alive/censored) or 1 (dead).

lp_train

(numeric())
Vector of linear predictors (train set). These are the relative score predictions (lp = \hat{\beta}X_{train}) from a proportional hazards model on the train set.

lp_test

(numeric())
Vector of linear predictors (test set). These are the relative score predictions (lp = \hat{\beta}X_{test}) from a proportional hazards model on the test set.

eval_times

(numeric())
Vector of times to compute survival probabilities. If NULL (default), the unique and sorted times from the train set will be used, otherwise the unique and sorted eval_times.

type

(character())
Type of prediction estimates. Default is surv which returns the survival probabilities S_i(t) for each test observation i. If cumhaz, the function returns the estimated cumulative hazards H_i(t).

Details

We estimate the survival probability of individual i (from the test set), at time point t as follows:

S_i(t) = e^{-H_i(t)} = e^{-\hat{H}_0(t) \times e^{lp_i}}

where:

  • H_i(t) is the cumulative hazard function for individual i

  • \hat{H}_0(t) is Breslow's estimator for the cumulative baseline hazard. Estimation requires the training set's times and status as well the risk predictions (lp_train).

  • lp_i is the risk prediction (linear predictor) of individual i on the test set.

Breslow's approach uses a non-parametric maximum likelihood estimation of the cumulative baseline hazard function:

\hat{H}_0(t) = \sum_{i=1}^n{\frac{I(T_i \le t)\delta_i} {\sum\nolimits_{j \in R_i}e^{lp_j}}}

where:

  • t is the vector of time points (unique and sorted, from the train set)

  • n is number of events (train set)

  • T is the vector of event times (train set)

  • \delta is the status indicator (1 = event or 0 = censored)

  • R_i is the risk set (number of individuals at risk just before event i)

  • lp_j is the risk prediction (linear predictor) of individual j (who is part of the risk set R_i) on the train set.

We employ constant interpolation to estimate the cumulative baseline hazards, extending from the observed unique event times to the specified evaluation times (eval_times). Any values falling outside the range of the estimated times are assigned as follows:

\hat{H}_0(eval\_times < min(t)) = 0

and

\hat{H}_0(eval\_times > max(t)) = \hat{H}_0(max(t))

Note that in the rare event of lp predictions being Inf or -Inf, the resulting cumulative hazard values become NaN, which we substitute with Inf (and corresponding survival probabilities take the value of 0).

For similar implementations, see gbm::basehaz.gbm(), C060::basesurv() and xgboost.surv::sgb_bhaz().

Value

a matrix (obs x times). Number of columns is equal to eval_times and number of rows is equal to the number of test observations (i.e. the length of the lp_test vector). Depending on the type argument, the matrix can have either survival probabilities (0-1) or cumulative hazard estimates (0-Inf).

References

Breslow N (1972). “Discussion of 'Regression Models and Life-Tables' by D.R. Cox.” Journal of the Royal Statistical Society: Series B, 34(2), 216-217.

Lin, Y. D (2007). “On the Breslow estimator.” Lifetime Data Analysis, 13(4), 471-480. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1007/s10985-007-9048-y")}.

Examples

task = tsk("rats")
part = partition(task, ratio = 0.8)

learner = lrn("surv.coxph")
learner$train(task, part$train)
p_train = learner$predict(task, part$train)
p_test = learner$predict(task, part$test)

surv = breslow(times = task$times(part$train), status = task$status(part$train),
               lp_train = p_train$lp, lp_test = p_test$lp)
head(surv)

mlr-org/mlr3proba documentation built on April 12, 2025, 4:38 p.m.