breslow: Survival probabilities using Breslow's estimator
In mlr-org/mlr3proba: Probabilistic Supervised Learning for 'mlr3'

breslow

R Documentation

Survival probabilities using Breslow's estimator

Description

Helper function to compose a survival distribution (or cumulative hazard) from the relative risk predictions (linear predictors, lp) of a proportional hazards model (e.g. a Cox-type model).

Usage

breslow(times, status, lp_train, lp_test, eval_times = NULL, type = "surv")

Arguments

`times`	(`numeric()`) Vector of times (train set).
`status`	(`numeric()`) Vector of status indicators (train set). For each observation in the train set, this should be 0 (alive/censored) or 1 (dead).
`lp_train`	(`numeric()`) Vector of linear predictors (train set). These are the relative score predictions (`lp = \hat{\beta}X_{train}`) from a proportional hazards model on the train set.
`lp_test`	(`numeric()`) Vector of linear predictors (test set). These are the relative score predictions (`lp = \hat{\beta}X_{test}`) from a proportional hazards model on the test set.
`eval_times`	(`numeric()`) Vector of times to compute survival probabilities. If `NULL` (default), the unique and sorted `times` from the train set will be used, otherwise the unique and sorted `eval_times`.
`type`	(`character()`) Type of prediction estimates. Default is `surv` which returns the survival probabilities `S_i(t)` for each test observation `i`. If `cumhaz`, the function returns the estimated cumulative hazards `H_i(t)`.

Details

We estimate the survival probability of individual i (from the test set), at time point t as follows:

S_i(t) = e^{-H_i(t)} = e^{-\hat{H}_0(t) \times e^{lp_i}}

where:

H_i(t) is the cumulative hazard function for individual i
\hat{H}_0(t) is Breslow's estimator for the cumulative baseline hazard. Estimation requires the training set's times and status as well the risk predictions (lp_train).
lp_i is the risk prediction (linear predictor) of individual i on the test set.

Breslow's approach uses a non-parametric maximum likelihood estimation of the cumulative baseline hazard function:

\hat{H}_0(t) = \sum_{i=1}^n{\frac{I(T_i \le t)\delta_i} {\sum\nolimits_{j \in R_i}e^{lp_j}}}

where:

t is the vector of time points (unique and sorted, from the train set)
n is number of events (train set)
T is the vector of event times (train set)
\delta is the status indicator (1 = event or 0 = censored)
R_i is the risk set (number of individuals at risk just before event i)
lp_j is the risk prediction (linear predictor) of individual j (who is part of the risk set R_i) on the train set.

We employ constant interpolation to estimate the cumulative baseline hazards, extending from the observed unique event times to the specified evaluation times (eval_times). Any values falling outside the range of the estimated times are assigned as follows:

\hat{H}_0(eval\_times < min(t)) = 0

and

\hat{H}_0(eval\_times > max(t)) = \hat{H}_0(max(t))

Note that in the rare event of lp predictions being Inf or -Inf, the resulting cumulative hazard values become NaN, which we substitute with Inf (and corresponding survival probabilities take the value of 0).

For similar implementations, see gbm::basehaz.gbm(), C060::basesurv() and xgboost.surv::sgb_bhaz().

Value

a matrix (obs x times). Number of columns is equal to eval_times and number of rows is equal to the number of test observations (i.e. the length of the lp_test vector). Depending on the type argument, the matrix can have either survival probabilities (0-1) or cumulative hazard estimates (0-Inf).

References

Breslow N (1972). “Discussion of 'Regression Models and Life-Tables' by D.R. Cox.” Journal of the Royal Statistical Society: Series B, 34(2), 216-217.

Lin, Y. D (2007). “On the Breslow estimator.” Lifetime Data Analysis, 13(4), 471-480. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1007/s10985-007-9048-y")}.

Examples

task = tsk("rats")
part = partition(task, ratio = 0.8)

learner = lrn("surv.coxph")
learner$train(task, part$train)
p_train = learner$predict(task, part$train)
p_test = learner$predict(task, part$test)

surv = breslow(times = task$times(part$train), status = task$status(part$train),
               lp_train = p_train$lp, lp_test = p_test$lp)
head(surv)

mlr-org/mlr3proba documentation built on April 12, 2025, 4:38 p.m.