breslow | R Documentation |
Helper function to compose a survival distribution (or cumulative hazard)
from the relative risk predictions (linear predictors, lp) of a
proportional hazards model (e.g. a Cox-type model).
breslow(times, status, lp_train, lp_test, eval_times = NULL, type = "surv")
times |
( |
status |
( |
lp_train |
( |
lp_test |
( |
eval_times |
( |
type |
( |
We estimate the survival probability of individual i (from the test set)
at time point t as follows:
S_i(t) = e^{-H_i(t)} = e^{-\hat{H}_0(t) \times e^{lp_i}}
where:
H_i(t) is the cumulative hazard function for individual i
\hat{H}_0(t) is Breslow's estimator for the cumulative baseline
hazard. Estimation requires the training set's times and status as well as
the risk predictions (lp_train).
lp_i is the risk prediction (linear predictor) of individual i
on the test set.
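
As a minimal sketch of the formula above, the following base R snippet turns a baseline cumulative hazard into a survival matrix. The values of H0 and lp_test are made up for illustration, and the snippet is not the package's internal code.

```r
# Toy inputs (assumed values, for illustration only)
H0 = c(0.1, 0.3, 0.7)        # baseline cumulative hazard at three time points
lp_test = c(-0.5, 0, 1.2)    # linear predictors of three test individuals

# rows = test individuals, columns = time points
cumhaz = outer(exp(lp_test), H0)  # H_i(t) = H0(t) * exp(lp_i)
surv = exp(-cumhaz)               # S_i(t) = exp(-H_i(t))
```

An individual with lp_i = 0 simply receives the baseline survival curve exp(-H0), while larger linear predictors scale the hazard up and push the survival curve down.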
Breslow's approach uses a non-parametric maximum likelihood estimate of the cumulative baseline hazard function:
\hat{H}_0(t) = \sum_{i=1}^n{\frac{I(T_i \le t)\delta_i}{\sum\nolimits_{j \in R_i}e^{lp_j}}}
where:
t is the vector of time points (unique and sorted, from the train set)
n is the number of events (train set)
T is the vector of event times (train set)
\delta is the status indicator (1 = event or 0 = censored)
R_i is the risk set (the individuals at risk just before event i)
lp_j is the risk prediction (linear predictor) of individual j
(who is part of the risk set R_i) on the train set.
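
The estimator above can be sketched in a few lines of base R. The helper name breslow_sketch and the toy data are hypothetical and only illustrate the formula; they do not reproduce the package's actual implementation. For simplicity, the sketch evaluates the cumulative hazard at the unique event times only, since the estimate does not change between events.

```r
# Hypothetical helper: Breslow's cumulative baseline hazard at each unique
# event time of the train set.
breslow_sketch = function(times, status, lp_train) {
  event_times = sort(unique(times[status == 1]))
  # hazard increment at event time tk:
  # (number of events at tk) / (sum of exp(lp) over individuals still at risk)
  increments = vapply(event_times, function(tk) {
    d_k = sum(times == tk & status == 1)
    risk_sum = sum(exp(lp_train[times >= tk]))
    d_k / risk_sum
  }, numeric(1))
  data.frame(time = event_times, cumhaz = cumsum(increments))
}

# Toy train set (made-up values)
times = c(2, 3, 3, 5, 8)
status = c(1, 1, 0, 1, 0)
lp_train = c(0.5, -0.2, 0.1, 0, -0.4)
H0 = breslow_sketch(times, status, lp_train)
```

When all linear predictors are zero, every individual has weight exp(0) = 1 and the sketch reduces to the Nelson-Aalen estimator, which is a convenient sanity check.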
We employ constant interpolation to estimate the cumulative baseline hazards,
extending from the observed unique event times to the specified evaluation
times (eval_times).
Any values falling outside the range of the estimated times are assigned as
follows:
\hat{H}_0(eval\_times < min(t)) = 0
and
\hat{H}_0(eval\_times > max(t)) = \hat{H}_0(max(t))
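
The interpolation and boundary rules above can be illustrated with stats::approx(). This is a sketch on made-up numbers; whether the package uses the same call internally is an assumption.

```r
# Toy baseline cumulative hazard at three event times (made-up values)
event_times = c(2, 3, 5)
H0 = c(0.2, 0.45, 0.95)
eval_times = c(1, 2, 4, 5, 10)

# Constant (step) interpolation:
# - yleft = 0: evaluation times before the first event time get 0
# - rule = 2:  evaluation times after the last event time get the last value
H0_eval = approx(x = event_times, y = H0, xout = eval_times,
  method = "constant", yleft = 0, rule = 2)$y
H0_eval
# 0.00 0.20 0.45 0.95 0.95
```

The evaluation time 4 falls between the event times 3 and 5, so it carries forward the value estimated at time 3.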
Note that in the rare event of lp predictions being Inf or -Inf, the
resulting cumulative hazard values become NaN, which we substitute with
Inf (and corresponding survival probabilities take the value of 0).
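
To see how such NaN values can arise and what the substitution does, consider this small sketch. The inputs are made up and the code only mimics the behaviour described above; it is not taken from the package.

```r
# Toy inputs: the baseline hazard is 0 at the first time point and the
# second test individual has an infinite linear predictor.
H0 = c(0, 0.2, 0.45)
lp_test = c(0.3, Inf)

cumhaz = outer(exp(lp_test), H0)  # Inf * 0 yields NaN in the second row
cumhaz[is.nan(cumhaz)] = Inf      # substitute NaN with Inf
surv = exp(-cumhaz)               # exp(-Inf) = 0
```

After the substitution, the individual with lp = Inf has an infinite cumulative hazard and a survival probability of 0 at every time point.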
For similar implementations, see gbm::basehaz.gbm(), C060::basesurv() and
xgboost.surv::sgb_bhaz().
A matrix (obs x times). The number of columns equals the length of eval_times
and the number of rows equals the number of test observations (i.e. the
length of the lp_test vector). Depending on the type argument, the matrix
holds either survival probabilities (0-1) or cumulative hazard estimates
(0-Inf).
Breslow N (1972). “Discussion of 'Regression Models and Life-Tables' by D.R. Cox.” Journal of the Royal Statistical Society: Series B, 34(2), 216-217.
Lin DY (2007). “On the Breslow estimator.” Lifetime Data Analysis, 13(4), 471-480. doi:10.1007/s10985-007-9048-y.
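# assumed setup: mlr3 plus mlr3proba (survival tasks, learners and breslow())
library(mlr3)
library(mlr3proba)

task = tsk("rats")
part = partition(task, ratio = 0.8)

# train a Cox model and predict linear predictors on both sets
learner = lrn("surv.coxph")
learner$train(task, part$train)
p_train = learner$predict(task, part$train)
p_test = learner$predict(task, part$test)

# survival matrix: one row per test observation
surv = breslow(times = task$times(part$train), status = task$status(part$train),
  lp_train = p_train$lp, lp_test = p_test$lp)
head(surv)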