mlr_measures_surv.schmid | R Documentation |
Calculates the Integrated Schmid Score (ISS), aka integrated absolute loss.
This measure has two dimensions: (test set) observations and time points.
For a specific individual i from the test set, with observed survival outcome (t_i, \delta_i) (time and censoring indicator) and predicted survival function S_i(t), the observation-wise loss integrated across the time dimension up to the time cutoff \tau^* is:
L_{ISS}(S_i, t_i, \delta_i) = \int^{\tau^*}_0 \frac{S_i(\tau) \text{I}(t_i \leq \tau, \delta_i=1)}{G(t_i)} + \frac{(1-S_i(\tau)) \text{I}(t_i > \tau)}{G(\tau)} \ d\tau
where G is the Kaplan-Meier estimate of the censoring distribution.
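As a rough illustration of the integrand above, the following base R sketch evaluates the observation-wise absolute loss on a small time grid. The helper name `schmid_pointwise`, the toy grid and the choice G(t) = 1 (no censoring) are assumptions made for the example; it is not the package's internal code.

```r
# Hypothetical helper (not part of mlr3proba): ISS integrand for one observation.
# S: predicted survival probabilities S_i(tau) on the time grid `grid`
# G: function returning the censoring survival estimate G(t)
schmid_pointwise = function(S, grid, t_i, delta_i, G) {
  event_term = S * (t_i <= grid & delta_i == 1) / G(t_i)
  alive_term = (1 - S) * (t_i > grid) / G(grid)
  event_term + alive_term
}

grid = c(1, 2, 3, 4)
S = c(0.9, 0.7, 0.4, 0.2)
G = function(t) rep(1, length(t))  # assumption: no censoring, so G(t) = 1

# Event observed at t_i = 2.5: loss is 1 - S before the event, S afterwards
loss_event = schmid_pointwise(S, grid, t_i = 2.5, delta_i = 1, G = G)

# Censored at t_i = 2.5: only grid points before the censoring time contribute
loss_cens = schmid_pointwise(S, grid, t_i = 2.5, delta_i = 0, G = G)
```

Integrating these pointwise values over the grid would then give the time-integrated loss for that observation.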
The re-weighted ISS (RISS) is:
L_{RISS}(S_i, t_i, \delta_i) = \delta_i \frac{\int^{\tau^*}_0 S_i(\tau) \text{I}(t_i \leq \tau) + (1-S_i(\tau)) \text{I}(t_i > \tau) \ d\tau}{G(t_i)}
which is always weighted by G(t_i) and is equal to zero for a censored subject.
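The re-weighting can be sketched in base R as follows. The helper `riss_pointwise` and the value G(t_i) = 0.8 are hypothetical and chosen only to show that every term is scaled by 1 / G(t_i) and that censored observations contribute nothing.

```r
# Hypothetical helper (not part of mlr3proba): RISS integrand for one observation.
# G_ti is the censoring survival estimate evaluated at the observed time t_i.
riss_pointwise = function(S, grid, t_i, delta_i, G_ti) {
  delta_i * (S * (t_i <= grid) + (1 - S) * (t_i > grid)) / G_ti
}

grid = c(1, 2, 3, 4)
S = c(0.9, 0.7, 0.4, 0.2)

# Event at t_i = 2.5 with an assumed G(t_i) = 0.8: every term is scaled by 1 / 0.8
riss_event = riss_pointwise(S, grid, t_i = 2.5, delta_i = 1, G_ti = 0.8)

# Censored observation: the loss is zero at every time point
riss_cens = riss_pointwise(S, grid, t_i = 2.5, delta_i = 0, G_ti = 0.8)
```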
To get a single score across all N observations of the test set, we return the average of the time-integrated observation-wise scores:
\sum_{i=1}^N L(S_i, t_i, \delta_i) / N
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvSchmid$new()
mlr_measures$get("surv.schmid")
msr("surv.schmid")
Id | Type | Default | Levels | Range |
integrated | logical | TRUE | TRUE, FALSE | - |
times | untyped | - | - | - |
t_max | numeric | - | - | [0, \infty) |
p_max | numeric | - | - | [0, 1] |
method | integer | 2 | - | [1, 2] |
se | logical | FALSE | TRUE, FALSE | - |
proper | logical | FALSE | TRUE, FALSE | - |
eps | numeric | 0.001 | - | [0, 1] |
ERV | logical | FALSE | TRUE, FALSE | - |
remove_obs | logical | FALSE | TRUE, FALSE | - |
Type: "surv"
Range: [0, \infty)
Minimize: TRUE
Required prediction: distr
integrated (logical(1))
If TRUE (default), returns the integrated score (e.g. across time points); otherwise, returns the non-integrated score (e.g. at a single time point).
times (numeric())
If integrated == TRUE, a vector of time points over which to integrate the score.
If integrated == FALSE, a single time point at which to return the score.
t_max (numeric(1))
Cutoff time \tau^* (i.e. time horizon) up to which the measure is evaluated (truncates S(t)).
Mutually exclusive with p_max or times.
It is recommended to set t_max to avoid division by eps, see "Time Cutoff Details" section.
If t_max is not specified, an Inf time horizon is assumed.
p_max (numeric(1))
The proportion of censoring to integrate up to in the given dataset.
Mutually exclusive with times or t_max.
method (integer(1))
If integrated == TRUE, this selects the integration weighting method.
method == 1 corresponds to weighting each time point equally and taking the mean score over discrete time points.
method == 2 corresponds to calculating a mean weighted by the difference between time points.
method == 2 is the default value, to be in line with other packages.
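The two weighting schemes can be contrasted with a small base R sketch. The scores and time points are toy values, and the interval weighting shown for method == 2 is one plausible reading of "weighted by the difference between time points"; the exact formula used by the package may differ.

```r
times = c(1, 2, 4, 8)
scores = c(0.10, 0.20, 0.30, 0.40)  # toy mean score at each time point

# method == 1: every time point gets the same weight
score_m1 = mean(scores)

# method == 2 (one possible reading): weight each score by the length of the
# interval that follows it, so the last time point only closes the range
w = diff(times)
score_m2 = sum(w * head(scores, -1)) / sum(w)
```

With unevenly spaced time points the two results differ, which is why the same method should be used when comparing scores across packages.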
se (logical(1))
If TRUE, returns the standard error of the measure; otherwise returns the mean across all individual scores, i.e. the mean of the per-observation scores.
Default is FALSE (returns the mean).
proper (logical(1))
If TRUE, weights scores by the censoring distribution at the observed event time, which results in a strictly proper scoring rule if censoring and survival time distributions are independent and a sufficiently large dataset is used, see Sonabend et al. (2024).
If FALSE, weights scores by the Graf method, which is the more common usage but the loss is not proper.
See "Properness" section for more details.
eps (numeric(1))
Very small number to substitute zero values in order to prevent errors
in e.g. log(0) and/or division-by-zero calculations.
Default value is 0.001.
ERV (logical(1))
If TRUE, the Explained Residual Variation method is applied, which means the score is standardized against a Kaplan-Meier baseline.
Default is FALSE.
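A minimal sketch of what standardizing against a baseline can look like, assuming the usual relative-improvement form 1 - score_model / score_baseline. Both the formula and the numbers are assumptions for illustration; the package source should be consulted for the exact definition.

```r
# Toy values, assumed for illustration only
score_model = 0.12     # e.g. ISS of the fitted learner
score_baseline = 0.16  # e.g. ISS of a Kaplan-Meier baseline

# Assumed definition: relative improvement over the baseline
erv = 1 - score_model / score_baseline
```

Under this reading, a positive value means the learner has a lower loss than the baseline, and larger values are better.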
remove_obs (logical(1))
Only effective when t_max or p_max is provided. Default is FALSE.
If TRUE, test observations whose observed time (event or censoring) is strictly larger than t_max are removed.
See "Time Cutoff Details" section for more details.
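The filtering described for remove_obs = TRUE amounts to the following base R operation. The data frame `test` and the cutoff value are toy assumptions, not objects from the package.

```r
# Toy test set: observed times and status (1 = event, 0 = censored)
test = data.frame(
  time = c(100, 350, 700, 820, 1000),
  status = c(1, 0, 1, 1, 0)
)
t_max = 700

# Keep observations with observed time <= t_max; strictly larger ones are dropped
kept = test[test$time <= t_max, ]
```

Note that the observation with time exactly equal to t_max is kept, since only strictly larger times are removed.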
RISS is strictly proper when the censoring distribution is independent of the survival distribution and when G(t) is fit on a sufficiently large dataset.
ISS is never proper. Use proper = FALSE for ISS and proper = TRUE for RISS.
Results may be very different if many observations are censored at the last observed time, due to division by eps (i.e. multiplication by 1/eps) when proper = TRUE.
See Sonabend et al. (2024) for more details.
The use of proper = TRUE is considered experimental and should be used with caution.
If the times argument is not specified (NULL), then the unique (and sorted) time points from the test set are used for evaluation of the time-integrated score.
This was a design decision, since different predicted survival distributions S(t) usually have discretized time domains which may differ, for example when the survival predictions come from different survival learners.
Essentially, using the same set of time points for the calculation of the score minimizes the bias that would come from using different time points.
We note that S(t) is by default constantly interpolated for time points that fall outside its discretized time domain.
Naturally, if the times argument is specified, then exactly these time points are used for evaluation.
A warning is given to the user in case some of the specified times fall outside of the time point range of the test set.
The assumption here is that if the test set is large enough, it should have a time domain/range similar to the one from the train set, and therefore time points outside that domain might lead to interpolation or extrapolation of S(t).
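The default time grid and the constant interpolation can be sketched in base R. All values are toy assumptions, and setting S(t) = 1 before the first predicted time point is a choice made for this example rather than a documented behaviour of the package.

```r
# Toy observed times in a test set (assumed values)
test_times = c(5, 2, 9, 2, 7)
eval_times = sort(unique(test_times))  # unique and sorted evaluation grid

# A predicted survival curve known only on its own discrete grid
pred_grid = c(3, 6)
pred_surv = c(0.8, 0.5)

# Right-continuous step function: 1 before the first grid point, then the
# last known value is carried forward (constant interpolation)
S = stepfun(pred_grid, c(1, pred_surv))
S_eval = S(eval_times)
```

Evaluating every learner's prediction on the same `eval_times` is what makes the resulting scores comparable.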
If comparing the integrated Graf score to other packages, e.g. pec, then method = 2 should be used. However, the results may still be very slightly different as this package uses survfit to estimate the censoring distribution, in line with the Graf 1999 paper; whereas some other packages use prodlim with reverse = TRUE (meaning Kaplan-Meier is not used).
If task and train_set are passed to $score then G(t) is fit using all observations from the train set, otherwise the test set is used.
Using the train set is likely to reduce any bias caused by calculating parts of the measure on the test data it is evaluating.
It also usually means that more data is used for fitting the censoring distribution G(t) via the Kaplan-Meier estimator.
The training data is automatically used in scoring resamplings.
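To make the role of the train set concrete, here is a hedged sketch of estimating G(t) with survfit from the survival package by treating censoring as the event. The row split and the use of stepfun are assumptions for illustration; the package's internal steps may differ.

```r
library(survival)

# survival::lung codes status as 1 = censored, 2 = dead
train = lung[1:150, ]  # arbitrary split, for illustration only

# Treat censoring as the event of interest to estimate G(t)
cens_fit = survfit(Surv(time, status == 1) ~ 1, data = train)

# Step function for G(t); equal to 1 before the first observed time
G = stepfun(cens_fit$time, c(1, cens_fit$surv))
G_vals = G(c(0, 365))
```

The resulting function G can then be evaluated at test-set times to obtain the weights used in the score.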
If t_max or p_max is given, then the predicted survival function S(t) is truncated at the time cutoff for all observations.
Also, if remove_obs = TRUE, observations with observed times t > t_{max} are removed.
This data preprocessing step mitigates issues that arise when using IPCW in cases of administrative censoring, see Kvamme et al. (2023).
Practically, this step, along with setting a time cutoff t_max, helps mitigate the inflation of the score observed when an observation is censored at the final time point. In such cases, G(t) = 0, triggering the use of a small constant eps instead.
This inflation particularly impacts the proper version of the score, see Sonabend et al. (2024) for more details.
Note that the t_max and remove_obs parameters do not affect the estimation of the censoring distribution, i.e. all observations are always used for estimating G(t).
If remove_obs = FALSE, inflated scores may occur. While this aligns more closely with the definitions presented in the original papers, it can lead to misleading evaluation and poor optimization outcomes when using this score for model tuning.
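The inflation mechanism can be seen in a few lines of base R. The G(t) values are toy assumptions, and the zero substitution is a simplified stand-in for what the measure does internally.

```r
eps = 0.001              # default value of the eps parameter
G_at_t = c(0.9, 0.5, 0)  # toy G(t) values; 0 at the final time point

# Zero values are replaced by eps before dividing
G_safe = ifelse(G_at_t == 0, eps, G_at_t)
ipc_weights = 1 / G_safe  # inverse probability of censoring weights
```

The last weight is three orders of magnitude larger than the others, so a single observation at the final time point can dominate the averaged score.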
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvSchmid
new()
Creates a new instance of this R6 class.
MeasureSurvSchmid$new(ERV = FALSE)
ERV (logical(1))
Standardize measure against a Kaplan-Meier baseline (Explained Residual Variation).
clone()
The objects of this class are cloneable with this method.
MeasureSurvSchmid$clone(deep = FALSE)
deep
Whether to make a deep clone.
Schemper, Michael, Henderson, Robin (2000). “Predictive Accuracy and Explained Variation in Cox Regression.” Biometrics, 56, 249–255. doi:10.1002/sim.1486.
Schmid, Matthias, Hielscher, Thomas, Augustin, Thomas, Gefeller, Olaf (2011). “A Robust Alternative to the Schemper-Henderson Estimator of Prediction Error.” Biometrics, 67(2), 524–535. doi:10.1111/j.1541-0420.2010.01459.x.
Sonabend, Raphael, Zobolas, John, Kopper, Philipp, Burk, Lukas, Bender, Andreas (2024). “Examining properness in the external validation of survival models with squared and logarithmic losses.” https://arxiv.org/abs/2212.05260v2.
Kvamme, Havard, Borgan, Ornulf (2023). “The Brier Score under Administrative Censoring: Problems and a Solution.” Journal of Machine Learning Research, 24(2), 1–26. ISSN 1533-7928, http://jmlr.org/papers/v24/19-1030.html.
Other survival measures:
mlr_measures_surv.calib_alpha, mlr_measures_surv.calib_beta, mlr_measures_surv.calib_index, mlr_measures_surv.chambless_auc, mlr_measures_surv.cindex, mlr_measures_surv.dcalib, mlr_measures_surv.graf, mlr_measures_surv.hung_auc, mlr_measures_surv.intlogloss, mlr_measures_surv.logloss, mlr_measures_surv.mae, mlr_measures_surv.mse, mlr_measures_surv.nagelk_r2, mlr_measures_surv.oquigley_r2, mlr_measures_surv.rcll, mlr_measures_surv.rmse, mlr_measures_surv.song_auc, mlr_measures_surv.song_tnr, mlr_measures_surv.song_tpr, mlr_measures_surv.uno_auc, mlr_measures_surv.uno_tnr, mlr_measures_surv.uno_tpr, mlr_measures_surv.xu_r2
Other Probabilistic survival measures:
mlr_measures_surv.graf, mlr_measures_surv.intlogloss, mlr_measures_surv.logloss, mlr_measures_surv.rcll
Other distr survival measures:
mlr_measures_surv.calib_alpha, mlr_measures_surv.calib_index, mlr_measures_surv.dcalib, mlr_measures_surv.graf, mlr_measures_surv.intlogloss, mlr_measures_surv.logloss, mlr_measures_surv.rcll
library(mlr3)
library(mlr3proba) # provides the survival task, learner and measure used below
# Define a survival Task
task = tsk("lung")
# Create train and test set
part = partition(task)
# Train Cox learner on the train set
cox = lrn("surv.coxph")
cox$train(task, row_ids = part$train)
# Make predictions for the test set
p = cox$predict(task, row_ids = part$test)
# ISS, G(t) calculated using the test set
p$score(msr("surv.schmid"))
# ISS, G(t) calculated using the train set (always recommended)
p$score(msr("surv.schmid"), task = task, train_set = part$train)
# ISS, ERV score (comparing with KM baseline)
p$score(msr("surv.schmid", ERV = TRUE), task = task, train_set = part$train)
# ISS at specific time point
p$score(msr("surv.schmid", times = 365), task = task, train_set = part$train)
# ISS at multiple time points (integrated)
p$score(msr("surv.schmid", times = c(125, 365, 450), integrated = TRUE),
  task = task, train_set = part$train)
# ISS, use time cutoff
p$score(msr("surv.schmid", t_max = 700), task = task, train_set = part$train)
# ISS, use time cutoff and also remove observations
p$score(msr("surv.schmid", t_max = 700, remove_obs = TRUE),
  task = task, train_set = part$train)
# ISS, use time cutoff corresponding to specific proportion of censoring on the test set
p$score(msr("surv.schmid", p_max = 0.8), task = task, train_set = part$train)
# RISS, G(t) calculated using the train set
p$score(msr("surv.schmid", proper = TRUE), task = task, train_set = part$train)