TaskSurv: Survival Task

TaskSurv R Documentation

Survival Task

Description

This task specializes mlr3::Task and mlr3::TaskSupervised for possibly-censored survival problems. The target consists of survival times and an event indicator. Predefined tasks are stored in mlr3::mlr_tasks.

The task_type is set to "surv".

Super classes

mlr3::Task -> mlr3::TaskSupervised -> TaskSurv

Active bindings

censtype

(character(1))
Returns the type of censoring, one of "right", "left", "counting", "interval", "interval2" or "mstate". Currently, only the "right" censoring type is fully supported. The other types are experimental and their API will change in the future.

Methods

Public methods

Method new()

Creates a new instance of this R6 class.

Usage
TaskSurv$new(
  id,
  backend,
  time = "time",
  event = "event",
  time2,
  type = c("right", "left", "interval", "counting", "interval2", "mstate"),
  label = NA_character_
)
Arguments
id

(character(1))
Identifier for the new instance.

backend

(DataBackend)
Either a DataBackend, or any object which is convertible to a DataBackend with as_data_backend(). E.g., a data.frame() will be converted to a DataBackendDataTable.

time

(character(1))
Name of the column holding the event time if the data are right censored, or the starting time if the data are interval censored.

event

(character(1))
Name of the column giving the event indicator. If data is right censored then "0"/FALSE means alive (no event), "1"/TRUE means dead (event). If type is "interval" then "0" means right censored, "1" means dead (event), "2" means left censored, and "3" means interval censored. If type is "interval2" then event is ignored.

time2

(character(1))
Name of the column for ending time of the interval for interval censored or counting process data, otherwise ignored.

type

(character(1))
Type of censoring, one of "right", "left", "interval", "counting", "interval2" or "mstate". Default is "right" censoring.

label

(character(1))
Label for the new instance.

Details

Depending on the censoring type ("type"), a survival task's "$target_names" is a character() vector holding the names of the columns passed to the initialization arguments above. The names are returned in the following order:

  • For type = "right", "left" or "mstate": ("time", "event")

  • For type = "interval" or "counting": ("time", "time2", "event")

  • For type = "interval2": ("time", "time2")
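
As a minimal sketch of the constructor and the resulting target names, the following builds a right-censored task from a data.frame. The toy data and the column names "time", "status" and "age" are hypothetical and only serve as an illustration.

```r
library(mlr3)
library(mlr3proba)

# Hypothetical toy data: 6 subjects with right-censored follow-up times
toy = data.frame(
  time   = c(5, 8, 12, 12, 20, 25),
  status = c(1, 0, 1, 1, 0, 1),   # 1 = event, 0 = censored
  age    = c(61, 54, 70, 66, 58, 49)
)

# type is left at its default, so the task is right censored
task = TaskSurv$new(id = "toy", backend = toy, time = "time", event = "status")

task$censtype       # censoring type of the task
task$target_names   # names of the time and event columns, in that order
task$feature_names  # remaining columns are features
```

Because the event column is called "status" here, the event argument has to be set explicitly; its default is "event".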


Method truth()

True response for specified row_ids. This is the survival outcome using the Surv format and depends on the censoring type. Defaults to all rows with role "use".

Usage
TaskSurv$truth(rows = NULL)
Arguments
rows

(integer())
Row indices.

Returns

survival::Surv().
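
A short sketch of truth() on a right-censored task, assuming the hypothetical toy data below; the returned object can be inspected like any other survival::Surv() object.

```r
library(mlr3)
library(mlr3proba)

# Hypothetical toy data (illustration only)
toy = data.frame(
  time   = c(5, 8, 12, 12, 20, 25),
  status = c(1, 0, 1, 1, 0, 1),
  age    = c(61, 54, 70, 66, 58, 49)
)
task = TaskSurv$new(id = "toy", backend = toy, time = "time", event = "status")

y_all = task$truth()            # outcome for all rows with role "use"
y_sub = task$truth(rows = 1:3)  # outcome for the first three rows only

class(y_all)                    # a survival::Surv() object
attr(y_all, "type")             # censoring type stored in the Surv object
```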


Method formula()

Creates a formula for survival models with survival::Surv() on the LHS (left hand side).

Usage
TaskSurv$formula(rhs = NULL, reverse = FALSE)
Arguments
rhs

If NULL, the RHS (right hand side) is ".", otherwise the value of rhs is used as the RHS.

reverse

If TRUE, the formula is built with 1 - status as the event indicator.

Returns

stats::formula().
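
The sketch below shows the three ways of calling formula() on a hypothetical toy task. Passing rhs as a character string naming a feature is an assumption here, since the argument type is not stated above.

```r
library(mlr3)
library(mlr3proba)

# Hypothetical toy data (illustration only)
toy = data.frame(
  time   = c(5, 8, 12, 12, 20, 25),
  status = c(1, 0, 1, 1, 0, 1),
  age    = c(61, 54, 70, 66, 58, 49)
)
task = TaskSurv$new(id = "toy", backend = toy, time = "time", event = "status")

f_all = task$formula()                # Surv(...) on the LHS, "." on the RHS
f_age = task$formula(rhs = "age")     # assumption: rhs given as a character string
f_rev = task$formula(reverse = TRUE)  # LHS built with 1 - status

print(f_all)
print(f_age)
print(f_rev)
```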


Method times()

Returns the (unsorted) outcome times.

Usage
TaskSurv$times(rows = NULL)
Arguments
rows

(integer())
Row indices.

Returns

numeric()


Method status()

Returns the event indicator (aka censoring/survival indicator). If censtype is "right" or "left" then 1 is event and 0 is censored. If censtype is "mstate" then 0 is censored and all other values are different events. If censtype is "interval" then 0 is right-censored, 1 is event, 2 is left-censored, 3 is interval-censored. See survival::Surv().

Usage
TaskSurv$status(rows = NULL)
Arguments
rows

(integer())
Row indices.

Returns

integer()


Method unique_times()

Returns the sorted unique outcome times for "right", "left" and "mstate" types of censoring.

Usage
TaskSurv$unique_times(rows = NULL)
Arguments
rows

(integer())
Row indices.

Returns

numeric()


Method unique_event_times()

Returns the sorted unique event (or failure) outcome times for "right", "left" and "mstate" types of censoring.

Usage
TaskSurv$unique_event_times(rows = NULL)
Arguments
rows

(integer())
Row indices.

Returns

numeric()
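
To make the difference between times(), status(), unique_times() and unique_event_times() concrete, here is a small sketch on hypothetical toy data with one tied time and two censored observations.

```r
library(mlr3)
library(mlr3proba)

# Hypothetical toy data: rows 2 and 5 are censored, time 12 occurs twice
toy = data.frame(
  time   = c(5, 8, 12, 12, 20, 25),
  status = c(1, 0, 1, 1, 0, 1),
  age    = c(61, 54, 70, 66, 58, 49)
)
task = TaskSurv$new(id = "toy", backend = toy, time = "time", event = "status")

t_all   = task$times()               # outcome times in row order
s_all   = task$status()              # event indicators in row order
t_uniq  = task$unique_times()        # sorted, duplicates removed
t_event = task$unique_event_times()  # sorted, censored times excluded
```

With these data, unique_times() drops the duplicated time 12, and unique_event_times() additionally drops the censored times 8 and 20.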


Method risk_set()

Returns the row_ids of the observations that are still at risk at the specified time, i.e. observations that have not yet died, been censored or, for multi-state tasks, experienced any other event.

Only designed for "right", "left" and "mstate" types of censoring.

Usage
TaskSurv$risk_set(time = NULL)
Arguments
time

(numeric(1))
Time for which to return the risk set. If NULL, all row_ids are returned.

Returns

integer()
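
A minimal sketch of risk_set() on hypothetical toy data. The query time 10 is deliberately chosen between two observed times, so the result does not depend on how ties at the query time are handled.

```r
library(mlr3)
library(mlr3proba)

# Hypothetical toy data (illustration only)
toy = data.frame(
  time   = c(5, 8, 12, 12, 20, 25),
  status = c(1, 0, 1, 1, 0, 1),
  age    = c(61, 54, 70, 66, 58, 49)
)
task = TaskSurv$new(id = "toy", backend = toy, time = "time", event = "status")

rs_all = task$risk_set()           # time = NULL: every row id
rs_10  = task$risk_set(time = 10)  # rows whose outcome time lies beyond t = 10
```

Rows 1 and 2 have outcome times 5 and 8, so they have left the risk set by t = 10 and only rows 3 to 6 remain.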


Method kaplan()

Calls survival::survfit() to calculate the Kaplan-Meier estimator.

Usage
TaskSurv$kaplan(strata = NULL, rows = NULL, reverse = FALSE, ...)
Arguments
strata

(character())
Stratification variables to use.

rows

(integer())
Subset of row indices.

reverse

(logical())
If TRUE calculates Kaplan-Meier of censoring distribution (1-status). Default FALSE.

...

(any)
Additional arguments passed down to survival::survfit.formula().

Returns

survival::survfit.object.
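
The following sketch compares kaplan() with a direct call to survival::survfit() on hypothetical toy data. It assumes that kaplan() without strata fits an intercept-only model, which is what the description above suggests.

```r
library(mlr3)
library(mlr3proba)
library(survival)

# Hypothetical toy data (illustration only)
toy = data.frame(
  time   = c(5, 8, 12, 12, 20, 25),
  status = c(1, 0, 1, 1, 0, 1),
  age    = c(61, 54, 70, 66, 58, 49)
)
task = TaskSurv$new(id = "toy", backend = toy, time = "time", event = "status")

km     = task$kaplan()                # Kaplan-Meier of the event distribution
km_cen = task$kaplan(reverse = TRUE)  # Kaplan-Meier of the censoring distribution

# Reference fit computed directly with the survival package
ref = survfit(Surv(time, status) ~ 1, data = toy)

data.frame(time = km$time, surv_task = km$surv, surv_ref = ref$surv)
```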


Method reverse()

Returns the same task with the status variable reversed, i.e., 1 - status. Only designed for "left" and "right" censoring.

Usage
TaskSurv$reverse()
Returns

TaskSurv.
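
As a brief illustration on hypothetical toy data, reversing a right-censored task swaps the roles of events and censored observations while leaving the times untouched.

```r
library(mlr3)
library(mlr3proba)

# Hypothetical toy data (illustration only)
toy = data.frame(
  time   = c(5, 8, 12, 12, 20, 25),
  status = c(1, 0, 1, 1, 0, 1),
  age    = c(61, 54, 70, 66, 58, 49)
)
task = TaskSurv$new(id = "toy", backend = toy, time = "time", event = "status")

task_rev = task$reverse()

# Side-by-side view of the original and the reversed indicator
data.frame(
  time       = task$times(),
  status     = task$status(),
  status_rev = task_rev$status()
)
```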


Method cens_prop()

Returns the proportion of censoring for this survival task. By default, this is returned for all observations, otherwise only the specified ones (rows).

Only designed for "right" and "left" censoring.

Usage
TaskSurv$cens_prop(rows = NULL)
Arguments
rows

(integer())
Row indices.

Returns

numeric()
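
A small sketch of cens_prop() on hypothetical toy data, checked against the share of zero status values computed by hand.

```r
library(mlr3)
library(mlr3proba)

# Hypothetical toy data: 2 of 6 observations are censored (rows 2 and 5)
toy = data.frame(
  time   = c(5, 8, 12, 12, 20, 25),
  status = c(1, 0, 1, 1, 0, 1),
  age    = c(61, 54, 70, 66, 58, 49)
)
task = TaskSurv$new(id = "toy", backend = toy, time = "time", event = "status")

p_all  = task$cens_prop()            # censoring proportion over all rows
p_sub  = task$cens_prop(rows = 1:3)  # censoring proportion over rows 1 to 3
p_hand = mean(toy$status == 0)       # the same quantity computed by hand
```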


Method admin_cens_prop()

Returns an estimated proportion of administratively censored observations (i.e. censored at or after a user-specified time point). The main assumption is that in an administratively censored dataset, the maximum censoring time is likely close to the maximum event time, so a higher proportion of censored subjects is expected near the study end date.

Only designed for "right" and "left" censoring.

Usage
TaskSurv$admin_cens_prop(rows = NULL, admin_time = NULL, quantile_prob = 0.99)
Arguments
rows

(integer())
Row indices.

admin_time

(numeric(1))
Administrative censoring time (in case it is known a priori).

quantile_prob

(numeric(1))
Quantile probability used to calculate the cutoff time for administrative censoring. Ignored if admin_time is given. By default, quantile_prob is equal to 0.99, which translates to a time point very close to the maximum outcome time in the dataset. A lower value will result in an earlier time point and therefore in a more relaxed definition (i.e. higher proportion) of administrative censoring.

Returns

numeric()
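
The sketch below calls admin_cens_prop() on hypothetical toy data, once with a known administrative censoring time and once with a quantile-based cutoff. The description above does not state which denominator is used for the proportion, so no expected values are given.

```r
library(mlr3)
library(mlr3proba)

# Hypothetical toy data: censored at t = 8 (early) and t = 20 (late)
toy = data.frame(
  time   = c(5, 8, 12, 12, 20, 25),
  status = c(1, 0, 1, 1, 0, 1),
  age    = c(61, 54, 70, 66, 58, 49)
)
task = TaskSurv$new(id = "toy", backend = toy, time = "time", event = "status")

# Cutoff known a priori: observations censored at or after t = 20
p_known = task$admin_cens_prop(admin_time = 20)

# Cutoff derived from the 95% quantile of the outcome times
p_quant = task$admin_cens_prop(quantile_prob = 0.95)
```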


Method dep_cens_prop()

Returns the proportion of covariates (task features) that are found to be significantly associated with censoring. This function fits a logistic regression model via glm with the censoring status as the response and using all features as predictors. If a covariate is significantly associated with the censoring status, it suggests that censoring may be informative (dependent) rather than random (non-informative). This methodology is more suitable for low-dimensional datasets where the number of features is relatively small compared to the number of observations.

Only designed for "right" and "left" censoring.

Usage
TaskSurv$dep_cens_prop(rows = NULL, method = "holm", sign_level = 0.05)
Arguments
rows

(integer())
Row indices.

method

(character(1))
Method to adjust p-values for multiple comparisons, see p.adjust.methods. Default is "holm".

sign_level

(numeric(1))
Significance level for each coefficient's p-value from the logistic regression model. Default is 0.05.

Returns

numeric()


Method prop_haz()

Checks if the data satisfy the proportional hazards (PH) assumption using the Grambsch-Therneau test (Grambsch and Therneau, 1994). Uses cox.zph. This method should be used only for low-dimensional datasets where the number of features is relatively small compared to the number of observations.

Only designed for "right" and "left" censoring.

Usage
TaskSurv$prop_haz()
Returns

numeric()
If no errors, the p-value of the global chi-square test. A p-value < 0.05 is an indication of possible PH violation.


Method clone()

The objects of this class are cloneable with this method.

Usage
TaskSurv$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

Grambsch, Patricia, Therneau, Terry (1994). “Proportional hazards tests and diagnostics based on weighted residuals.” Biometrika, 81(3), 515–526. https://doi.org/10.1093/biomet/81.3.515.

See Also

Other Task: TaskDens, mlr_tasks_actg, mlr_tasks_faithful, mlr_tasks_gbcs, mlr_tasks_gbsg, mlr_tasks_grace, mlr_tasks_lung, mlr_tasks_mgus, mlr_tasks_pbc, mlr_tasks_precip, mlr_tasks_rats, mlr_tasks_veteran, mlr_tasks_whas

Examples

library(mlr3)
library(mlr3proba) # provides TaskSurv and the predefined survival tasks
task = tsk("lung")

# meta data
task$target_names # target is always (time, status) for right-censoring tasks
task$feature_names
task$formula()

# survival data
task$truth() # survival::Surv() object
task$times() # (unsorted) times
task$status() # event indicators (1 = death, 0 = censored)
task$unique_times() # sorted unique times
task$unique_event_times() # sorted unique event times
task$risk_set(time = 700) # observation ids that are not censored or dead at t = 700
task$kaplan(strata = "sex") # stratified Kaplan-Meier
task$kaplan(reverse = TRUE) # Kaplan-Meier of the censoring distribution

# proportion of censored observations in the whole dataset
task$cens_prop()
# proportion of censored observations at or after the 95% time quantile
task$admin_cens_prop(quantile_prob = 0.95)
# proportion of variables that are significantly associated with the
# censoring status via a logistic regression model
task$dep_cens_prop() # 0 indicates independent censoring
# data barely satisfies proportional hazards assumption (p > 0.05)
task$prop_haz()
# veteran data is definitely non-PH (p << 0.05)
tsk("veteran")$prop_haz()

mlr-org/mlr3proba documentation built on April 12, 2025, 4:38 p.m.