TaskSurv | R Documentation |
This task specializes mlr3::Task and mlr3::TaskSupervised for possibly-censored survival problems. The target consists of survival times and an event indicator. Predefined tasks are stored in mlr3::mlr_tasks.
The task_type is set to "surv".
mlr3::Task -> mlr3::TaskSupervised -> TaskSurv
censtype
(character(1))
Returns the type of censoring, one of "right", "left", "counting", "interval", "interval2" or "mstate".
Currently, only the "right"-censoring type is fully supported; the rest are experimental and the API will change in the future.
mlr3::Task$add_strata()
mlr3::Task$cbind()
mlr3::Task$data()
mlr3::Task$divide()
mlr3::Task$droplevels()
mlr3::Task$filter()
mlr3::Task$format()
mlr3::Task$head()
mlr3::Task$help()
mlr3::Task$levels()
mlr3::Task$missings()
mlr3::Task$print()
mlr3::Task$rbind()
mlr3::Task$rename()
mlr3::Task$select()
mlr3::Task$set_col_roles()
mlr3::Task$set_levels()
mlr3::Task$set_row_roles()
new()
Creates a new instance of this R6 class.
TaskSurv$new( id, backend, time = "time", event = "event", time2, type = c("right", "left", "interval", "counting", "interval2", "mstate"), label = NA_character_ )
id
(character(1))
Identifier for the new instance.
backend
(DataBackend)
Either a DataBackend, or any object which is convertible to a DataBackend with as_data_backend(). E.g., a data.frame() will be converted to a DataBackendDataTable.
time
(character(1))
Name of the column for event time if data is right censored, otherwise starting time if interval censored.
event
(character(1))
Name of the column giving the event indicator. If data is right censored then "0"/FALSE means alive (no event), "1"/TRUE means dead (event). If type is "interval" then "0" means right censored, "1" means dead (event), "2" means left censored, and "3" means interval censored. If type is "interval2" then event is ignored.
time2
(character(1))
Name of the column for ending time of the interval for interval censored or counting process data, otherwise ignored.
type
(character(1))
Type of censoring, one of the values listed in the usage above. Default is "right" censoring.
label
(character(1))
Label for the new instance.
Depending on the censoring type ("type"), the output of a survival task's "$target_names" is a character() vector whose values are the names of the columns given by the initialization arguments above. Specifically, the output is as follows (and in the specified order):
For type = "right", "left" or "mstate": ("time", "event")
For type = "interval" or "counting": ("time", "time2", "event")
For type = "interval2": ("time", "time2")
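As a minimal sketch of the constructor and the resulting target names, assuming the mlr3 and mlr3proba packages are installed (TaskSurv is provided by mlr3proba). The data frame df, its column names and the id "toy" are made up for illustration:

```r
library(mlr3)
library(mlr3proba)

# hypothetical toy data: follow-up time, event indicator
# (1 = event, 0 = censored) and a single feature
df = data.frame(
  time = c(5, 8, 12, 3, 9, 7),
  status = c(1L, 0L, 1L, 1L, 0L, 1L),
  age = c(61, 54, 70, 48, 66, 59)
)

# right-censored task (the default type); "time" and "status"
# are declared as the target columns
task = TaskSurv$new(id = "toy", backend = df, time = "time", event = "status")

task$task_type      # task type of the new instance
task$target_names   # names of the target columns, in the order given above
task$feature_names  # all remaining columns are used as features
```

Because the event column is called "status" here rather than the default "event", it has to be named explicitly in the call.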
truth()
True response for specified row_ids. This is the survival outcome using the Surv format and depends on the censoring type. Defaults to all rows with role "use".
TaskSurv$truth(rows = NULL)
rows
(integer())
Row indices.
survival::Surv().
formula()
Creates a formula for survival models with survival::Surv() on the LHS (left hand side).
TaskSurv$formula(rhs = NULL, reverse = FALSE)
rhs
If NULL, RHS (right hand side) is ".", otherwise RHS is the value of rhs.
reverse
If TRUE, the formula is calculated with 1 - status.
stats::formula().
times()
Returns the (unsorted) outcome times.
TaskSurv$times(rows = NULL)
rows
(integer())
Row indices.
numeric()
status()
Returns the event indicator (aka censoring/survival indicator).
If censtype is "right" or "left" then 1 is event and 0 is censored.
If censtype is "mstate" then 0 is censored and all other values are different events.
If censtype is "interval" then 0 is right-censored, 1 is event, 2 is left-censored, 3 is interval-censored.
See survival::Surv().
TaskSurv$status(rows = NULL)
rows
(integer())
Row indices.
integer()
unique_times()
Returns the sorted unique outcome times for "right", "left" and "mstate" types of censoring.
TaskSurv$unique_times(rows = NULL)
rows
(integer())
Row indices.
numeric()
unique_event_times()
Returns the sorted unique event (or failure) outcome times for "right", "left" and "mstate" types of censoring.
TaskSurv$unique_event_times(rows = NULL)
rows
(integer())
Row indices.
numeric()
risk_set()
Returns the row_ids of the observations at risk (not dead or censored or had other events in case of multi-state tasks) at the specified time.
Only designed for "right", "left" and "mstate" types of censoring.
TaskSurv$risk_set(time = NULL)
time
(numeric(1))
Time to return risk set for, if NULL returns all row_ids.
integer()
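The idea of a risk set can be illustrated in base R. This is only a sketch for right-censored data, under the assumption that an observation counts as at risk at time t when its observed time is at least t. The vectors below are made-up toy data, and the code does not reproduce the method's internals:

```r
# hypothetical toy data: observed times and event indicators
# (1 = event, 0 = censored)
times  = c(5, 8, 12, 3, 9, 7)
status = c(1, 0, 1, 1, 0, 1)

# indices of observations still under observation at t = 8
at_risk = which(times >= 8)
at_risk

# sorted unique event times and the size of the risk set at each of them
event_times = sort(unique(times[status == 1]))
n_at_risk = vapply(event_times, function(t) sum(times >= t), numeric(1))
data.frame(time = event_times, n_at_risk = n_at_risk)
```

The second part mirrors what unique_event_times() and risk_set() provide together: the number at risk at each event time, which is the building block of the Kaplan-Meier estimator.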
kaplan()
Calls survival::survfit() to calculate the Kaplan-Meier estimator.
TaskSurv$kaplan(strata = NULL, rows = NULL, reverse = FALSE, ...)
strata
(character())
Stratification variables to use.
rows
(integer())
Subset of row indices.
reverse
(logical())
If TRUE calculates Kaplan-Meier of censoring distribution (1-status). Default FALSE.
...
(any)
Additional arguments passed down to survival::survfit.formula().
survival::survfit.object.
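For orientation, the following sketch calls survival::survfit() directly on made-up toy data. It assumes that the method amounts to a fit of this form and that reverse = TRUE corresponds to swapping the status to 1 - status, as described above. The data frame d is hypothetical:

```r
library(survival)

# hypothetical right-censored toy data (1 = event, 0 = censored)
d = data.frame(
  time = c(5, 8, 12, 3, 9, 7),
  status = c(1, 0, 1, 1, 0, 1)
)

# Kaplan-Meier estimate of the survival distribution
km = survfit(Surv(time, status) ~ 1, data = d)

# Kaplan-Meier estimate of the censoring distribution:
# censored observations are treated as the "events"
km_cens = survfit(Surv(time, 1 - status) ~ 1, data = d)

data.frame(time = km$time, surv = km$surv, cens_surv = km_cens$surv)
```

The survival curve only drops at event times, whereas the censoring curve only drops at censoring times.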
reverse()
Returns the same task with the status variable reversed, i.e., 1 - status.
Only designed for "left" and "right" censoring.
TaskSurv$reverse()
TaskSurv.
cens_prop()
Returns the proportion of censoring for this survival task. By default, this is returned for all observations, otherwise only the specified ones (rows).
Only designed for "right" and "left" censoring.
TaskSurv$cens_prop(rows = NULL)
rows
(integer())
Row indices.
numeric()
admin_cens_prop()
Returns an estimated proportion of administratively censored observations (i.e. censored at or after a user-specified time point). Our main assumption here is that in an administratively censored dataset, the maximum censoring time is likely close to the maximum event time and so we expect higher proportion of censored subjects near the study end date.
Only designed for "right" and "left" censoring.
TaskSurv$admin_cens_prop(rows = NULL, admin_time = NULL, quantile_prob = 0.99)
rows
(integer())
Row indices.
admin_time
(numeric(1))
Administrative censoring time (in case it is known a priori).
quantile_prob
(numeric(1))
Quantile probability value with which we calculate the cutoff time for administrative censoring. Ignored if admin_time is given. By default, quantile_prob is equal to 0.99, which translates to a time point very close to the maximum outcome time in the dataset. A lower value will result in an earlier time point and therefore in a more relaxed definition (i.e. higher proportion) of administrative censoring.
numeric()
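The calculation can be sketched in base R as follows. This is a simplified illustration, not the package's code: the helper admin_cens_sketch() and the toy data are hypothetical, and it is an assumption here that the proportion is taken relative to all censored observations.

```r
# hypothetical helper: share of censored observations whose time lies
# at or after the administrative cutoff (assumed definition)
admin_cens_sketch = function(times, status, admin_time = NULL, quantile_prob = 0.99) {
  # cutoff is either known a priori or estimated as a high quantile of the times
  cutoff = if (is.null(admin_time)) {
    unname(quantile(times, probs = quantile_prob))
  } else {
    admin_time
  }
  censored = status == 0
  if (!any(censored)) {
    return(0)
  }
  sum(censored & times >= cutoff) / sum(censored)
}

# toy data with a study end at t = 10 (1 = event, 0 = censored)
times  = c(2, 4, 6, 7, 9, 10, 10, 10)
status = c(1, 0, 1, 1, 0, 0, 0, 0)

mean(status == 0)                                  # overall censoring proportion
admin_cens_sketch(times, status, admin_time = 10)  # cutoff known a priori
admin_cens_sketch(times, status)                   # cutoff from the 0.99 quantile
```

In this toy example three of the five censored observations sit at the study end, which is the pattern the method's main assumption describes.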
dep_cens_prop()
Returns the proportion of covariates (task features) that are found to be significantly associated with censoring. This function fits a logistic regression model via glm with the censoring status as the response and using all features as predictors. If a covariate is significantly associated with the censoring status, it suggests that censoring may be informative (dependent) rather than random (non-informative). This methodology is more suitable for low-dimensional datasets where the number of features is relatively small compared to the number of observations.
Only designed for "right" and "left" censoring.
TaskSurv$dep_cens_prop(rows = NULL, method = "holm", sign_level = 0.05)
rows
(integer())
Row indices.
method
(character(1))
Method to adjust p-values for multiple comparisons, see p.adjust.methods. Default is "holm".
sign_level
(numeric(1))
Significance level for each coefficient's p-value from the logistic regression model. Default is 0.05.
numeric()
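The described procedure can be approximated in base R. The sketch below uses simulated data with made-up column names and numeric features only, so that each feature maps to exactly one coefficient. How the method treats factor features is not covered here:

```r
set.seed(1)
n = 200

# hypothetical data: the status indicator is generated independently
# of both features, i.e. censoring is non-informative by construction
df = data.frame(
  status = rbinom(n, size = 1, prob = 0.7),
  age = rnorm(n, mean = 60, sd = 10),
  biomarker = rnorm(n)
)

# logistic regression of the status indicator on all features
fit = glm(status ~ ., data = df, family = binomial)

# per-coefficient p-values (intercept dropped), adjusted for multiple testing
p_raw = summary(fit)$coefficients[-1, "Pr(>|z|)"]
p_adj = p.adjust(p_raw, method = "holm")

# proportion of features flagged at the 0.05 significance level
prop = mean(p_adj < 0.05)
prop
```

Modelling status or 1 - status as the response only flips the sign of the coefficients, so the p-values and the resulting proportion are the same either way.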
prop_haz()
Checks if the data satisfy the proportional hazards (PH) assumption using the Grambsch-Therneau test, Grambsch (1994). Uses cox.zph. This method should be used only for low-dimensional datasets where the number of features is relatively small compared to the number of observations.
Only designed for "right" and "left" censoring.
TaskSurv$prop_haz()
numeric()
If no errors, the p-value of the global chi-square test. A p-value < 0.05 is an indication of possible PH violation.
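The same kind of check can be run with the survival package directly. This sketch assumes that the test is applied to a Cox model fitted on all features, which is an assumption about the method's internals. It uses the veteran data shipped with survival:

```r
library(survival)

# Cox proportional hazards model on all covariates of the veteran data
fit = coxph(Surv(time, status) ~ ., data = veteran)

# Grambsch-Therneau test of the PH assumption, per term and globally
zph = cox.zph(fit)
zph

# p-value of the global chi-square test
p_global = zph$table["GLOBAL", "p"]
p_global
```

The per-term rows of the printed table help to locate which covariates drive a small global p-value.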
clone()
The objects of this class are cloneable with this method.
TaskSurv$clone(deep = FALSE)
deep
Whether to make a deep clone.
Grambsch, Patricia, Therneau, Terry (1994). “Proportional hazards tests and diagnostics based on weighted residuals.” Biometrika, 81(3), 515–526. doi:10.1093/biomet/81.3.515, https://doi.org/10.1093/biomet/81.3.515.
Other Task: TaskDens, mlr_tasks_actg, mlr_tasks_faithful, mlr_tasks_gbcs, mlr_tasks_gbsg, mlr_tasks_grace, mlr_tasks_lung, mlr_tasks_mgus, mlr_tasks_pbc, mlr_tasks_precip, mlr_tasks_rats, mlr_tasks_veteran, mlr_tasks_whas
library(mlr3)
library(mlr3proba) # provides TaskSurv and the predefined survival tasks
task = tsk("lung")
# meta data
task$target_names # target is always (time, status) for right-censoring tasks
task$feature_names
task$formula()
# survival data
task$truth() # survival::Surv() object
task$times() # (unsorted) times
task$status() # event indicators (1 = death, 0 = censored)
task$unique_times() # sorted unique times
task$unique_event_times() # sorted unique event times
task$risk_set(time = 700) # observation ids that are not censored or dead at t = 700
task$kaplan(strata = "sex") # stratified Kaplan-Meier
task$kaplan(reverse = TRUE) # Kaplan-Meier of the censoring distribution
# proportion of censored observations across the whole dataset
task$cens_prop()
# proportion of censored observations at or after the 95% time quantile
task$admin_cens_prop(quantile_prob = 0.95)
# proportion of variables that are significantly associated with the
# censoring status via a logistic regression model
task$dep_cens_prop() # 0 indicates independent censoring
# data barely satisfies proportional hazards assumption (p > 0.05)
task$prop_haz()
# veteran data is definitely non-PH (p << 0.05)
tsk("veteran")$prop_haz()