cox_reg: Fit a Cox model and compute predictions of expectations for...

Description Usage Arguments Details Value References See Also Examples

View source: R/cox_reg.R

Description

cox_reg is a benchmark model we use in [Gerber et al. (2018)]. To model the variable phi(T), where T is a right censored time and phi is a given function, we first fit a Cox model to the data to estimate the survival function of T given the covariates. Then, we deduce an estimator of phi(T) by integration of the function phi with respect to the estimated survival function. Different methods are available to assess the quality of fit of cox_reg. cox_reg is a wrapper for the coxph function

The notations we use are :

Usage

1
2
3
4
5
cox_reg(y_var, delta_var, x_vars, train, test = NULL, phi = function(x) {   
   x }, phi.args = list(), max_time = NULL, cox_obj = TRUE,
  ev_methods = c("concordance", "weighted"), bandwidths = NULL,
  types_w_ev = c("KM"), max_w_ev = 1000, mat_w = NULL,
  y_no_cens_var = NULL, ...)

Arguments

y_var

A character string which gives the name of the Y variable

delta_var

A character string which gives the name of the δ variable (this variable should be binary : 1 = non censored, 0 = censored)

x_vars

A vector of character strings which gives the names of the explanatory variables

train

A data.frame of training observations. It must contain the column names y_var, delta_var and x_vars

test

A data.frame of testing observations (default = NULL)

phi

A function to be applied to y_var

phi.args

A list of additional parameters for the function phi (default = NULL). See Examples for a use case

max_time

A real number giving a threshold for y_var (default = NULL). If NULL, then max_time is set to the maximum non censored observation of y_var among the training set. We note T' = min(T, max_time)

cox_obj

A boolean which indicates if the Cox model fitted to the training data should be returned (default = TRUE)

ev_methods

A vector of character strings which gives the methods that should be used for the evaluation of the model (default = c("concordance","weighted")). Possible choices are "concordance", "weighted" and "group". Multiple choices are possible. See Details - Evaluation criteria for more information

bandwidths

A vector of real numbers for the bandwidths to use for the model evaluation if "group" is used as an ev_methods (default = NULL : set to 50 if "group" in ev_methods). Only used if "group" is used as ev_methods. See Details - Evaluation criteria for more information

types_w_ev

A vector of character strings which gives the types of weights to be used for IPCW (Inverse Probability of Censoring Weighting) in the model evaluation (default = c("KM") (Kaplan Meier)). Possible choices are "KM", "Cox", "RSF" and "unif". Set to colnames(mat_w) if mat_w is provided. See Details - Evaluation criteria for more information

max_w_ev

A real number which gives the maximum admissible ratio for the IPC weights (default = 1000). See Details - Evaluation criteria for more information

mat_w

A matrix to provide handmade IPC weights for the model evaluation (default = NULL). mat_w should satisfied nrow(mat_w) = nrow(train) + nrow(test) and a column should correspond to a type of weights (multiple columns are possible). Column names of mat_w may be used to specify names for the provided weights (by default names will be "w1", "w2", ...)

y_no_cens_var

A character string which gives the name of the non censored y_var (default = NULL). To be used only in the context of simulated data where full about is available.

...

Additional parameter that may be pass to the coxph function (package survival)

Details

Value

A list with the following elements :

pred_train

The vector of the predicted values for phi(T') (with T' = min(T, max_time)) for the observations of the train set

pred_test

The vector of the predicted values for phi(T') for the observations of the test set (require test != NULL)

perf_train

The list with the values for the evaluation criteria computed on the train set

perf_test

The list with the values for the evaluation criteria computed on the test set (require test != NULL))

surv_train

The matrix which contains the estimated values of the survival curves at time_points (with the Cox model), for the observations of the train set

surv_test

The matrix which contains the estimated values of the survival curves at time_points, for the observations of the test set (require test != NULL)

time_points

The vector of the time points where the survival curves are evaluated

mat_w_train

The matrix which contains the values of the weights used for the "weighted" criteria, for the observations of the train set

mat_w_test

The matrix which contains the values of the weights used for the "weighted" criteria, for the observations of the test set

n_w_ev_modif_train

The vector giving the number of train weights modified due to max_w_ev

n_w_ev_modif_test

The vector giving the number of test weights modified due to max_w_ev

cox_obj

The obj returned by the coxph function

max_time

The real number giving the threshold used by the model

cens_rate

The real number giving the rate of censoring of T', computed on the concatenation of train and test

train

The data.frame of the train data provided as arguments, plus columns : Y' = min(Y, max_time ), δ' = 1_{T' ≤ C} and phi(T')

test

The data.frame of the test data provided as arguments, plus columns : Y' = min(Y, max_time ), δ' = 1_{T' ≤ C} and phi(T')

phi

See Argument

phi.args

See Argument

x_vars

See Argument

max_w_ev

See Argument

References

Gerber, G., Le Faou, Y., Lopez, O., & Trupin, M. (2018). The impact of churn on prospect value in health insurance, evaluation using a random forest under random censoring. https://hal.archives-ouvertes.fr/hal-01807623/

See Also

coxph, predict_cox_reg

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
# ------------------------------------------------
#   Load "transplant" data
# ------------------------------------------------
data("transplant", package = "survival")
transplant$delta = 1 * (transplant$event == "ltx") # create binary var
# which indicate censoring/non censoring

# keep only rows with no missing value
apply(transplant, MARGIN = 2, FUN = function(x){sum(is.na(x))})
transplant_bis = transplant[stats::complete.cases(transplant),]

# plot the survival curve of transplant data
KM_transplant = survfit(formula = survival::Surv(time = futime, event = delta) ~ 1,
                                  data = transplant_bis)
plot(KM_transplant)

# ------------------------------------------------
#   Basic call to train a model
# ------------------------------------------------

res1 = cox_reg(y_var = "futime",
                      delta_var = "delta",
                      x_vars = setdiff(colnames(transplant_bis),
                                       c("futime", "delta", "event")),
                      train = transplant_bis,
                      types_w_ev = c("KM", "Cox", "RSF", "unif"))

matplot(y = t(res1$surv_train[1:30,]), x = res1$time_points, type = "l")
print(res1$perf_train)
print(res1$max_time) # by default \code{max_time} is set to 2055 which is too large
# to have good predictions (training R2 with different weights are negative !)

# ------------------------------------------------
#   Training with estimation of test error
# ------------------------------------------------
set.seed(17)
train_lines = sample(1:nrow(transplant_bis), 600)
res2 = cox_reg(y_var = "futime",
                      delta_var = "delta",
                      x_vars = setdiff(colnames(transplant_bis),c("futime", "delta", "event")),
                      train = transplant_bis[train_lines,],
                      test = transplant_bis[-train_lines,],
                      types_w_ev = c("KM", "Cox", "RSF", "unif"))

print(res2$max_time) # default \code{max_time} has changed since train set
# is different

# train error is now positive but test error is still negative
print(res2$perf_train)
print(res2$perf_test)

# visualise the predictions
print(res2$pred_test[1:30])
matplot(y = t(res2$surv_test[1:30,]), x = res2$time_points, type = "l")

# ------------------------------------------------
#   Modify the \code{max_time} argument
# ------------------------------------------------
set.seed(17)
train_lines = sample(1:nrow(transplant_bis), 600)
res3 = cox_reg(y_var = "futime",
                      delta_var = "delta",
                      x_vars = setdiff(colnames(transplant_bis),c("futime", "delta", "event")),
                      train = transplant_bis[train_lines,],
                      test = transplant_bis[-train_lines,],
                      max_time = 600,
                      types_w_ev = c("KM", "Cox", "RSF", "unif"))

print(res3$perf_train)
print(res3$perf_test) # test error is much better
print(res3$pred_test[1:30])
matplot(y = t(res3$surv_test[1:30,]), x = res3$time_points, type = "l")

# analyse the weights used for "weighted" criteria
print(res3$cens_rate) # rate of censoring taking into account \code{max_time}
print(head(res3$mat_w_test))
## ratio max(weights)/min(weights)
print(apply(X = res3$mat_w_test,
            MARGIN = 2,
            FUN = function(x){max(x[x != 0])/min(x[x != 0])}))
# ratios are low because the censoring rate is low

# in this case, it is not meaningful to to modify the
# \code{max_w_ev} argument since the maximum ratios
# between weights are around 2 and the test data has 197 rows.
# But in other situation it may be pertinent


# ------------------------------------------------
#   Use custom \code{phi} function
# ------------------------------------------------
g = function(x,a) abs(x-a)
set.seed(17)
train_lines = sample(1:nrow(transplant_bis), 600)
res4 = cox_reg(y_var = "futime",
                      delta_var = "delta",
                      x_vars = setdiff(colnames(transplant_bis),c("futime", "delta", "event")),
                      train = transplant_bis[train_lines,],
                      test = transplant_bis[-train_lines,],
                      phi = g,
                      phi.args = list(a = 200), # set value for "a"
                      max_time = 600,
                      types_w_ev = c("KM", "Cox", "RSF", "unif"))

print(res4$perf_test)
print(res4$pred_test[1:30])

YohannLeFaou/sword documentation built on May 28, 2019, 3:21 p.m.