fit_GCOMP: Fit sequential GCOMP and TMLE for survival

View source: R/tmle.R

Description

Interventions on up to 3 nodes are allowed: CENS, TRT and MONITOR. TMLE adjustment will be based on the inverse of the propensity score fits for the observed likelihood (g0.C, g0.A, g0.N), multiplied by the indicator of not being censored and the probability of each intervention in intervened_TRT and intervened_MONITOR. Requires column name(s) that specify the counterfactual node values or the counterfactual probabilities of each node being 1 (for stochastic interventions).
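
To make the weighting concrete, the following base-R sketch computes weights of this general form on a toy long-format data set. It is not stremr's internal code: the data values and the column names gC, gA and p_star are assumptions made for illustration, and the monitoring node is left out for brevity.

```r
# Toy long-format data: 2 subjects, 3 time points (all values are made up).
d <- data.frame(
  ID = rep(1:2, each = 3),
  t  = rep(0:2, times = 2),
  C  = c(0, 0, 0, 0, 0, 1),               # 1 = censored at this time point
  A  = c(1, 1, 1, 1, 0, 0),               # observed treatment
  gC = c(0.9, 0.9, 0.8, 0.9, 0.9, 0.8),   # assumed fitted P(C = 0 | past)
  gA = c(0.5, 0.6, 0.7, 0.5, 0.4, 0.4)    # assumed fitted P(A = 1 | past)
)

# Counterfactual column: probability that the intervened treatment equals 1.
# A value of 1 is the static "always treat" rule; values strictly between
# 0 and 1 would describe a stochastic intervention.
d$p_star <- 1

# Probability of the observed treatment under the intervention and under g0.
g_star <- ifelse(d$A == 1, d$p_star, 1 - d$p_star)
g_obs  <- ifelse(d$A == 1, d$gA, 1 - d$gA)

# Time-specific weight: uncensored indicator times intervention probability,
# divided by the observed-likelihood propensity scores.
d$wt <- (d$C == 0) * g_star / (d$gC * g_obs)

# Cumulative weight over each subject's follow-up (rows sorted by ID and t).
d$cum_wt <- ave(d$wt, d$ID, FUN = cumprod)
print(d[, c("ID", "t", "wt", "cum_wt")])
```

Subject 2 stops following the "always treat" rule at t = 1, so its weight is zero from that time point on.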

Usage

fit_GCOMP(
  OData,
  tvals,
  Qforms,
  intervened_TRT = NULL,
  intervened_MONITOR = NULL,
  rule_name = paste0(c(intervened_TRT, intervened_MONITOR), collapse = ""),
  models = NULL,
  fit_method = stremrOptions("fit_method"),
  fold_column = stremrOptions("fold_column"),
  TMLE = FALSE,
  stratifyQ_by_rule = FALSE,
  stratify_by_last = TRUE,
  Qstratify = NULL,
  useonly_t_TRT = NULL,
  useonly_t_MONITOR = NULL,
  iterTMLE = FALSE,
  CVTMLE = FALSE,
  byfold_Q = FALSE,
  IPWeights = NULL,
  trunc_weights = 10^6,
  weights = NULL,
  max_iter = 15,
  adapt_stop = TRUE,
  adapt_stop_factor = 10,
  tol_eps = 0.001,
  parallel = FALSE,
  return_wts = FALSE,
  return_fW = FALSE,
  reg_Q = NULL,
  intervened_type_TRT = NULL,
  intervened_type_MONITOR = NULL,
  maxpY = 1,
  TMLE_updater = "TMLE.updater.speedglm",
  verbose = getOption("stremr.verbose"),
  ...
)

Arguments

OData

Input data object created by the importData function.

tvals

Vector of time-points in the data for which the survival function (and risk) should be estimated.

Qforms

Regression formulas, one formula per Q. Only main-terms are allowed.

intervened_TRT

Column name in the input data with the probabilities (or indicators) of counterfactual treatment nodes being equal to 1 at each time point. Leave the argument unspecified (NULL) when not intervening on treatment node(s).

intervened_MONITOR

Column name in the input data with probabilities (or indicators) of counterfactual monitoring nodes being equal to 1 at each time point. Leave the argument unspecified (NULL) when not intervening on the monitoring node(s).

rule_name

Optional name for the treatment/monitoring regimen.

models

Optional parameters specifying the models for fitting the iterative (sequential) G-Computation formula. Must be an object of class ModelStack specified with the gridisl::defModel function.

fit_method

Model selection approach. One of: "none" (no model selection); "cv" (discrete Super Learner using V-fold cross-validation, which selects the model with the lowest cross-validated MSE; the column name that contains the fold IDs must be specified); or "origamiSL" (continuous Super Learner, which uses the origami R package to select the convex combination of the model predictions, also known as model stacking).

fold_column

The column name in the input data (ordered factor) that contains the fold IDs to be used as part of the validation sample. Use the provided function define_CVfolds to define such folds or define the folds using your own method.
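
As an illustration of defining the folds with your own method, the base-R sketch below assigns fold IDs on a toy data set. It assumes that folds should be drawn per subject (so that all rows of one subject share a fold); the number of folds and the column name fold_ID are illustrative choices, not package requirements.

```r
set.seed(1)

# Toy long-format data: 6 subjects, 2 time points each.
d <- data.frame(ID = rep(1:6, each = 2), t = rep(0:1, times = 6))
V <- 3  # number of folds (assumed)

# Draw one fold per subject so that all rows of a subject stay together.
ids <- unique(d$ID)
fold_of_id <- sample(rep_len(seq_len(V), length(ids)))
names(fold_of_id) <- ids

# Store the fold IDs as an ordered factor.
d$fold_ID <- factor(fold_of_id[as.character(d$ID)],
                    levels = seq_len(V), ordered = TRUE)
table(d$fold_ID)
```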

TMLE

Set to TRUE to run the usual longitudinal TMLE algorithm (with a separate TMLE update of Q for every sequential regression).

stratifyQ_by_rule

Set to TRUE for stratifying the fit of Q (the outcome model) by rule-followers only. There are two ways to do this stratification. The first option is to use stratify_by_last=TRUE (default), which would fit the outcome model only among the observations that were receiving their supposed counterfactual treatment at the current time-point (ignoring the past history of treatments leading up to time-point t). The second option is to set stratify_by_last=FALSE in which case the outcome model will be fit only among the observations who followed their counterfactual treatment regimen throughout the entire treatment history up to current time-point t (rule followers). For the latter option, the observation would be considered a non-follower if the person's treatment did not match their supposed counterfactual treatment at any time-point up to and including current time-point t.

stratify_by_last

Only used when stratifyQ_by_rule is TRUE. Set to TRUE for stratification by last time-point, set to FALSE for stratification by all time-points (rule-followers). See stratifyQ_by_rule for more details.
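
The difference between the two stratification options can be sketched on a toy data set in base R. This only mirrors the description above and is not the package's implementation; the data are made up, and the column names match_now and follower are hypothetical.

```r
# Toy long-format data, sorted by ID and t.
d <- data.frame(
  ID      = rep(1:2, each = 3),
  t       = rep(0:2, times = 2),
  TI      = c(1, 1, 1, 0, 1, 1),  # observed treatment
  TI.set1 = 1                     # counterfactual treatment ("always treat")
)

# stratify_by_last = TRUE: observed treatment matches the counterfactual
# treatment at the current time point only.
d$match_now <- d$TI == d$TI.set1

# stratify_by_last = FALSE: observed treatment matches at every time point
# up to and including t (rule followers).
d$follower <- as.logical(ave(as.integer(d$match_now), d$ID, FUN = cumprod))

print(d)
```

Subject 2 deviates at t = 0 only, so it contributes rows at t = 1 and t = 2 under the first option and no rows under the second.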

Qstratify

Placeholder for future user-defined model stratification for fitting Qs (CURRENTLY NOT FUNCTIONAL, WILL RESULT IN ERROR).

useonly_t_TRT

Use for intervening only on some subset of observation and time-specific treatment nodes. Should be a character string with a logical expression that defines the subset of intervention observations. For example, using TRT==0 will intervene only at observations with the value of TRT being equal to zero. The expression can contain any variable name that was defined in the input dataset. Leave as NULL when intervening on all observations/time-points.

useonly_t_MONITOR

Same as useonly_t_TRT, but for monitoring nodes.

iterTMLE

Set to TRUE to run the iterative univariate TMLE instead of the usual longitudinal TMLE. When set to TRUE this will also provide the standard sequential G-COMP as part of the output.

CVTMLE

Set to TRUE to run the CV-TMLE algorithm instead of the usual TMLE algorithm. Must set either TMLE=TRUE or iterTMLE=TRUE for this argument to have any effect.

byfold_Q

(ADVANCED USE) Fit iterative means (Q parameter) using "by-fold" (aka "fold-specific" or "split-specific") cross-validation approach. Only works with fit_method="origamiSL".

IPWeights

(Optional) Result of calling the function getIPWeights for running TMLE (evaluated automatically when missing).

trunc_weights

Specify the numeric weight truncation value. All final weights exceeding the value in trunc_weights will be truncated.
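
A minimal sketch of weight truncation, assuming that "truncated" means capped at the value of trunc_weights. The weights and the cap of 20 are made up for illustration (the function default is 10^6).

```r
trunc_weights <- 20                      # illustrative cap
wts <- c(0.8, 3.5, 19.9, 250, 1e7)       # made-up final weights

# Cap every weight that exceeds trunc_weights; smaller weights are unchanged.
wts_trunc <- pmin(wts, trunc_weights)
wts_trunc
```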

weights

Optional data.table with additional observation- and time-specific weights. Must contain columns ID, t and weight. The column named weight is merged back into the original data according to (ID, t). Not implemented yet.

max_iter

For iterative TMLE only: Integer, set to maximum number of iterations for iterative TMLE algorithm.

adapt_stop

For iterative TMLE only: Choose between two stopping criteria for iterative TMLE. The default is TRUE, which stops the iterative TMLE algorithm adaptively. Specifically, the iterations will stop when the mean estimate of the efficient influence curve is less than or equal to 1 / (adapt_stop_factor*sqrt(N)), where N is the total number of unique subjects in the data and adapt_stop_factor is set to 10 by default. When TRUE, the argument tol_eps is ignored and TMLE stops when either max_iter has been reached or this criterion has been satisfied. When FALSE, the stopping criterion is determined by the values of max_iter and tol_eps.

adapt_stop_factor

For iterative TMLE only: The adaptive factor used in the stopping criterion for iterative TMLE when adapt_stop is set to TRUE. Default is 10. TMLE will keep iterating until the mean estimate of the efficient influence curve is less than 1 / (adapt_stop_factor*sqrt(N)) or until the number of iterations reaches max_iter.

tol_eps

For iterative TMLE only: Numeric error tolerance for the iterative TMLE update. The iterative TMLE algorithm will stop when the absolute value of the TMLE intercept update is below tol_eps.
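
The interplay of max_iter, adapt_stop, adapt_stop_factor and tol_eps can be summarized in a small base-R sketch. The function stop_iter_tmle and its arguments eic_mean, eps_update and iter are hypothetical names; the function reflects one reading of the descriptions above (including the assumption that the mean of the efficient influence curve is compared in absolute value) and is not the package's code.

```r
# Returns TRUE when the iterative TMLE should stop.
#   eic_mean   : current mean estimate of the efficient influence curve
#   eps_update : current TMLE intercept update
#   iter       : number of iterations completed so far
#   N          : number of unique subjects in the data
stop_iter_tmle <- function(eic_mean, eps_update, iter, N,
                           adapt_stop = TRUE, adapt_stop_factor = 10,
                           max_iter = 15, tol_eps = 0.001) {
  # The iteration cap applies under both stopping criteria.
  if (iter >= max_iter) return(TRUE)
  if (adapt_stop) {
    # Adaptive criterion: tol_eps is ignored.
    abs(eic_mean) <= 1 / (adapt_stop_factor * sqrt(N))
  } else {
    # Fixed tolerance on the size of the update.
    abs(eps_update) < tol_eps
  }
}

# With N = 1000 the adaptive threshold is 1 / (10 * sqrt(1000)), about 0.0032.
stop_iter_tmle(eic_mean = 0.002, eps_update = 0.05, iter = 3, N = 1000)
stop_iter_tmle(eic_mean = 0.010, eps_update = 0.05, iter = 3, N = 1000)
```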

parallel

Set to TRUE to run the sequential G-COMP or TMLE in parallel (uses foreach with dopar and requires a previously defined parallel back-end cluster).

return_wts

Applies only when TMLE = TRUE. Return the data.table with subject-specific IP weights as part of the output. Note: for large datasets setting this to TRUE may lead to extremely large object sizes!

return_fW

When TRUE, will return the object fit for the last Q regression as part of the output table. Can be used for obtaining subject-specific predictions of the counterfactual functional E(Y_d|W_i).

reg_Q

(ADVANCED USE ONLY) Directly specify the Q regressions, separately for each time-point.

intervened_type_TRT

(ADVANCED FUNCTIONALITY) Set to NULL by default; can be a character string set to either "bin", "shift" or "MSM". Provides support for different types of interventions on the TRT (treatment) node (counterfactual treatment node A^*(t)). The default behavior is the same as "bin", which assumes that A^*(t) is binary and is set equal to either 0, 1 or p(t), where 0<=p(t)<=1. Here, p(t) denotes the probability that counterfactual A^*(t) is equal to 1, i.e., P(A^*(t)=1)=p(t), and it can change over time and from subject to subject. For "shift", it is assumed that the intervention node A^*(t) is a shift in the value of the continuous treatment A, i.e., A^*(t)=A(t)+delta(t). Finally, for "MSM" it is assumed that we simply want the final intervention density g^*(t) to be set to a constant 1. This is useful for static MSMs.

intervened_type_MONITOR

(ADVANCED FUNCTIONALITY) Same as intervened_type_TRT, but for monitoring intervention node (counterfactual monitoring node N^*(t)).

maxpY

Maximum probability that the cumulative incidence of the outcome Y(t) is equal to 1. Useful for upper-bounding rare outcomes.

TMLE_updater

Function for performing the TMLE update. Default is the TMLE updater based on speedglm (called "TMLE.updater.speedglm"). Other possible options include "TMLE.updater.glm", "linear.TMLE.updater.speedglm" and "iTMLE.updater.xgb".

verbose

Set to TRUE to print auxiliary messages during model fitting.

...

When the models argument is NOT specified, these additional arguments will be passed on directly to all GridSL modeling functions that are called from this routine, e.g., family = "binomial" can be used to specify the model family. Note that all such arguments must be named.

Value

An output list containing the data.table with survival estimates over time saved as "estimates".

See Also

stremr-package for the general overview of the package.

Examples

options(stremr.verbose = TRUE)
require("data.table")

# ----------------------------------------------------------------------
# Simulated Data
# ----------------------------------------------------------------------
data(OdataNoCENS)
OdataDT <- as.data.table(OdataNoCENS, key=c("ID", "t"))

# define lagged N, first value is always 1 (always monitored at the first time point):
OdataDT[, ("N.tminus1") := shift(get("N"), n = 1L, type = "lag", fill = 1L), by = ID]
OdataDT[, ("TI.tminus1") := shift(get("TI"), n = 1L, type = "lag", fill = 1L), by = ID]

# ----------------------------------------------------------------------
# Define interventions (always treated and never treated):
# ----------------------------------------------------------------------
OdataDT[, ("TI.set1") := 1L]
OdataDT[, ("TI.set0") := 0L]

# ----------------------------------------------------------------------
# Import Data
# ----------------------------------------------------------------------
OData <- importData(OdataDT, ID = "ID", t = "t", covars = c("highA1c", "lastNat1", "N.tminus1"),
                    CENS = "C", TRT = "TI", MONITOR = "N", OUTCOME = "Y.tplus1")

# ----------------------------------------------------------------------
# Look at the input data object
# ----------------------------------------------------------------------
print(OData)

# ----------------------------------------------------------------------
# Access the input data
# ----------------------------------------------------------------------
get_data(OData)

# ----------------------------------------------------------------------
# Model the Propensity Scores
# ----------------------------------------------------------------------
gform_CENS <- "C ~ highA1c + lastNat1"
gform_TRT <- "TI ~ CVD + highA1c + N.tminus1"
gform_MONITOR <- "N ~ 1"
stratify_CENS <- list(C=c("t < 16", "t == 16"))

# ----------------------------------------------------------------------
# Fit Propensity Scores
# ----------------------------------------------------------------------
OData <- fitPropensity(OData, gform_CENS = gform_CENS,
                        gform_TRT = gform_TRT,
                        gform_MONITOR = gform_MONITOR,
                        stratify_CENS = stratify_CENS)

# ----------------------------------------------------------------------
# IPW Adjusted KM or Saturated MSM
# ----------------------------------------------------------------------
require("magrittr")
AKME.St.1 <- getIPWeights(OData, intervened_TRT = "TI.set1") %>%
             survNPMSM(OData) %$%
             estimates
AKME.St.1

# ----------------------------------------------------------------------
# Bounded IPW
# ----------------------------------------------------------------------
IPW.St.1 <- getIPWeights(OData, intervened_TRT = "TI.set1") %>%
            directIPW(OData)
IPW.St.1[]

# ----------------------------------------------------------------------
# IPW-MSM for hazard
# ----------------------------------------------------------------------
wts.DT.1 <- getIPWeights(OData = OData, intervened_TRT = "TI.set1", rule_name = "TI1")
wts.DT.0 <- getIPWeights(OData = OData, intervened_TRT = "TI.set0", rule_name = "TI0")
survMSM_res <- survMSM(list(wts.DT.1, wts.DT.0), OData, tbreaks = c(1:8,12,16)-1)
survMSM_res$St

# ----------------------------------------------------------------------
# Sequential G-COMP
# ----------------------------------------------------------------------
t.surv <- c(0:10)
Qforms <- rep.int("Qkplus1 ~ CVD + highA1c + N + lastNat1 + TI + TI.tminus1", (max(t.surv)+1))
params <- gridisl::defModel(estimator = "speedglm__glm")

## Not run: 
gcomp_est <- fit_GCOMP(OData, tvals = t.surv, intervened_TRT = "TI.set1",
                          Qforms = Qforms, models = params, stratifyQ_by_rule = FALSE)
gcomp_est[]

## End(Not run)
# ----------------------------------------------------------------------
# TMLE
# ----------------------------------------------------------------------
## Not run: 
tmle_est <- fit_TMLE(OData, tvals = t.surv, intervened_TRT = "TI.set1",
                    Qforms = Qforms, models = params, stratifyQ_by_rule = TRUE)
tmle_est[]

## End(Not run)

# ----------------------------------------------------------------------
# Running IPW-Adjusted KM with optional user-specified weights:
# ----------------------------------------------------------------------
addedWts_DT <- OdataDT[, c("ID", "t"), with = FALSE]
addedWts_DT[, new.wts := sample.int(10, nrow(OdataDT), replace = TRUE)/10]
survNP_res_addedWts <- survNPMSM(wts.DT.1, OData, weights = addedWts_DT)

# ----------------------------------------------------------------------
# Multivariate Propensity Score Regressions
# ----------------------------------------------------------------------
gform_CENS <- "C + TI + N ~ highA1c + lastNat1"
OData <- fitPropensity(OData, gform_CENS = gform_CENS, gform_TRT = gform_TRT,
                        gform_MONITOR = gform_MONITOR)

# ----------------------------------------------------------------------
# Fitting treatment model with Gradient Boosting machines:
# ----------------------------------------------------------------------
## Not run: 
require("h2o")
h2o::h2o.init(nthreads = -1)
gform_CENS <- "C ~ highA1c + lastNat1"
models_TRT <- sl3::Lrnr_h2o_grid$new(algorithm = "gbm")
OData <- fitPropensity(OData, gform_CENS = gform_CENS,
                        gform_TRT = gform_TRT,
                        models_TRT = models_TRT,
                        gform_MONITOR = gform_MONITOR,
                        stratify_CENS = stratify_CENS)

# Use `H2O-3` distributed implementation of GLM for treatment model estimator:
models_TRT <- sl3::Lrnr_h2o_glm$new(family = "binomial")
OData <- fitPropensity(OData, gform_CENS = gform_CENS,
                        gform_TRT = gform_TRT,
                        models_TRT = models_TRT,
                        gform_MONITOR = gform_MONITOR,
                        stratify_CENS = stratify_CENS)

# Use Deep Neural Nets:
models_TRT <- sl3::Lrnr_h2o_grid$new(algorithm = "deeplearning")
OData <- fitPropensity(OData, gform_CENS = gform_CENS,
                        gform_TRT = gform_TRT,
                        models_TRT = models_TRT,
                        gform_MONITOR = gform_MONITOR,
                        stratify_CENS = stratify_CENS)

## End(Not run)

# ----------------------------------------------------------------------
# Fitting different models with different algorithms
# Fine tuning modeling with optional tuning parameters.
# ----------------------------------------------------------------------
## Not run: 
params_TRT <- sl3::Lrnr_h2o_grid$new(algorithm = "gbm",
                              ntrees = 50,
                              learn_rate = 0.05,
                              sample_rate = 0.8,
                              col_sample_rate = 0.8,
                              balance_classes = TRUE)
params_CENS <- sl3::Lrnr_glm_fast$new()
params_MONITOR <- sl3::Lrnr_glm_fast$new()
OData <- fitPropensity(OData,
            gform_CENS = gform_CENS, stratify_CENS = stratify_CENS, params_CENS = params_CENS,
            gform_TRT = gform_TRT, params_TRT = params_TRT,
            gform_MONITOR = gform_MONITOR, params_MONITOR = params_MONITOR)

## End(Not run)

# ----------------------------------------------------------------------
# Running TMLE based on the previous fit of the propensity scores.
# Also applying Random Forest to estimate the sequential outcome model
# ----------------------------------------------------------------------
## Not run: 
t.surv <- c(0:5)
Qforms <- rep.int("Qkplus1 ~ CVD + highA1c + N + lastNat1 + TI + TI.tminus1", (max(t.surv)+1))
models <- sl3::Lrnr_h2o_grid$new(algorithm = "randomForest",
                           ntrees = 100, learn_rate = 0.05, sample_rate = 0.8,
                           col_sample_rate = 0.8, balance_classes = TRUE)
tmle_est <- fit_TMLE(OData, tvals = t.surv, intervened_TRT = "TI.set1",
            Qforms = Qforms, models = models,
            stratifyQ_by_rule = TRUE)

## End(Not run)

## Not run: 
t.surv <- c(0:5)
Qforms <- rep.int("Qkplus1 ~ CVD + highA1c + N + lastNat1 + TI + TI.tminus1", (max(t.surv)+1))
models <- sl3::Lrnr_h2o_grid$new(algorithm = "randomForest",
                           ntrees = 100, learn_rate = 0.05, sample_rate = 0.8,
                           col_sample_rate = 0.8, balance_classes = TRUE)
tmle_est <- fit_TMLE(OData, tvals = t.surv, intervened_TRT = "TI.set1",
            Qforms = Qforms, models = models,
            stratifyQ_by_rule = FALSE)

## End(Not run)

osofr/stremr documentation built on Jan. 25, 2022, 8:07 a.m.