fitPropensity: Define and fit propensity score models.

Description Usage Arguments Value Examples

View source: R/main_estimation.R

Description

Defines and fits estimators for the propensity scores, separately for censoring, treatment and monitoring events. When there is right-censoring and/or not intervening on monitoring, only the propensity score model for treatment will be estimated.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
fitPropensity(
  OData,
  gform_CENS,
  gform_TRT,
  gform_MONITOR,
  stratify_CENS = NULL,
  stratify_TRT = NULL,
  stratify_MONITOR = NULL,
  models_CENS = NULL,
  models_TRT = NULL,
  models_MONITOR = NULL,
  fit_method = stremrOptions("fit_method"),
  fold_column = stremrOptions("fold_column"),
  reg_CENS,
  reg_TRT,
  reg_MONITOR,
  use_weights = FALSE,
  verbose = getOption("stremr.verbose"),
  ...
)

Arguments

OData

Input data object created by importData function.

gform_CENS

Specify the regression formula for the right-censoring mechanism, in the format "CensVar1 + CensVar2 ~ Predictor1 + Predictor2". Leave as missing for data with no right-censoring.

gform_TRT

Specify the regression formula for the treatment mechanism, in the format "TRTVar1 + TRTVar2 ~ Predictor1 + Predictor2".

gform_MONITOR

Specify the regression formula for the treatment mechanism, in the format "TRTVar1 + TRTVar2 ~ Predictor1 + Predictor2". Leave as missing for data with no monitoring events or when not intervening on monitoring.

stratify_CENS

Define strata(s) for each censoring variable from gform_CENS. Must be named list containing the logical expressions (the logical expressions must be provided as character strings). When missing (default), all censoring models will be fit by pooling all available observations, across all time-points. When used the censoring models in gform_CENS will be trained separately on each strata (defined by separate logical expressions). If the gform_CENS contains more than one censoring variable then this argumement (stratify_CENS) must provide separate stratas for each censoring variable or be left as missing. For example, when gform_CENS="CensVar1 + CensVar2 ~ Predictor1 + Predictor2", this argument should be a list of length two, with list items named as "CensVar1" and "CensVar2". The expressions in stratify_CENS[["CensVar1"]] define the training stratas for censoring variable CensVar1, while the expressions in stratify_CENS[["CensVar1"]] define the training stratas for CensVar2. See additional examples below.

stratify_TRT

Define strata(s) for treatment model(s). Must be a list of logical expressions (input the expression as character strings). When missing (default), the treatment model(s) are fit by pooling all available (uncensored) observations, across all time-points. The rules are the same as for stratify_CENS.

stratify_MONITOR

Define strata(s) for monitoring model(s). Must be a list of logical expressions (input the expression as character strings). When missing (default), the monitoring model is fit by pooling all available (uncensored) observations, across all time-points. The rules are the same as for stratify_CENS.

models_CENS

Optional parameter specifying the models for fitting the censoring mechanism(s) with gridisl R package. Must be an object of class ModelStack specified with gridisl::defModel function.

models_TRT

Optional parameter specifying the models for fitting the treatment (exposure) mechanism(s) with gridisl R package. Must be an object of class ModelStack specified with gridisl::defModel function.

models_MONITOR

Optional parameter specifying the models for fitting the monitoring mechanism with gridisl R package. Must be an object of class ModelStack specified with gridisl::defModel function.

fit_method

Model selection approach. Can be "none" - no model selection, "cv" - V fold cross-validation that selects the best model according to lowest cross-validated MSE (must specify the column name that contains the fold IDs).

fold_column

The column name in the input data (ordered factor) that contains the fold IDs to be used as part of the validation sample. Use the provided function define_CVfolds to define such folds or define the folds using your own method.

reg_CENS

(ADVANCED FEATURE). Manually define and input the regression specification for each strata of censoring model, using the function define_single_regression.

reg_TRT

(ADVANCED FEATURE). Manually define and input the regression specification for each strata of treatment model, using the function define_single_regression.

reg_MONITOR

(ADVANCED FEATURE). Manually define and input the regression specification for each strata of monitoring model, using the function define_single_regression.

use_weights

(NOT IMPLEMENTED) Set to TRUE to pass the previously specified weights column to all learners for all of the propensity score models. This will result in a weights regression models being fit for P(A(t)|L(t)), P(C(t)|L(t)), P(N(t)|L(t)). (NOTE: This will only work when using sl3 learners, such as the default sl3 learner sl3::Lrnr_glm_fast).

verbose

Set to TRUE to print messages on status and information to the console. Turn this on by default using options(stremr.verbose=TRUE).

...

When all or some of the models_... arguments are NOT specified, these additional arguments will be passed on directly to all gridisl modeling functions that are called from this routine, e.g., family = "binomial" can be used to specify the model family. Note that all such arguments must be named.

Value

...

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
options(stremr.verbose = TRUE)
require("data.table")

# ----------------------------------------------------------------------
# Simulated Data
# ----------------------------------------------------------------------
data(OdataNoCENS)
OdataDT <- as.data.table(OdataNoCENS, key=c("ID", "t"))

# define lagged N, first value is always 1 (always monitored at the first time point):
OdataDT[, ("N.tminus1") := shift(get("N"), n = 1L, type = "lag", fill = 1L), by = ID]
OdataDT[, ("TI.tminus1") := shift(get("TI"), n = 1L, type = "lag", fill = 1L), by = ID]

# ----------------------------------------------------------------------
# Define intervention (always treated):
# ----------------------------------------------------------------------
OdataDT[, ("TI.set1") := 1L]
OdataDT[, ("TI.set0") := 0L]

# ----------------------------------------------------------------------
# Import Data
# ----------------------------------------------------------------------
OData <- importData(OdataDT, ID = "ID", t = "t", covars = c("highA1c", "lastNat1", "N.tminus1"),
                    CENS = "C", TRT = "TI", MONITOR = "N", OUTCOME = "Y.tplus1")

# ----------------------------------------------------------------------
# Look at the input data object
# ----------------------------------------------------------------------
print(OData)

# ----------------------------------------------------------------------
# Access the input data
# ----------------------------------------------------------------------
get_data(OData)

# ----------------------------------------------------------------------
# Model the Propensity Scores
# ----------------------------------------------------------------------
gform_CENS <- "C ~ highA1c + lastNat1"
gform_TRT = "TI ~ CVD + highA1c + N.tminus1"
gform_MONITOR <- "N ~ 1"
stratify_CENS <- list(C=c("t < 16", "t == 16"))

# ----------------------------------------------------------------------
# Fit Propensity Scores
# ----------------------------------------------------------------------
OData <- fitPropensity(OData, gform_CENS = gform_CENS,
                        gform_TRT = gform_TRT,
                        gform_MONITOR = gform_MONITOR,
                        stratify_CENS = stratify_CENS)

# ----------------------------------------------------------------------
# IPW Ajusted KM or Saturated MSM
# ----------------------------------------------------------------------
require("magrittr")
AKME.St.1 <- getIPWeights(OData, intervened_TRT = "TI.set1") %>%
             survNPMSM(OData) %$%
             estimates
AKME.St.1

# ----------------------------------------------------------------------
# Bounded IPW
# ----------------------------------------------------------------------
IPW.St.1 <- getIPWeights(OData, intervened_TRT = "TI.set1") %>%
            directIPW(OData)
IPW.St.1[]

# ----------------------------------------------------------------------
# IPW-MSM for hazard
# ----------------------------------------------------------------------
wts.DT.1 <- getIPWeights(OData = OData, intervened_TRT = "TI.set1", rule_name = "TI1")
wts.DT.0 <- getIPWeights(OData = OData, intervened_TRT = "TI.set0", rule_name = "TI0")
survMSM_res <- survMSM(list(wts.DT.1, wts.DT.0), OData, tbreaks = c(1:8,12,16)-1,)
survMSM_res$St

# ----------------------------------------------------------------------
# Sequential G-COMP
# ----------------------------------------------------------------------
t.surv <- c(0:10)
Qforms <- rep.int("Qkplus1 ~ CVD + highA1c + N + lastNat1 + TI + TI.tminus1", (max(t.surv)+1))
params <- gridisl::defModel(estimator = "speedglm__glm")

## Not run: 
gcomp_est <- fit_GCOMP(OData, tvals = t.surv, intervened_TRT = "TI.set1",
                          Qforms = Qforms, models = params, stratifyQ_by_rule = FALSE)
gcomp_est[]

## End(Not run)
# ----------------------------------------------------------------------
# TMLE
# ----------------------------------------------------------------------
## Not run: 
tmle_est <- fit_TMLE(OData, tvals = t.surv, intervened_TRT = "TI.set1",
                    Qforms = Qforms, models = params, stratifyQ_by_rule = TRUE)
tmle_est[]

## End(Not run)

# ----------------------------------------------------------------------
# Running IPW-Adjusted KM with optional user-specified weights:
# ----------------------------------------------------------------------
addedWts_DT <- OdataDT[, c("ID", "t"), with = FALSE]
addedWts_DT[, new.wts := sample.int(10, nrow(OdataDT), replace = TRUE)/10]
survNP_res_addedWts <- survNPMSM(wts.DT.1, OData, weights = addedWts_DT)

# ----------------------------------------------------------------------
# Multivariate Propensity Score Regressions
# ----------------------------------------------------------------------
gform_CENS <- "C + TI + N ~ highA1c + lastNat1"
OData <- fitPropensity(OData, gform_CENS = gform_CENS, gform_TRT = gform_TRT,
                        gform_MONITOR = gform_MONITOR)

# ----------------------------------------------------------------------
# Fitting treatment model with Gradient Boosting machines:
# ----------------------------------------------------------------------
## Not run: 
require("h2o")
h2o::h2o.init(nthreads = -1)
gform_CENS <- "C ~ highA1c + lastNat1"
models_TRT <- sl3::Lrnr_h2o_grid$new(algorithm = "gbm")
OData <- fitPropensity(OData, gform_CENS = gform_CENS,
                        gform_TRT = gform_TRT,
                        models_TRT = models_TRT,
                        gform_MONITOR = gform_MONITOR,
                        stratify_CENS = stratify_CENS)

# Use `H2O-3` distributed implementation of GLM for treatment model estimator:
models_TRT <- sl3::Lrnr_h2o_glm$new(family = "binomial")
OData <- fitPropensity(OData, gform_CENS = gform_CENS,
                        gform_TRT = gform_TRT,
                        models_TRT = models_TRT,
                        gform_MONITOR = gform_MONITOR,
                        stratify_CENS = stratify_CENS)

# Use Deep Neural Nets:
models_TRT <- sl3::Lrnr_h2o_grid$new(algorithm = "deeplearning")
OData <- fitPropensity(OData, gform_CENS = gform_CENS,
                        gform_TRT = gform_TRT,
                        models_TRT = models_TRT,
                        gform_MONITOR = gform_MONITOR,
                        stratify_CENS = stratify_CENS)

## End(Not run)

# ----------------------------------------------------------------------
# Fitting different models with different algorithms
# Fine tuning modeling with optional tuning parameters.
# ----------------------------------------------------------------------
## Not run: 
params_TRT <- sl3::Lrnr_h2o_grid$new(algorithm = "gbm",
                              ntrees = 50,
                              learn_rate = 0.05,
                              sample_rate = 0.8,
                              col_sample_rate = 0.8,
                              balance_classes = TRUE)
params_CENS <- sl3::Lrnr_glm_fast$new()
params_MONITOR <- sl3::Lrnr_glm_fast$new()
OData <- fitPropensity(OData,
            gform_CENS = gform_CENS, stratify_CENS = stratify_CENS, params_CENS = params_CENS,
            gform_TRT = gform_TRT, params_TRT = params_TRT,
            gform_MONITOR = gform_MONITOR, params_MONITOR = params_MONITOR)

## End(Not run)

# ----------------------------------------------------------------------
# Running TMLE based on the previous fit of the propensity scores.
# Also applying Random Forest to estimate the sequential outcome model
# ----------------------------------------------------------------------
## Not run: 
t.surv <- c(0:5)
Qforms <- rep.int("Qkplus1 ~ CVD + highA1c + N + lastNat1 + TI + TI.tminus1", (max(t.surv)+1))
models <- sl3::Lrnr_h2o_grid$new(algorithm = "randomForest",
                           ntrees = 100, learn_rate = 0.05, sample_rate = 0.8,
                           col_sample_rate = 0.8, balance_classes = TRUE)
tmle_est <- fit_TMLE(OData, tvals = t.surv, intervened_TRT = "TI.set1",
            Qforms = Qforms, models = models,
            stratifyQ_by_rule = TRUE)

## End(Not run)

## Not run: 
t.surv <- c(0:5)
Qforms <- rep.int("Qkplus1 ~ CVD + highA1c + N + lastNat1 + TI + TI.tminus1", (max(t.surv)+1))
models <- sl3::Lrnr_h2o_grid$new(algorithm = "randomForest",
                           ntrees = 100, learn_rate = 0.05, sample_rate = 0.8,
                           col_sample_rate = 0.8, balance_classes = TRUE)
tmle_est <- fit_TMLE(OData, tvals = t.surv, intervened_TRT = "TI.set1",
            Qforms = Qforms, models = models,
            stratifyQ_by_rule = FALSE)

## End(Not run)

osofr/stremr documentation built on Jan. 25, 2022, 8:07 a.m.