fit_GCOMP: Fit sequential GCOMP and TMLE for survival
In osofr/stremr: Streamlined Estimation for Static, Dynamic and Stochastic Treatment Regimes in Longitudinal Data

Interventions on up to 3 nodes are allowed: CENS, TRT and MONITOR. TMLE adjustment will be based on the inverse of the propensity score fits for the observed likelihood (g0.C, g0.A, g0.N), multiplied by the indicator of not being censored and the probability of each intervention in intervened_TRT and intervened_MONITOR. Requires column name(s) that specify the counterfactual node values or the counterfactual probabilities of each node being 1 (for stochastic interventions).
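
To make the weighting described above concrete, here is a minimal base-R sketch that builds the weight for one subject under a static "always treat, always monitor" rule. It is an illustration only: the data, the column names `uncensored`, `gstar.A` and `gstar.N`, and the reading of `g0.C`, `g0.A` and `g0.N` as fitted probabilities of the observed node values are all assumptions, and the package's internal computation may differ in detail.

```r
# Hypothetical long-format data for one subject over three time points.
# All column names and values are made up for illustration.
dat <- data.frame(
  t          = 0:2,
  uncensored = c(1, 1, 1),           # indicator of not being censored at t
  A          = c(1, 1, 0),           # observed treatment
  N          = c(1, 1, 1),           # observed monitoring
  g0.C       = c(0.95, 0.90, 0.90),  # assumed: fitted P(uncensored | past)
  g0.A       = c(0.60, 0.70, 0.40),  # assumed: fitted P(A = observed value | past)
  g0.N       = c(0.80, 0.80, 0.80),  # assumed: fitted P(N = observed value | past)
  gstar.A    = c(1, 1, 1),           # counterfactual P(A*(t) = 1): always treat
  gstar.N    = c(1, 1, 1)            # counterfactual P(N*(t) = 1): always monitor
)

# Probability that the intervention assigns the value that was actually observed.
p_obs_A <- ifelse(dat$A == 1, dat$gstar.A, 1 - dat$gstar.A)
p_obs_N <- ifelse(dat$N == 1, dat$gstar.N, 1 - dat$gstar.N)

# Time-specific weight: uncensored indicator times intervention probabilities,
# divided by the fitted observed-data propensity scores.
dat$wt <- dat$uncensored * p_obs_A * p_obs_N / (dat$g0.C * dat$g0.A * dat$g0.N)

# Cumulative weight up to each time point.
dat$cum_wt <- cumprod(dat$wt)

print(dat[, c("t", "wt", "cum_wt")])
```

In this toy example the weight drops to zero at `t = 2`, because the observed treatment (`A = 0`) has probability zero under the "always treat" rule.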

fit_GCOMP(
  OData,
  tvals,
  Qforms,
  intervened_TRT = NULL,
  intervened_MONITOR = NULL,
  rule_name = paste0(c(intervened_TRT, intervened_MONITOR), collapse = ""),
  models = NULL,
  fit_method = stremrOptions("fit_method"),
  fold_column = stremrOptions("fold_column"),
  TMLE = FALSE,
  stratifyQ_by_rule = FALSE,
  stratify_by_last = TRUE,
  Qstratify = NULL,
  useonly_t_TRT = NULL,
  useonly_t_MONITOR = NULL,
  iterTMLE = FALSE,
  CVTMLE = FALSE,
  byfold_Q = FALSE,
  IPWeights = NULL,
  trunc_weights = 10^6,
  weights = NULL,
  max_iter = 15,
  adapt_stop = TRUE,
  adapt_stop_factor = 10,
  tol_eps = 0.001,
  parallel = FALSE,
  return_wts = FALSE,
  return_fW = FALSE,
  reg_Q = NULL,
  intervened_type_TRT = NULL,
  intervened_type_MONITOR = NULL,
  maxpY = 1,
  TMLE_updater = "TMLE.updater.speedglm",
  verbose = getOption("stremr.verbose"),
  ...
)

`OData`	Input data object created by `importData` function.
`tvals`	Vector of time-points in the data for which the survival function (and risk) should be estimated
`Qforms`	Regression formulas, one formula per Q. Only main-terms are allowed.
`intervened_TRT`	Column name in the input data with the probabilities (or indicators) of counterfactual treatment nodes being equal to 1 at each time point. Leave the argument unspecified (`NULL`) when not intervening on treatment node(s).
`intervened_MONITOR`	Column name in the input data with probabilities (or indicators) of counterfactual monitoring nodes being equal to 1 at each time point. Leave the argument unspecified (`NULL`) when not intervening on the monitoring node(s).
`rule_name`	Optional name for the treatment/monitoring regimen.
`models`	Optional parameters specifying the models for fitting the iterative (sequential) G-Computation formula. Must be an object of class `ModelStack` specified with `gridisl::defModel` function.
`fit_method`	Model selection approach. Can be either `"none"` - no model selection or `"cv"` - discrete Super Learner using V fold cross-validation that selects the best model according to lowest cross-validated MSE (must specify the column name that contains the fold IDs) or `"origamiSL"` - continuous Super Learner that uses the `origami` R package to select the convex combination of the model predictions (aka model stacking).
`fold_column`	The column name in the input data (ordered factor) that contains the fold IDs to be used as part of the validation sample. Use the provided function `define_CVfolds` to define such folds or define the folds using your own method.
`TMLE`	Set to `TRUE` to run the usual longitudinal TMLE algorithm (with a separate TMLE update of Q for every sequential regression).
`stratifyQ_by_rule`	Set to `TRUE` for stratifying the fit of Q (the outcome model) by rule-followers only. There are two ways to do this stratification. The first option is to use `stratify_by_last=TRUE` (default), which would fit the outcome model only among the observations that were receiving their supposed counterfactual treatment at the current time-point (ignoring the past history of treatments leading up to time-point t). The second option is to set `stratify_by_last=FALSE` in which case the outcome model will be fit only among the observations who followed their counterfactual treatment regimen throughout the entire treatment history up to current time-point t (rule followers). For the latter option, the observation would be considered a non-follower if the person's treatment did not match their supposed counterfactual treatment at any time-point up to and including current time-point t.
`stratify_by_last`	Only used when `stratifyQ_by_rule` is `TRUE`. Set to `TRUE` for stratification by last time-point, set to `FALSE` for stratification by all time-points (rule-followers). See `stratifyQ_by_rule` for more details.
`Qstratify`	Placeholder for future user-defined model stratification for fitting Qs (CURRENTLY NOT FUNCTIONAL, WILL RESULT IN ERROR).
`useonly_t_TRT`	Use for intervening only on some subset of observation and time-specific treatment nodes. Should be a character string with a logical expression that defines the subset of intervention observations. For example, using `TRT==0` will intervene only at observations with the value of `TRT` being equal to zero. The expression can contain any variable name that was defined in the input dataset. Leave as `NULL` when intervening on all observations/time-points.
`useonly_t_MONITOR`	Same as `useonly_t_TRT`, but for monitoring nodes.
`iterTMLE`	Set to `TRUE` to run the iterative univariate TMLE instead of the usual longitudinal TMLE. When set to `TRUE` this will also provide the standard sequential G-COMP as part of the output.
`CVTMLE`	Set to `TRUE` to run the CV-TMLE algorithm instead of the usual TMLE algorithm. Must set either `TMLE = TRUE` or `iterTMLE = TRUE` for this argument to have any effect.
`byfold_Q`	(ADVANCED USE) Fit iterative means (Q parameter) using "by-fold" (aka "fold-specific" or "split-specific") cross-validation approach. Only works with `fit_method = "origamiSL"`.
`IPWeights`	(Optional) result of calling function `getIPWeights` for running TMLE (evaluated automatically when missing)
`trunc_weights`	Specify the numeric weight truncation value. All final weights exceeding the value in `trunc_weights` will be truncated.
`weights`	Optional `data.table` with additional observation- and time-specific weights. Must contain columns `ID`, `t` and `weight`. The column named `weight` is merged back into the original data according to (`ID`, `t`). Not implemented yet.
`max_iter`	For iterative TMLE only: Integer, set to maximum number of iterations for iterative TMLE algorithm.
`adapt_stop`	For iterative TMLE only: Choose between two stopping criteria for iterative TMLE. The default is `TRUE`, which stops the iterative TMLE algorithm adaptively. Specifically, the iterations stop when the mean estimate of the efficient influence curve is less than or equal to 1 / (`adapt_stop_factor`*sqrt(`N`)), where `N` is the total number of unique subjects in the data and `adapt_stop_factor` is set to 10 by default. When `TRUE`, the argument `tol_eps` is ignored and TMLE stops when either `max_iter` has been reached or this criterion has been satisfied. When `FALSE`, the stopping criterion is determined by the values of `max_iter` and `tol_eps`.
`adapt_stop_factor`	For iterative TMLE only: The adaptive factor used in the stopping criterion for iterative TMLE when `adapt_stop` is set to `TRUE`. Default is 10. TMLE will keep iterating until the mean estimate of the efficient influence curve is less than or equal to 1 / (`adapt_stop_factor`*sqrt(`N`)) or until the number of iterations reaches `max_iter`.
`tol_eps`	For iterative TMLE only: Numeric error tolerance for the iterative TMLE update. The iterative TMLE algorithm will stop when the absolute value of the TMLE intercept update is below `tol_eps`
`parallel`	Set to `TRUE` to run the sequential G-COMP or TMLE in parallel (uses `foreach` with `dopar` and requires a previously defined parallel back-end cluster)
`return_wts`	Applies only when `TMLE = TRUE`. Return the data.table with subject-specific IP weights as part of the output. Note: for large datasets setting this to `TRUE` may lead to extremely large object sizes!
`return_fW`	When `TRUE`, will return the object fit for the last Q regression as part of the output table. Can be used for obtaining subject-specific predictions of the counterfactual functional E(Y_d|W_i).
`reg_Q`	(ADVANCED USE ONLY) Directly specify the Q regressions, separately for each time-point.
`intervened_type_TRT`	(ADVANCED FUNCTIONALITY) Set to `NULL` by default; can be a character value set to either `"bin"`, `"shift"` or `"MSM"`. Provides support for different types of interventions on the `TRT` (treatment) node (counterfactual treatment node `A^*(t)`). The default behavior is the same as `"bin"`, which assumes that `A^*(t)` is binary and is set equal to either `0`, `1` or `p(t)`, where 0<=`p(t)`<=1. Here, `p(t)` denotes the probability that the counterfactual `A^*(t)` is equal to 1, i.e., P(`A^*(t)`=1)=`p(t)`, and it can vary over time and from subject to subject. For `"shift"`, it is assumed that the intervention node `A^*(t)` is a shift in the value of the continuous treatment `A`, i.e., `A^*(t)`=`A(t)`+delta(t). Finally, for `"MSM"` it is assumed that we simply want the final intervention density `g^*(t)` to be set to a constant 1. This is useful for static MSMs.
`intervened_type_MONITOR`	(ADVANCED FUNCTIONALITY) Same as `intervened_type_TRT`, but for monitoring intervention node (counterfactual monitoring node `N^*(t)`).
`maxpY`	Upper bound on the probability that the outcome Y(t) is equal to 1 (the cumulative incidence). Useful for upper-bounding rare outcomes.
`TMLE_updater`	Function for performing the TMLE update. Default is the TMLE updater based on speedglm (called `"TMLE.updater.speedglm"`). Other possible options include `"TMLE.updater.glm"`, `"linear.TMLE.updater.speedglm"` and `"iTMLE.updater.xgb"`.
`verbose`	Set to `TRUE` to print auxiliary messages during model fitting.
`...`	When the `models` argument is NOT specified, these additional arguments will be passed on directly to all `GridSL` modeling functions that are called from this routine, e.g., `family = "binomial"` can be used to specify the model family. Note that all such arguments must be named.
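
The two stratification options described under `stratifyQ_by_rule` can be illustrated with a small base-R sketch. This is not the package's internal code: the data are made up, and the helper columns `match_t`, `follow_last` and `follow_all` are hypothetical names introduced only to show which rows each option would keep when fitting Q.

```r
# Hypothetical data: two subjects, three time points each, sorted by (ID, t).
# `TI` is the observed treatment and `TI.set1` the counterfactual treatment
# under an "always treat" rule; the values are made up for illustration.
dat <- data.frame(
  ID      = rep(1:2, each = 3),
  t       = rep(0:2, times = 2),
  TI      = c(1, 1, 1,   0, 1, 1),
  TI.set1 = 1
)

# Does the observed treatment match the rule at the current time point?
dat$match_t <- as.integer(dat$TI == dat$TI.set1)

# Analogue of stratify_by_last = TRUE:
# only the match at the current time point matters.
dat$follow_last <- dat$match_t

# Analogue of stratify_by_last = FALSE:
# the subject must have matched at every time point up to and including t.
dat$follow_all <- ave(dat$match_t, dat$ID, FUN = cumprod)

print(dat)
```

Subject 2 deviates from the rule at `t = 0` and then matches it. Under the "last time point" analogue, that subject's rows at `t = 1` and `t = 2` would be kept. Under the "all time points" analogue, none of that subject's rows would be kept.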

An output list containing the data.table with survival estimates over time saved as "estimates".

stremr-package for the general overview of the package.

options(stremr.verbose = TRUE)
require("data.table")

# ----------------------------------------------------------------------
# Simulated Data
# ----------------------------------------------------------------------
data(OdataNoCENS)
OdataDT <- as.data.table(OdataNoCENS, key=c("ID", "t"))

# define lagged N, first value is always 1 (always monitored at the first time point):
OdataDT[, ("N.tminus1") := shift(get("N"), n = 1L, type = "lag", fill = 1L), by = ID]
OdataDT[, ("TI.tminus1") := shift(get("TI"), n = 1L, type = "lag", fill = 1L), by = ID]

# ----------------------------------------------------------------------
# Define intervention (always treated):
# ----------------------------------------------------------------------
OdataDT[, ("TI.set1") := 1L]
OdataDT[, ("TI.set0") := 0L]

# ----------------------------------------------------------------------
# Import Data
# ----------------------------------------------------------------------
OData <- importData(OdataDT, ID = "ID", t = "t", covars = c("highA1c", "lastNat1", "N.tminus1"),
                    CENS = "C", TRT = "TI", MONITOR = "N", OUTCOME = "Y.tplus1")

# ----------------------------------------------------------------------
# Look at the input data object
# ----------------------------------------------------------------------
print(OData)

# ----------------------------------------------------------------------
# Access the input data
# ----------------------------------------------------------------------
get_data(OData)

# ----------------------------------------------------------------------
# Model the Propensity Scores
# ----------------------------------------------------------------------
gform_CENS <- "C ~ highA1c + lastNat1"
gform_TRT <- "TI ~ CVD + highA1c + N.tminus1"
gform_MONITOR <- "N ~ 1"
stratify_CENS <- list(C=c("t < 16", "t == 16"))

# ----------------------------------------------------------------------
# Fit Propensity Scores
# ----------------------------------------------------------------------
OData <- fitPropensity(OData, gform_CENS = gform_CENS,
                        gform_TRT = gform_TRT,
                        gform_MONITOR = gform_MONITOR,
                        stratify_CENS = stratify_CENS)

# ----------------------------------------------------------------------
# IPW Adjusted KM or Saturated MSM
# ----------------------------------------------------------------------
require("magrittr")
AKME.St.1 <- getIPWeights(OData, intervened_TRT = "TI.set1") %>%
             survNPMSM(OData) %$%
             estimates
AKME.St.1

# ----------------------------------------------------------------------
# Bounded IPW
# ----------------------------------------------------------------------
IPW.St.1 <- getIPWeights(OData, intervened_TRT = "TI.set1") %>%
            directIPW(OData)
IPW.St.1[]

# ----------------------------------------------------------------------
# IPW-MSM for hazard
# ----------------------------------------------------------------------
wts.DT.1 <- getIPWeights(OData = OData, intervened_TRT = "TI.set1", rule_name = "TI1")
wts.DT.0 <- getIPWeights(OData = OData, intervened_TRT = "TI.set0", rule_name = "TI0")
survMSM_res <- survMSM(list(wts.DT.1, wts.DT.0), OData, tbreaks = c(1:8,12,16)-1)
survMSM_res$St

# ----------------------------------------------------------------------
# Sequential G-COMP
# ----------------------------------------------------------------------
t.surv <- c(0:10)
Qforms <- rep.int("Qkplus1 ~ CVD + highA1c + N + lastNat1 + TI + TI.tminus1", (max(t.surv)+1))
params <- gridisl::defModel(estimator = "speedglm__glm")

## Not run: 
gcomp_est <- fit_GCOMP(OData, tvals = t.surv, intervened_TRT = "TI.set1",
                          Qforms = Qforms, models = params, stratifyQ_by_rule = FALSE)
gcomp_est[]

## End(Not run)
# ----------------------------------------------------------------------
# TMLE
# ----------------------------------------------------------------------
## Not run: 
tmle_est <- fit_TMLE(OData, tvals = t.surv, intervened_TRT = "TI.set1",
                    Qforms = Qforms, models = params, stratifyQ_by_rule = TRUE)
tmle_est[]

## End(Not run)

# ----------------------------------------------------------------------
# Running IPW-Adjusted KM with optional user-specified weights:
# ----------------------------------------------------------------------
addedWts_DT <- OdataDT[, c("ID", "t"), with = FALSE]
addedWts_DT[, new.wts := sample.int(10, nrow(OdataDT), replace = TRUE)/10]
survNP_res_addedWts <- survNPMSM(wts.DT.1, OData, weights = addedWts_DT)

# ----------------------------------------------------------------------
# Multivariate Propensity Score Regressions
# ----------------------------------------------------------------------
gform_CENS <- "C + TI + N ~ highA1c + lastNat1"
OData <- fitPropensity(OData, gform_CENS = gform_CENS, gform_TRT = gform_TRT,
                        gform_MONITOR = gform_MONITOR)

# ----------------------------------------------------------------------
# Fitting treatment model with Gradient Boosting machines:
# ----------------------------------------------------------------------
## Not run: 
require("h2o")
h2o::h2o.init(nthreads = -1)
gform_CENS <- "C ~ highA1c + lastNat1"
models_TRT <- sl3::Lrnr_h2o_grid$new(algorithm = "gbm")
OData <- fitPropensity(OData, gform_CENS = gform_CENS,
                        gform_TRT = gform_TRT,
                        models_TRT = models_TRT,
                        gform_MONITOR = gform_MONITOR,
                        stratify_CENS = stratify_CENS)

# Use `H2O-3` distributed implementation of GLM for treatment model estimator:
models_TRT <- sl3::Lrnr_h2o_glm$new(family = "binomial")
OData <- fitPropensity(OData, gform_CENS = gform_CENS,
                        gform_TRT = gform_TRT,
                        models_TRT = models_TRT,
                        gform_MONITOR = gform_MONITOR,
                        stratify_CENS = stratify_CENS)

# Use Deep Neural Nets:
models_TRT <- sl3::Lrnr_h2o_grid$new(algorithm = "deeplearning")
OData <- fitPropensity(OData, gform_CENS = gform_CENS,
                        gform_TRT = gform_TRT,
                        models_TRT = models_TRT,
                        gform_MONITOR = gform_MONITOR,
                        stratify_CENS = stratify_CENS)

## End(Not run)

# ----------------------------------------------------------------------
# Fitting different models with different algorithms
# Fine tuning modeling with optional tuning parameters.
# ----------------------------------------------------------------------
## Not run: 
params_TRT <- sl3::Lrnr_h2o_grid$new(algorithm = "gbm",
                              ntrees = 50,
                              learn_rate = 0.05,
                              sample_rate = 0.8,
                              col_sample_rate = 0.8,
                              balance_classes = TRUE)
params_CENS <- sl3::Lrnr_glm_fast$new()
params_MONITOR <- sl3::Lrnr_glm_fast$new()
# Learners are passed via the `models_*` arguments, as in the examples above.
OData <- fitPropensity(OData,
            gform_CENS = gform_CENS, stratify_CENS = stratify_CENS, models_CENS = params_CENS,
            gform_TRT = gform_TRT, models_TRT = params_TRT,
            gform_MONITOR = gform_MONITOR, models_MONITOR = params_MONITOR)

## End(Not run)

# ----------------------------------------------------------------------
# Running TMLE based on the previous fit of the propensity scores.
# Also applying Random Forest to estimate the sequential outcome model
# ----------------------------------------------------------------------
## Not run: 
t.surv <- c(0:5)
Qforms <- rep.int("Qkplus1 ~ CVD + highA1c + N + lastNat1 + TI + TI.tminus1", (max(t.surv)+1))
models <- sl3::Lrnr_h2o_grid$new(algorithm = "randomForest",
                           ntrees = 100, learn_rate = 0.05, sample_rate = 0.8,
                           col_sample_rate = 0.8, balance_classes = TRUE)
tmle_est <- fit_TMLE(OData, tvals = t.surv, intervened_TRT = "TI.set1",
            Qforms = Qforms, models = models,
            stratifyQ_by_rule = TRUE)

## End(Not run)

## Not run: 
t.surv <- c(0:5)
Qforms <- rep.int("Qkplus1 ~ CVD + highA1c + N + lastNat1 + TI + TI.tminus1", (max(t.surv)+1))
models <- sl3::Lrnr_h2o_grid$new(algorithm = "randomForest",
                           ntrees = 100, learn_rate = 0.05, sample_rate = 0.8,
                           col_sample_rate = 0.8, balance_classes = TRUE)
tmle_est <- fit_TMLE(OData, tvals = t.surv, intervened_TRT = "TI.set1",
            Qforms = Qforms, models = models,
            stratifyQ_by_rule = FALSE)

## End(Not run)

osofr/stremr documentation built on Jan. 25, 2022, 8:07 a.m.
