drtmle: TMLE estimate of the average treatment effect with...
In benkeser/drtmle: Doubly-Robust Nonparametric Estimation and Inference

drtmle

R Documentation

TMLE estimate of the average treatment effect with doubly-robust inference

Description

TMLE estimate of the average treatment effect with doubly-robust inference

Usage

drtmle(Y, A, W, DeltaA = as.numeric(!is.na(A)),
  DeltaY = as.numeric(!is.na(Y)), a_0 = unique(A[!is.na(A)]),
  family = if (all(Y %in% c(0, 1))) { stats::binomial() } else { stats::gaussian() },
  stratify = FALSE, SL_Q = NULL, SL_g = NULL,
  SL_Qr = NULL, SL_gr = NULL, n_SL = 1, avg_over = "drtmle",
  se_cv = "none", se_cvFolds = ifelse(se_cv == "partial", 10, 1),
  targeted_se = se_cv != "partial", glm_Q = NULL, glm_g = NULL,
  glm_Qr = NULL, glm_gr = NULL, adapt_g = FALSE, guard = c("Q", "g"),
  reduction = "univariate", returnModels = FALSE, returnNuisance = TRUE,
  cvFolds = 1, maxIter = 3, tolIC = 1/length(Y), tolg = 0.01,
  verbose = FALSE, Qsteps = 2, Qn = NULL, gn = NULL,
  use_future = FALSE, ...)

Arguments

`Y`	A `numeric` vector of continuous or binary outcomes.
`A`	A `numeric` vector of discrete-valued treatment assignments.
`W`	A `data.frame` of named covariates.
`DeltaA`	A `numeric` vector indicating whether the treatment is observed (assumed to equal 0 if missing and 1 if observed).
`DeltaY`	A `numeric` vector indicating whether the outcome is observed (assumed to equal 0 if missing and 1 if observed).
`a_0`	A `numeric` vector of fixed treatment values at which to return marginal mean estimates.
`family`	A `family` object equal to either `binomial()` or `gaussian()`, to be passed to the `SuperLearner` or `glm` function.
`stratify`	A `boolean` indicating whether to estimate the outcome regression separately for different values of `A` (if `TRUE`) or to pool across `A` (if `FALSE`).
`SL_Q`	A vector of characters or a list describing the Super Learner library to be used for the outcome regression. See `SuperLearner` for details.
`SL_g`	A vector of characters describing the Super Learner library to be used for each of the propensity score regressions (`DeltaA`, `A`, and `DeltaY`). To use the same library for each of the regressions (or if there are no missing data in `A` or `Y`), a single library may be input. See `SuperLearner` for details on how Super Learner libraries can be specified.
`SL_Qr`	A vector of characters or a list describing the Super Learner library to be used for the reduced-dimension outcome regression.
`SL_gr`	A vector of characters or a list describing the Super Learner library to be used for the reduced-dimension propensity score.
`n_SL`	Number of repeated Super Learners to run (default 1) for each nuisance parameter. Repeat Super Learners more times to obtain more stable inference.
`avg_over`	If multiple Super Learners are run, the scale on which the results are aggregated. Options include: `"SL"` = repeated nuisance parameter estimates are averaged before subsequently generating a single vector of point estimates based on the averaged models; `"drtmle"` = repeated vectors of point estimates are generated and averaged. Both can be specified, recognizing that this adds considerable computational expense. In this case, the final estimates are the average of `n_SL` point estimates where each is built by averaging `n_SL` fits. If `NULL`, no averaging is performed (in which case `n_SL` should be set equal to 1).
`se_cv`	Should cross-validated nuisance parameter estimates be used for computing standard errors? Options are `"none"` = no cross-validation is performed; `"partial"` = only applicable if Super Learner is used for nuisance parameter estimates; `"full"` = full cross-validation is performed. See vignette for further details. Ignored if `cvFolds > 1`, since then cross-validated nuisance parameter estimates are used by default and it is assumed that you want full cross-validated standard errors.
`se_cvFolds`	If cross-validated nuisance parameter estimates are used to compute standard errors, the number of folds to use in this computation. If `se_cv = "partial"`, then this option sets the number of folds used by the `SuperLearner` fitting procedure.
`targeted_se`	A boolean indicating whether standard errors should be computed from the targeted nuisance parameter estimates or from the initial estimators. If `se_cv` is not set to `"none"`, this option is ignored and standard errors are computed based on non-targeted, cross-validated nuisance parameter fits.
`glm_Q`	A character describing a formula to be used in the call to `glm` for the outcome regression. Ignored if `SL_Q` is not `NULL`.
`glm_g`	A list of characters describing the formulas to be used for each of the propensity score regressions (`DeltaA`, `A`, and `DeltaY`). To use the same formula for each of the regressions (or if there are no missing data in `A` or `Y`), a single character formula may be input. In general the formulas can reference any variable in `colnames(W)`, unless `adapt_g = TRUE`, in which case the formulas should reference variables `QaW` where `a` takes values in `a_0`.
`glm_Qr`	A character describing a formula to be used in the call to `glm` for the reduced-dimension outcome regression. Ignored if `SL_Qr` is not `NULL`. The formula should use the variable name `'gn'`.
`glm_gr`	A character describing a formula to be used in the call to `glm` for the reduced-dimension propensity score. Ignored if `SL_gr` is not `NULL`. The formula should use the variable names `'Qn'` and `'gn'` if `reduction = 'bivariate'`, and only `'Qn'` otherwise.
`adapt_g`	A boolean indicating whether the propensity score should be outcome adaptive. If `TRUE`, then the propensity score is estimated as the regression of `A` onto covariates `QaW` for each value `a` contained in `a_0`. See vignette for more details.
`guard`	A character vector indicating what pattern of misspecifications to guard against. If `guard` contains `"Q"`, then the TMLE guards against misspecification of the outcome regression by estimating the reduced-dimension outcome regression specified by `glm_Qr` or `SL_Qr`. If `guard` contains `"g"` then the TMLE (additionally) guards against misspecification of the propensity score by estimating the reduced-dimension propensity score specified by `glm_gr` or `SL_gr`. If `guard` is set to `NULL`, then only standard TMLE and one-step estimators are computed.
`reduction`	A character equal to `"univariate"` for a univariate misspecification correction (default) or `"bivariate"` for the bivariate version.
`returnModels`	A boolean indicating whether to return model fits for the outcome regression, propensity score, and reduced-dimension regressions.
`returnNuisance`	A boolean indicating whether to return the estimated nuisance regressions evaluated on the observed data. Defaults to `TRUE`. If `n_SL` is large and `"drtmle"` is in `avg_over`, then consider setting to `FALSE` in order to reduce size of resultant object.
`cvFolds`	A numeric equal to the number of folds to be used in cross-validated fitting of nuisance parameters. If `cvFolds = 1`, no cross-validation is used. Alternatively, `cvFolds` may be entered as a vector of fold assignments for observations, in which case its length should be the same length as `Y`.
`maxIter`	A numeric that sets the maximum number of iterations the TMLE can perform in its fluctuation step.
`tolIC`	A numeric that defines the stopping criterion based on the empirical mean of the influence function.
`tolg`	A numeric indicating the minimum value for estimates of the propensity score.
`verbose`	A boolean indicating whether to print status updates.
`Qsteps`	A numeric equal to 1 or 2 indicating whether the fluctuation submodel for the outcome regression should be fit using a single minimization (`Qsteps = 1`) or a backfitting-type minimization (`Qsteps=2`). The latter was found to be more stable in simulations and is the default.
`Qn`	An optional list of outcome regression estimates. If specified, the function will ignore the nuisance parameter estimation specified by `SL_Q` and `glm_Q`. The entries in the list should correspond to the outcome regression evaluated at `A` and the observed values of `W`, with order determined by the input to `a_0` (e.g., if `a_0 = c(0, 1)` then `Qn[[1]]` should be outcome regression at `A` = 0 and `Qn[[2]]` should be outcome regression at `A` = 1).
`gn`	An optional list of propensity score estimates. If specified, the function will ignore the nuisance parameter estimation specified by `SL_g` and `glm_g`. The entries in the list should correspond to the propensity for the observed values of `W`, with order determined by the input to `a_0` (e.g., if `a_0 = c(0,1)` then `gn[[1]]` should be propensity of `A` = 0 and `gn[[2]]` should be propensity of `A` = 1).
`use_future`	A boolean indicating whether to use `future_lapply` or plain `lapply`. The latter can make errors easier to track down.
`...`	Other options (not currently used).
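
The `Qn` and `gn` arguments expect lists whose entries follow the order of `a_0`. The following is a minimal sketch of how such lists might be built, using base R `glm` fits as stand-ins for whatever nuisance estimators are actually used. The simulated data and the object names (`Q_fit`, `g_fit`, `p1`) are illustrative assumptions and not part of the package.

```r
# Simulated data (illustrative only)
set.seed(1)
n <- 200
W <- data.frame(W1 = runif(n), W2 = rnorm(n))
A <- rbinom(n, 1, plogis(W$W1 - W$W2))
Y <- rbinom(n, 1, plogis(W$W1 + A))
a_0 <- c(0, 1)

# Outcome regression pooled across A (analogous to stratify = FALSE)
Q_fit <- glm(Y ~ W1 + W2 + A, family = binomial(),
             data = data.frame(W, A = A, Y = Y))
# Propensity score: probability that A = 1 given W
g_fit <- glm(A ~ W1 + W2, family = binomial(),
             data = data.frame(W, A = A))

# Qn[[j]]: predicted outcome with A set to a_0[j], at the observed W
Qn <- lapply(a_0, function(a) {
  as.numeric(predict(Q_fit, newdata = data.frame(W, A = a), type = "response"))
})

# gn[[j]]: estimated probability that A = a_0[j], at the observed W
p1 <- as.numeric(predict(g_fit, newdata = W, type = "response"))
gn <- lapply(a_0, function(a) if (a == 1) p1 else 1 - p1)
```

Because `a_0 = c(0, 1)` here, `Qn[[1]]` and `gn[[1]]` correspond to `A` = 0 and `Qn[[2]]` and `gn[[2]]` to `A` = 1. The lists could then be supplied as `Qn = Qn, gn = gn` in the call to `drtmle`.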

Value

An object of class "drtmle".

`drtmle`: A list of doubly-robust point estimates and a doubly-robust covariance matrix.
`nuisance_drtmle`: A list of the final TMLE estimates of the outcome regression (`$QnStar`), propensity score (`$gnStar`), and reduced-dimension regressions (`$QrnStar`, `$grnStar`) evaluated at the observed data values.
`ic_drtmle`: A list of the empirical mean of the efficient influence function (`$eif`) and the extra pieces of the influence function resulting from misspecification. All should be smaller than `tolIC` (unless `maxIter` was reached first). Also includes a matrix of the influence function values at the estimated nuisance parameters evaluated at the observed data.
`aiptw_c`: A list of doubly-robust point estimates and a non-doubly-robust covariance matrix. Theory does not guarantee performance of inference for these estimators, but simulation studies showed they often perform adequately.
`nuisance_aiptw`: A list of the initial estimates of the outcome regression, propensity score, and reduced-dimension regressions evaluated at the observed data values.
`tmle`: A list of doubly-robust point estimates and a non-doubly-robust covariance matrix for the standard TMLE estimator.
`aiptw`: A list of doubly-robust point estimates and a non-doubly-robust covariance matrix for the standard AIPTW estimator.
`gcomp`: A list of non-doubly-robust point estimates and a non-doubly-robust covariance matrix for the standard G-computation estimator. If Super Learner is used, there is no guarantee of correct inference for this estimator.
`QnMod`: The fitted object for the outcome regression. Returns `NULL` if `returnModels = FALSE`.
`gnMod`: The fitted object for the propensity score. Returns `NULL` if `returnModels = FALSE`.
`QrnMod`: The fitted object for the reduced-dimension regression that guards against misspecification of the outcome regression. Returns `NULL` if `returnModels = FALSE`.
`grnMod`: The fitted object for the reduced-dimension regression that guards against misspecification of the propensity score. Returns `NULL` if `returnModels = FALSE`.
`a_0`: The treatment levels that were requested for computation of covariate-adjusted means.
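
Each estimator element pairs a vector of point estimates with a covariance matrix, so a contrast of marginal means, such as the average treatment effect, can be formed as a linear combination. The following is a minimal sketch with made-up numbers. The element names `est` and `cov`, the object `res`, and all values are assumptions for illustration, so it is worth inspecting a fitted object with `str()` before relying on them.

```r
# Hypothetical stand-in for one estimator element of a fit with a_0 = c(1, 0);
# the names `est` and `cov` and the numbers are assumed for illustration
res <- list(
  est = c(0.55, 0.45),
  cov = matrix(c(0.0025, 0.0005,
                 0.0005, 0.0030), nrow = 2, byrow = TRUE)
)

# Contrast vector for the difference in marginal means: E[Y(1)] - E[Y(0)]
contrast <- c(1, -1)

# Point estimate and standard error of the contrast
ate <- sum(contrast * res$est)
se <- sqrt(as.numeric(t(contrast) %*% res$cov %*% contrast))

# Wald-style 95% confidence interval
wald_ci <- ate + c(-1, 1) * qnorm(0.975) * se
```

The order of the contrast vector must match the order of `a_0`, which is returned in the fitted object for that reason.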

Examples

# load super learner
library(SuperLearner)
# simulate data
set.seed(123456)
n <- 100
W <- data.frame(W1 = runif(n), W2 = rnorm(n))
A <- rbinom(n, 1, plogis(W$W1 - W$W2))
Y <- rbinom(n, 1, plogis(W$W1 * W$W2 * A))
# A quick example of drtmle:
# We note that more flexible super learner libraries
# are available, and that we recommend the user use more flexible
# libraries for SL_Qr and SL_gr for general use.
fit1 <- drtmle(
  W = W, A = A, Y = Y, a_0 = c(1, 0),
  family = binomial(),
  stratify = FALSE,
  SL_Q = c("SL.glm", "SL.mean", "SL.glm.interaction"),
  SL_g = c("SL.glm", "SL.mean", "SL.glm.interaction"),
  SL_Qr = "SL.glm",
  SL_gr = "SL.glm", maxIter = 1
)
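
The nuisance regressions can also be specified through the `glm_*` arguments instead of Super Learner libraries. The sketch below assumes that formulas are supplied as right-hand sides in character form, as in the package vignette, and that the `drtmle` package is installed; the particular formulas and the name `fit2` are illustrative choices rather than recommendations.

```r
# Same simulated data as in the example above
set.seed(123456)
n <- 100
W <- data.frame(W1 = runif(n), W2 = rnorm(n))
A <- rbinom(n, 1, plogis(W$W1 - W$W2))
Y <- rbinom(n, 1, plogis(W$W1 * W$W2 * A))

# Only attempt the fit if the package is available
if (requireNamespace("drtmle", quietly = TRUE)) {
  fit2 <- drtmle::drtmle(
    W = W, A = A, Y = Y, a_0 = c(1, 0),
    family = binomial(),
    stratify = FALSE,
    glm_Q = "W1 + W2 * A", # outcome regression (assumed right-hand side form)
    glm_g = "W1 + W2",     # propensity score regression
    glm_Qr = "gn",         # reduced-dimension outcome regression uses 'gn'
    glm_gr = "Qn",         # univariate reduction uses 'Qn' only
    maxIter = 1
  )
}
```

With the default `reduction = "univariate"`, `glm_gr` references only `Qn`; a bivariate reduction would reference both `Qn` and `gn`.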

benkeser/drtmle documentation built on Jan. 6, 2023, 11:40 a.m.