sg.cvtmle: CV-TMLE Estimating Impact of Treating Optimal Subgroup
In alexluedtke12/sg: Targeted Learning for Subgroup Analyses

View source: R/cvtmle.R

sg.cvtmle

R Documentation

CV-TMLE Estimating Impact of Treating Optimal Subgroup

Description

This function uses a cross-validated targeted minimum loss-based estimator (CV-TMLE) to evaluate the impact of treating the optimal subgroup versus following a user-specified static treatment strategy.

Usage

sg.cvtmle(W, A, Y, SL.library, Delta = rep(1,length(A)), OR.SL.library = SL.library,
  prop.SL.library = SL.library, missingness.SL.library = SL.library, txs = c(0, 1),
  baseline.probs = c(0.5, 0.5), kappa = 1, g0 = NULL, Q0 = NULL,
  family = binomial(), sig.trunc = 1e-10, alpha = 0.05,
  num.folds = 10, num.SL.rep = 5, SL.method = "method.NNLS2",
  num.est.rep = 5, id = NULL, folds = NULL, obsWeights = NULL,
  stratifyCV = FALSE, RR = FALSE, lib.ests = FALSE,
  init.ests.out = FALSE, init.ests.in = NULL, verbose = TRUE, ...)

Arguments

`W`	data frame with observations in the rows and baseline covariates used to form the subgroup in columns.
`A`	numeric treatment vector. Treatments of interest specified using the `txs` argument.
`Y`	real-valued outcome for which large values are preferred (if the relative risk contrast is used, then `Y` should be an indicator of the absence of an adverse event, and the relative risk returned is the relative risk of the adverse event).
`SL.library`	SuperLearner library (see documentation for `SuperLearner` in the corresponding package) used to estimate the conditional average treatment effect functions.
`Delta`	Vector of the same length as `Y`. An entry should equal 1 if the corresponding entry in `Y` is observed, and should equal 0 if the corresponding entry in `Y` is to be treated as missing.
`OR.SL.library`	SuperLearner library (see documentation for `SuperLearner` in the corresponding package) used to estimate the outcome regressions.
`prop.SL.library`	SuperLearner library (see documentation for `SuperLearner` in the corresponding package) used to estimate the propensity scores.
`missingness.SL.library`	SuperLearner library (see documentation for `SuperLearner` in the corresponding package) used to estimate the probability of having a missing outcome given treatment and covariates.
`txs`	A vector indicating the two or more treatments of interest in `A` that will be used for the treatment assignment problem. The treatments in `A` may be a superset of those in `txs`.
`baseline.probs`	A vector of the same length as `txs` indicating the (stochastic) treatment rule to use as a baseline when evaluating performance of the estimated optimal treatment rule. In this treatment rule, the `k`th treatment in `txs` is assigned with probability `baseline.probs[k]`. To obtain the marginal mean outcome under the optimal treatment strategy, i.e. not contrasting against any baseline, set `baseline.probs=NULL`.
`kappa`	maximum allowable probability of treating a randomly drawn individual in the population with the first treatment in `txs`. The default of 1 indicates no constraint. If `obsWeights` are specified, then the constraint is enforced in the weighted population of interest.
`g0`	if known (as in a randomized controlled trial), a matrix of probabilities of receiving the treatment corresponding to entry `k` in `txs` given covariates in the `k`th column. Rows correspond to individuals with (`W`,`A`,`Y`) observed. If `NULL`, `SuperLearner` will be used to estimate these probabilities.
`Q0`	a user-supplied list of matrices of estimates of the mean of `Y` conditional on `A` and `W`. To ensure proper cross-validation, entry `i` in this list should have been fitted without using the data included in the validation fold in `folds[[i]]`; for this reason, `folds` must be non-null if `Q0` is set to a non-null value. The matrix in entry `i` should have `n=nrow(W)` rows and `length(txs)` columns, where row `j` and column `k` contain the estimated outcome regression for the covariate level of individual `j` at treatment level `txs[k]`.
`family`	`binomial()` if outcome bounded in [0,1], or `gaussian()` otherwise.
`sig.trunc`	value at which the standard deviation estimate is truncated.
`alpha`	significance level; the returned confidence interval has nominal coverage (1-alpha)*100%.
`num.folds`	number of folds to use in cross-validation step of the CV-TMLE.
`num.SL.rep`	number of super-learner repetitions (increasing this number should make the algorithm more stable across seeds).
`SL.method`	method that the `SuperLearner` function uses to select a convex combination of learners.
`num.est.rep`	number of repetitions of the estimator, used to reduce variation over cross-validation fold assignments (increasing this number should make the algorithm more stable across seeds).
`id`	optional cluster identification variable. Ensures that rows with the same id remain in the same validation fold each time cross-validation is used.
`folds`	folds to be used when performing cross-validation step of the CV-TMLE. Should be in the same format as the output of `CVFolds` function from the `SuperLearner` package. If this argument is specified, then num.folds will automatically be set to 1.
`obsWeights`	observation weights
`stratifyCV`	stratify validation folds by event counts (does this for estimation of outcome regression, treatment mechanism, and conditional average treatment effect function). Useful for rare outcomes.
`RR`	estimates relative risk (TRUE) or additive contrast (FALSE) between the mean outcome under optimal versus randomizing treatment via a fair coin toss. For relative risk, estimates the additive outcome of Y not occurring (since throughout we assume Y is beneficial).
`lib.ests`	Also return estimates based on candidate optimal rule estimates in the super-learner library.
`init.ests.out`	Set this option to TRUE to return the initial SuperLearner estimates. Can be fed to a new call of this function using `init.ests.in` to speed up that call. E.g., useful if you want to call this function at many values of `kappa`.
`init.ests.in`	Can be used to feed the function the initial SuperLearner estimates from a previous call of this function (see `init.ests.out`). Dramatically reduces runtime. See Example below.
`verbose`	give status updates
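
The shapes described above for `g0`, `folds` and `Q0` can be sketched in base R. This is only an illustration under stated assumptions: the simulated data, the 1:1 randomization, and the `glm` outcome model are all hypothetical, and the fold format is assumed to be a list of validation-row indices, as the `folds` entry describes for `CVFolds` output.

```r
set.seed(1)
n   <- 200
txs <- c(0, 1)
W   <- data.frame(W1 = rnorm(n), W2 = rnorm(n))
A   <- rbinom(n, 1, 0.5)                 # hypothetical 1:1 randomized trial
Y   <- rbinom(n, 1, plogis(A * W$W1))

# g0: one row per individual, column k = probability of receiving txs[k]
g0 <- matrix(0.5, nrow = n, ncol = length(txs))

# folds: list of length V, each entry holding the row indices of one
# validation fold (assumed format)
V     <- 5
folds <- split(sample(seq_len(n)), rep(seq_len(V), length.out = n))

# Q0: entry i is fitted WITHOUT the rows in folds[[i]], then evaluated
# for all n individuals at every treatment level in txs
dat <- data.frame(Y = Y, A = A, W)
Q0 <- lapply(folds, function(val_rows) {
  fit <- glm(Y ~ A * (W1 + W2), family = binomial(), data = dat[-val_rows, ])
  sapply(txs, function(a) {
    predict(fit, newdata = data.frame(A = a, W), type = "response")
  })
})

sapply(Q0, dim)   # each column shows n and length(txs)
```

Objects built this way would then be supplied through the `g0`, `Q0` and `folds` arguments of `sg.cvtmle`, alongside `W`, `A`, `Y` and `SL.library`.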

Details

The function uses a CV-TMLE to evaluate the impact of treating the optimal subgroup versus following a user-specified static treatment strategy.

Coverage of the upper confidence bound relies on being able to estimate the optimal subgroup well in terms of mean outcome (see the cited papers).

We do not have any theoretical justification for the CV-TMLE confidence interval when the treatment effect falls on the decision boundary with positive probability (decision boundary is zero), though we have seen that it performs well in simulations.
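
To make the notion of a decision boundary concrete, here is a minimal sketch of a thresholding rule for two treatments. It is a generic illustration, not the package's internal code: `blip` (a hypothetical estimated benefit of the first treatment over the second) and `constrained_rule` are names invented here. With `kappa = 1` the cutoff is zero, which is the boundary mentioned above; with a smaller `kappa` the cutoff moves up so that at most that fraction receives the first treatment.

```r
set.seed(2)
n <- 1000
# Hypothetical estimated benefit of the first treatment over the second
blip <- rnorm(n, mean = 0.1, sd = 0.5)

constrained_rule <- function(blip, kappa = 1) {
  # Cutoff that leaves at most a kappa fraction above it, never below zero
  tau <- max(quantile(blip, probs = 1 - kappa, names = FALSE), 0)
  blip > tau
}

treat_unconstrained <- constrained_rule(blip, kappa = 1)     # blip > 0
treat_quarter       <- constrained_rule(blip, kappa = 0.25)  # top 25% at most

mean(treat_unconstrained)
mean(treat_quarter)
```

Individuals whose `blip` sits exactly on the cutoff are the problematic case: a deterministic threshold gives no principled way to assign them, which is the situation the caveat above refers to.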

Value

a list containing

`est`	Vector containing estimates of the impact of treating the optimal subgroup. Items in the vector correspond to different choices of algorithms for estimating the optimal treatment rule (if `lib.ests` is FALSE, only returns SuperLearner estimate).
`ci`	Matrix containing confidence intervals for the impact of treating the optimal subgroup. Left column contains lower bounds, right column contains upper bounds. Rows correspond to different choices of algorithms for estimating the optimal treatment rule (if `lib.ests` is FALSE, only returns SuperLearner estimate).
`est.mat`	Estimates across repetitions.
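
For orientation, the layout of `ci` (lower bounds in the left column, upper bounds in the right) and the roles of `alpha` and `sig.trunc` can be illustrated with a generic Wald-type interval. This is a sketch under assumptions: the helper `wald_ci`, its inputs, and the use of a normal-quantile interval are not taken from the package source.

```r
# Hypothetical helper: Wald-type interval with a truncated standard deviation
wald_ci <- function(est, sd_est, n, alpha = 0.05, sig.trunc = 1e-10) {
  sd_trunc <- pmax(sd_est, sig.trunc)          # keep the sd away from zero
  half     <- qnorm(1 - alpha / 2) * sd_trunc / sqrt(n)
  cbind(lower = est - half, upper = est + half)
}

# One row per estimate, as in the `ci` matrix described above
ci <- wald_ci(est = c(0.10, 0.12), sd_est = c(0.5, 0), n = 400,
              alpha = 0.05, sig.trunc = 0.001)
ci
```

The second row shows why truncation matters: an estimated standard deviation of exactly zero would otherwise yield an interval of zero width.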

References

“Evaluating the Impact of Treating the Optimal Subgroup,” technical report to be released soon.

M. J. van der Laan and A. R. Luedtke, “Targeted learning of the mean outcome under an optimal dynamic treatment rule,” Journal of Causal Inference, vol. 3, no. 1, pp. 61-95, 2015.

Examples

SL.library = c('SL.mean','SL.glm')
Qbar = function(a,w){plogis((a==1)*w$W1 - (a==2)*w$W2 + (a==0))}
n = 500
W = data.frame(W1=rnorm(n),W2=rnorm(n),W3=rnorm(n),W4=rnorm(n))
A = rbinom(n,1,1/2) + rbinom(n,1,1/2)
Y = rbinom(n,1,Qbar(A,W))

# comparing the mean outcome under the optimal rule to the mean outcome
# when treating half of the population at random
sg.cvtmle(W,A,Y,baseline.probs=c(0.5,0.5),SL.library=SL.library,num.SL.rep=2,num.folds=5,family=binomial())
# same as above, but adding ids (used in CV splits) and observation weights
sg.cvtmle(W,A,Y,SL.library=SL.library,txs=c(0,1,2),baseline.probs=c(0.5,0.5,0),num.SL.rep=2,num.folds=5,family=binomial(),id=rep(1:(n/2),2),obsWeights=1+3*runif(n))

# comparing the mean outcome under the optimal rule against the mean outcome under treating no one
# when only treatments 0 or 1 can be assigned
sg.cvtmle(W,A,Y,baseline.probs=c(1,0),txs=c(0,1),SL.library=SL.library,num.SL.rep=2,num.folds=5,family=binomial(),sig.trunc=0.001)
# comparing the mean outcome under an optimal rule that treats at most 25 percent of people
# with treatment 0 to the mean outcome under treating 25 percent of people at random
sg.cvtmle(W,A,Y,baseline.probs=c(0.25,0.375,0.375),SL.library=SL.library,txs=c(0,1,2),num.SL.rep=2,num.folds=5,kappa=0.25,family=binomial())

# estimating the mean outcomes under optimal rules that treat at most a proportion prop
# of people with treatment 0
out_10 = sg.cvtmle(W,A,Y,txs=c(0,1,2),baseline.probs=c(0,0,0),SL.library=SL.library,num.SL.rep=2,num.folds=5,kappa=0.10,family=binomial(),init.ests.out=TRUE)
init.ests = out_10$init.ests
for(prop in seq(0.10,0.7,by=0.2)){
  print(paste0("Can treat a ",prop," proportion of population with treatment 0."))
  out = sg.cvtmle(W,A,Y,txs=c(0,1,2),baseline.probs=c(0,0,0),SL.library=SL.library,num.SL.rep=2,num.folds=5,kappa=prop,family=binomial(),init.ests.out=FALSE,init.ests.in=init.ests,verbose=FALSE)
  print(out$est)
}
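
Because `Qbar` is known in this simulation, the estimates can be checked against a Monte Carlo approximation of the truth. The sketch below does this for the first call only, assuming the target is the additive contrast between the mean outcome under the unconstrained optimal rule over `txs = c(0, 1)` and the mean outcome under the 50/50 baseline rule. The names `n_mc`, `W_mc` and `Q_true` are introduced here for illustration.

```r
set.seed(3)
Qbar <- function(a, w) plogis((a == 1) * w$W1 - (a == 2) * w$W2 + (a == 0))

n_mc <- 1e5
W_mc <- data.frame(W1 = rnorm(n_mc), W2 = rnorm(n_mc))

# True conditional mean outcome at each treatment level in txs = c(0, 1)
Q_true <- sapply(c(0, 1), function(a) Qbar(a, W_mc))

# Mean outcome when each individual receives the better of the two treatments
opt_value <- mean(pmax(Q_true[, 1], Q_true[, 2]))

# Mean outcome under the stochastic baseline rule baseline.probs = c(0.5, 0.5)
base_value <- sum(c(0.5, 0.5) * colMeans(Q_true))

truth <- opt_value - base_value
truth
```

A confidence interval from the first `sg.cvtmle` call above can then be compared against `truth`.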

alexluedtke12/sg documentation built on May 24, 2023, 6:36 a.m.