sg.cvtmle: CV-TMLE Estimating Impact of Treating Optimal Subgroup

View source: R/cvtmle.R

sg.cvtmle    R Documentation

CV-TMLE Estimating Impact of Treating Optimal Subgroup

Description

This function uses a cross-validated targeted minimum loss-based estimator (CV-TMLE) to evaluate the impact of treating the optimal subgroup versus following a user-specified static treatment strategy.

Usage

sg.cvtmle(W, A, Y, SL.library, Delta = rep(1,length(A)), OR.SL.library = SL.library,
  prop.SL.library = SL.library, missingness.SL.library = SL.library, txs = c(0, 1),
  baseline.probs = c(0.5, 0.5), kappa = 1, g0 = NULL, Q0 = NULL,
  family = binomial(), sig.trunc = 1e-10, alpha = 0.05,
  num.folds = 10, num.SL.rep = 5, SL.method = "method.NNLS2",
  num.est.rep = 5, id = NULL, folds = NULL, obsWeights = NULL,
  stratifyCV = FALSE, RR = FALSE, lib.ests = FALSE,
  init.ests.out = FALSE, init.ests.in = NULL, verbose = TRUE, ...)

Arguments

W

data frame with observations in the rows and baseline covariates used to form the subgroup in columns.

A

numeric treatment vector. The treatments of interest are specified using the txs argument.

Y

real-valued outcome for which large values are preferred (if the relative risk contrast is used, then Y should be an indicator of the absence of an adverse event, and the relative risk returned is the relative risk of the adverse event).

SL.library

SuperLearner library (see documentation for SuperLearner in the corresponding package) used to estimate the conditional average treatment effect functions.

Delta

Vector of the same length as Y. An entry should equal 1 if the corresponding entry in Y is observed, and should equal 0 if the corresponding entry in Y is to be treated as missing.
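
As a minimal sketch of the indicator described above, assuming missing outcomes are coded as NA in a hypothetical raw outcome vector Y_raw (how sg.cvtmle itself treats NA entries in Y is not stated here), Delta could be built as follows:

```r
# Hypothetical raw outcome with two missing entries (illustration only)
Y_raw <- c(1, 0, NA, 1, NA, 0)

# 1 where the outcome is observed, 0 where it is to be treated as missing
Delta <- as.integer(!is.na(Y_raw))
```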

OR.SL.library

SuperLearner library (see documentation for SuperLearner in the corresponding package) used to estimate the outcome regressions.

prop.SL.library

SuperLearner library (see documentation for SuperLearner in the corresponding package) used to estimate the propensity scores.

missingness.SL.library

SuperLearner library (see documentation for SuperLearner in the corresponding package) used to estimate the probability of having a missing outcome given treatment and covariates.

txs

A vector indicating the two or more treatments of interest in A that will be used for the treatment assignment problem. The treatments in A may be a superset of those in txs.

baseline.probs

A vector of the same length as txs indicating the (stochastic) treatment rule to use as a baseline when evaluating performance of the estimated optimal treatment rule. In this treatment rule, the kth treatment in txs is assigned with probability baseline.probs[k]. To obtain the marginal mean outcome under the optimal treatment strategy, i.e. not contrasting against any baseline, set baseline.probs=NULL.

kappa

maximum allowable probability of treating a randomly drawn individual in the population with the first treatment in txs. The default of 1 indicates no constraint. If obsWeights is specified, the constraint is enforced in the weighted population of interest.

g0

if known (as in a randomized controlled trial), a matrix of treatment probabilities given covariates, where column k contains the probability of receiving the treatment in entry k of txs. Rows correspond to individuals with (W,A,Y) observed. If NULL, SuperLearner will be used to estimate these probabilities.
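
A minimal sketch of such a matrix, assuming a trial in which treatments 0, 1 and 2 are assigned with known probabilities 0.25, 0.5 and 0.25 regardless of covariates (the distribution of A = rbinom(n,1,1/2) + rbinom(n,1,1/2)). The names trial_probs and n are illustrative, not part of the package:

```r
n <- 8                              # number of individuals (illustrative)
txs <- c(0, 1, 2)                   # treatments of interest
trial_probs <- c(0.25, 0.5, 0.25)   # assumed known assignment probabilities

# One row per individual, one column per entry of txs;
# byrow = TRUE repeats trial_probs across rows
g0 <- matrix(trial_probs, nrow = n, ncol = length(txs), byrow = TRUE)
```

Each row sums to 1 because every individual receives exactly one of the treatments in txs under this assumed design.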

Q0

a user-supplied list of matrices of estimates of the mean outcome of Y conditional on A and W. To ensure proper cross-validation, entry i in this list should have been fitted without using the data included in the validation fold in folds[[i]]; for this reason, folds must be non-null if Q0 is set to a non-null value. The matrix in entry i should have n=nrow(W) rows and length(txs) columns, where row j and column k contain the estimated outcome regression for the covariate level of individual j at treatment level txs[k].
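
The structure described above can be sketched in base R. This is only an illustration under stated assumptions: the simulated data, the logistic working model fitted with glm, and the choice of 5 folds are all hypothetical, and the folds are built by hand as a list of validation-index vectors (the format returned by CVFolds in the SuperLearner package).

```r
set.seed(1)
n <- 200
txs <- c(0, 1)
W <- data.frame(W1 = rnorm(n), W2 = rnorm(n))
A <- rbinom(n, 1, 0.5)
Y <- rbinom(n, 1, plogis(A * W$W1))

# Validation folds: a list whose i-th entry holds the row indices in fold i
V <- 5
folds <- unname(split(sample(seq_len(n)), rep(seq_len(V), length.out = n)))

# Entry i of Q0 is fitted WITHOUT the rows in folds[[i]],
# then evaluated for all n individuals at each treatment level in txs
Q0 <- lapply(folds, function(val_idx) {
  train <- data.frame(W, A = A, Y = Y)[-val_idx, ]
  fit <- glm(Y ~ A * (W1 + W2), data = train, family = binomial())
  sapply(txs, function(a) {
    predict(fit, newdata = data.frame(W, A = a), type = "response")
  })
})
```

Each element of Q0 is then an n by length(txs) matrix, and the same folds object would be passed through the folds argument alongside Q0.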

family

binomial() if outcome bounded in [0,1], or gaussian() otherwise.

sig.trunc

value at which the standard deviation estimate is truncated.

alpha

confidence level for returned confidence interval set to (1-alpha)*100%.

num.folds

number of folds to use in cross-validation step of the CV-TMLE.

num.SL.rep

number of super-learner repetitions (increasing this number should make the algorithm more stable across seeds).

SL.method

method that the SuperLearner function uses to select a convex combination of learners

num.est.rep

number of repetitions of estimator, minimizing variation over cross-validation fold assignment (increasing this number should make the algorithm more stable across seeds)

id

optional cluster identification variable. Ensures that rows with the same id remain in the same validation fold each time cross-validation is used.

folds

folds to be used when performing cross-validation step of the CV-TMLE. Should be in the same format as the output of CVFolds function from the SuperLearner package. If this argument is specified, then num.folds will automatically be set to 1.

obsWeights

observation weights

stratifyCV

stratify validation folds by event counts (does this for estimation of outcome regression, treatment mechanism, and conditional average treatment effect function). Useful for rare outcomes

RR

estimates relative risk (TRUE) or additive contrast (FALSE) between the mean outcome under the optimal rule versus randomizing treatment via a fair coin toss. For relative risk, estimates the relative risk of Y not occurring (since throughout we assume Y is beneficial).

lib.ests

Also return estimates based on candidate optimal rule estimates in the super-learner library

init.ests.out

Set this option to TRUE to return the initial SuperLearner estimates. These can be fed to a new call of this function using init.ests.in to speed up that call, which is useful, e.g., if you want to call this function at many values of kappa.

init.ests.in

Can be used to feed the function the initial SuperLearner estimates from a previous call of this function (see init.ests.out). Dramatically reduces runtime. See the Examples section below.

verbose

give status updates

Details

CV-TMLE to evaluate the impact of treating the optimal subgroup versus following a user-specified static treatment strategy.
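
To make the target of this contrast concrete, the following sketch computes it by simulation when the true outcome regression is known. It does not call sg.cvtmle and is not the package's estimator; it assumes the additive contrast, the outcome regression Qbar from the Examples section, no constraint on treatment (kappa = 1), and a baseline rule that assigns treatments 0 and 1 with probability 0.5 each.

```r
set.seed(1)
n <- 1e4
txs <- c(0, 1)
baseline.probs <- c(0.5, 0.5)

# True conditional mean outcome, as in the Examples section
Qbar <- function(a, w) plogis((a == 1) * w$W1 - (a == 2) * w$W2 + (a == 0))
W <- data.frame(W1 = rnorm(n), W2 = rnorm(n))

# n x length(txs) matrix of true conditional means at each treatment level
Q <- sapply(txs, function(a) Qbar(a, W))

# Mean outcome when each individual receives their best treatment in txs
opt_mean <- mean(apply(Q, 1, max))

# Mean outcome when txs[k] is assigned with probability baseline.probs[k]
base_mean <- sum(baseline.probs * colMeans(Q))

# Additive contrast between the optimal rule and the baseline rule
contrast <- opt_mean - base_mean
```

The contrast is nonnegative by construction here, because the row-wise maximum is at least as large as any weighted average of the same row.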

Coverage of the upper confidence bound relies on being able to estimate the optimal subgroup well in terms of mean outcome (see the cited papers).

We do not have any theoretical justification for the CV-TMLE confidence interval when the treatment effect falls on the decision boundary with positive probability (decision boundary is zero), though we have seen that it performs well in simulations.

Value

a list containing

est

Vector containing estimates of the impact of treating the optimal subgroup. Items in the vector correspond to different choices of algorithms for estimating the optimal treatment rule (if lib.ests is FALSE, only returns SuperLearner estimate).

ci

Matrix containing confidence intervals for the impact of treating the optimal subgroup. Left column contains lower bounds, right column contains upper bounds. Rows correspond to different choices of algorithms for estimating the optimal treatment rule (if lib.ests is FALSE, only returns SuperLearner estimate).

est.mat

Estimates across repetitions.

References

“Evaluating the Impact of Treating the Optimal Subgroup,” technical report to be released soon.

M. J. van der Laan and A. R. Luedtke, “Targeted learning of the mean outcome under an optimal dynamic treatment rule,” Journal of Causal Inference, vol. 3, no. 1, pp. 61-95, 2015.

Examples

SL.library = c('SL.mean','SL.glm')
Qbar = function(a,w){plogis((a==1)*w$W1 - (a==2)*w$W2 + (a==0))}
n = 500
W = data.frame(W1=rnorm(n),W2=rnorm(n),W3=rnorm(n),W4=rnorm(n))
A = rbinom(n,1,1/2) + rbinom(n,1,1/2)
Y = rbinom(n,1,Qbar(A,W))

# comparing the mean outcome under the optimal rule to the mean outcome
# when treating half of the population at random
sg.cvtmle(W,A,Y,baseline.probs=c(0.5,0.5),SL.library=SL.library,num.SL.rep=2,num.folds=5,family=binomial())
# as above, but with three treatments in txs, and adding ids (used in CV splits) and observation weights
sg.cvtmle(W,A,Y,SL.library=SL.library,txs=c(0,1,2),baseline.probs=c(0.5,0.5,0),num.SL.rep=2,num.folds=5,family=binomial(),id=rep(1:(n/2),2),obsWeights=1+3*runif(n))

# comparing the mean outcome under the optimal rule against the mean outcome under treating no one
# when only treatments 0 or 1 can be assigned
sg.cvtmle(W,A,Y,baseline.probs=c(1,0),txs=c(0,1),SL.library=SL.library,num.SL.rep=2,num.folds=5,family=binomial(),sig.trunc=0.001)
# comparing the mean outcome under an optimal rule that treats at most 25 percent of people
# with treatment 0 to the mean outcome under treating 25 percent of people at random
sg.cvtmle(W,A,Y,baseline.probs=c(0.25,0.375,0.375),SL.library=SL.library,txs=c(0,1,2),num.SL.rep=2,num.folds=5,kappa=0.25,family=binomial())

# estimating the mean outcomes under optimal rules that treat at most a proportion prop of people
# with treatment 0
out_10 = sg.cvtmle(W,A,Y,txs=c(0,1,2),baseline.probs=c(0,0,0),SL.library=SL.library,num.SL.rep=2,num.folds=5,kappa=0.10,family=binomial(),init.ests.out=TRUE)
init.ests = out_10$init.ests
for(prop in seq(0.10,0.7,by=0.2)){
  print(paste0("Can treat a ",prop," proportion of population with treatment 0."))
  out = sg.cvtmle(W,A,Y,txs=c(0,1,2),baseline.probs=c(0,0,0),SL.library=SL.library,num.SL.rep=2,num.folds=5,kappa=prop,family=binomial(),init.ests.out=FALSE,init.ests.in=init.ests,verbose=FALSE)
  print(out$est)
}

alexluedtke12/sg documentation built on May 24, 2023, 6:36 a.m.