sg.SL: SuperLearner for Estimating the Conditional Average Treatment...

View source: R/SL.R

sg.SL (R Documentation)

SuperLearner for Estimating the Conditional Average Treatment Effect

Description

This function estimates, for each treatment of interest, the average additive effect of assigning that treatment conditional on baseline covariates, compared with assigning treatment at random according to the treatment probabilities observed in the population (conditional on the same covariates).
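To make the target concrete, the base-R sketch below computes the true conditional average treatment effect in a simulated setting that mirrors the Examples section, assuming a known outcome regression Qbar(a, w) and known treatment probabilities g(a | w) = 1/2. Column k of the result is Qbar(txs[k], w) minus the expected outcome when treatment is drawn at random from g(. | w). The helper true_cate is hypothetical and is not part of the sg package.

```r
# Hypothetical helper (not part of sg): true CATE relative to random assignment.
# Qmat[j, k] = E[Y | A = txs[k], W = w_j]; gmat[j, k] = P(A = txs[k] | W = w_j),
# with each row of gmat summing to 1 over the treatments in txs.
true_cate <- function(Qmat, gmat) {
  stopifnot(all(dim(Qmat) == dim(gmat)))
  # expected outcome when treatment is drawn at random from g(. | w_j)
  ref <- rowSums(gmat * Qmat)
  # subtract each row's reference value from every treatment column
  Qmat - ref
}

# simulated setting mirroring the Examples section: A ~ Bernoulli(1/2)
set.seed(1)
n <- 5
W <- data.frame(W1 = rnorm(n))
Qbar <- function(a, w) plogis(a * w$W1)
txs <- c(0, 1)
Qmat <- sapply(txs, function(a) Qbar(a, W))   # n x length(txs)
gmat <- matrix(1/2, nrow = n, ncol = length(txs))
cate <- true_cate(Qmat, gmat)
```

With equal probabilities the two columns are mirror images (they sum to zero in every row), and cate[, 2] equals Qbar(1, W) - (Qbar(0, W) + Qbar(1, W)) / 2, the quantity called cate.truth1 in the Examples.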

Usage

sg.SL(W, A, Y, SL.library, Delta = rep(1,length(A)), OR.SL.library = SL.library,
  prop.SL.library = SL.library, missingness.SL.library = SL.library, txs = c(0, 1), g0 = NULL,
  Q0 = NULL, family = binomial(), num.SL.folds = 10, num.SL.rep = 5,
  SL.method = "method.NNLS2", id = NULL, obsWeights = NULL,
  stratifyCV = FALSE, lib.ests = FALSE, ...)

Arguments

W

data frame with observations in the rows and baseline covariates used to form the subgroup in columns.

A

Numeric treatment vector. The treatments of interest are specified using the txs argument.

Y

real-valued outcome.

SL.library

SuperLearner library (see documentation for SuperLearner in the corresponding package) used to estimate the conditional average treatment effect functions.

Delta

Vector of the same length as Y. An entry should equal 1 if the corresponding entry in Y is observed, and should equal 0 if the corresponding entry in Y is to be treated as missing.

OR.SL.library

SuperLearner library (see documentation for SuperLearner in the corresponding package) used to estimate the outcome regressions.

prop.SL.library

SuperLearner library (see documentation for SuperLearner in the corresponding package) used to estimate the propensity scores.

missingness.SL.library

SuperLearner library (see documentation for SuperLearner in the corresponding package) used to estimate the probability of having a missing outcome given treatment and covariates.

txs

A vector indicating the two or more treatments of interest in A that will be used for the treatment assignment problem. The treatments in A may be a superset of those in txs.

g0

If known (as in a randomized controlled trial), a matrix whose kth column contains the probabilities of receiving treatment txs[k] given covariates. Rows correspond to individuals with (W,A,Y) observed. If NULL, SuperLearner is used to estimate these probabilities.

Q0

A user-supplied matrix of estimates of the mean of Y conditional on A and W. The matrix should have n = nrow(W) rows and length(txs) columns, where entry (j, k) contains the estimated outcome regression at the covariate values of individual j and treatment level txs[k].

family

binomial() if the outcome is bounded in [0,1], or gaussian() otherwise. See Details.

num.SL.folds

number of folds to use in SuperLearner.

num.SL.rep

The final output is an average of num.SL.rep SuperLearner fits; repeating the fit reduces reliance on the initial choice of folds.

SL.method

method that the SuperLearner function uses to select a convex combination of learners

id

optional cluster identification variable

obsWeights

observation weights

stratifyCV

Whether to stratify the validation folds by event counts. If TRUE, stratification is used when estimating the outcome regression, the treatment mechanism, and the conditional average treatment effect function. Useful for rare outcomes.

lib.ests

If TRUE, also return the conditional average treatment effect estimates from each candidate learner in the SuperLearner library.
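As an illustration of the input formats described for Delta and g0, the base-R sketch below builds them for a hypothetical two-arm randomized trial with known allocation probabilities P(A = 0) = 1/3 and P(A = 1) = 2/3, in which two outcomes were not recorded. The object names (p, recorded) are illustrative only; the sketch relies solely on the shapes stated above: Delta has the same length as Y, and g0 has one row per individual and one column per entry of txs, with column k referring to txs[k].

```r
# Illustrative inputs for a hypothetical two-arm trial with known
# randomization probabilities P(A = 0) = 1/3 and P(A = 1) = 2/3.
set.seed(2)
n <- 8
txs <- c(0, 1)
p <- c(1/3, 2/3)                  # p[k] = P(A = txs[k]), same for everyone
A <- rbinom(n, 1, p[2])
Y <- rbinom(n, 1, 1/2)

# Delta: 1 where Y was recorded, 0 where it is to be treated as missing
recorded <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
Delta <- as.numeric(recorded)

# g0: n x length(txs) matrix; column k holds P(A = txs[k] | W) for each row
g0 <- matrix(p, nrow = n, ncol = length(txs), byrow = TRUE)
colnames(g0) <- paste0("A=", txs)  # labels for readability only
```

In a trial whose allocation depends on covariates (for example, stratified randomization), each row of g0 would instead hold that individual's own probabilities.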

Details

If the outcome is bounded in [0,1], this function respects that bound when estimating the outcome regression, but not when estimating the conditional average treatment effect with the double robust loss presented in the paper cited below.
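Why the bound is lost can be seen from a doubly robust pseudo-outcome. The sketch below uses the familiar augmented inverse-probability-weighted (AIPW) construction for a binary treatment with every outcome observed (Delta = 1); this form is an assumption chosen for illustration and is not necessarily the exact loss implemented in sg.SL. Although Y is binary, the pseudo-outcome D1 exceeds 1 for every treated individual with Y = 1 and is negative for every treated individual with Y = 0, so a least-squares regression on W (here a plain lm() fit standing in for SuperLearner) has no built-in constraint keeping its fitted values in the range the true effect must lie in.

```r
# Sketch (assumption): AIPW pseudo-outcomes for a binary treatment, all Y observed.
set.seed(3)
n <- 1000
W <- data.frame(W1 = rnorm(n), W2 = rnorm(n))
Qbar <- function(a, w) plogis(a * w$W1)
A <- rbinom(n, 1, 1/2)
Y <- rbinom(n, 1, Qbar(A, W))        # binary outcome, bounded in [0,1]

g1 <- rep(1/2, n)                    # known P(A = 1 | W), as in a trial
Qbar1 <- Qbar(1, W)                  # outcome regressions (here the truth)
Qbar0 <- Qbar(0, W)

# pseudo-outcomes whose conditional means are Qbar(1, W) and Qbar(0, W)
D1 <- A / g1 * (Y - Qbar1) + Qbar1
D0 <- (1 - A) / (1 - g1) * (Y - Qbar0) + Qbar0

# contrast of treatment 1 against a treatment drawn at random from g(. | W)
pseudo <- D1 - ((1 - g1) * D0 + g1 * D1)

# stand-in for the SuperLearner step: least squares of pseudo on W
fit <- lm(pseudo ~ W1 + W2, data = data.frame(W, pseudo = pseudo))
```

With g1 = 1/2 the contrast reduces to (D1 - D0) / 2, whose conditional mean is Qbar(1, W) - (Qbar(0, W) + Qbar(1, W)) / 2.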

Value

a list containing

est

Matrix of estimates of the conditional average treatment effect function, with one row per individual in the data set (evaluated at that individual's covariate values) and one column per treatment in txs. Here the conditional average treatment effect of a treatment in txs is defined as the difference between the conditional mean outcome under that treatment and the expected outcome under a treatment drawn at random according to the observed treatment distribution (conditional on covariates).

SL.cate.fun

A function that takes covariates (as a matrix) as input and returns a matrix of conditional average treatment effects estimated by SuperLearner, with rows corresponding to the rows of the supplied covariates and columns corresponding to the treatments in txs.

SL

a list of lists of SuperLearner objects used to generate these estimates. Each entry in the outer list corresponds to a treatment in txs. Each entry in the inner list corresponds to one of the num.SL.rep repetitions.

If lib.ests is TRUE, the list also contains:

lib.ests

a list with entries corresponding to learners in SL.library. Each entry is of the same format as est.

lib.cate.fun

A function that takes covariates as input and returns a list with one entry per learner in SL.library. Each entry has the same format as the output of SL.cate.fun.
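Because every column of est subtracts the same per-individual reference term, the treatment with the largest estimated CATE in a row is also the treatment with the largest estimated conditional mean outcome. The sketch below reads such a treatment rule off a matrix in the est format; the matrix est_example is made up purely for illustration, and its columns are assumed to follow the order of txs.

```r
txs <- c(0, 1, 2)
# made-up CATE estimates: 3 individuals (rows) x 3 treatments (columns)
est_example <- rbind(c(-0.10,  0.06,  0.04),
                     c( 0.20, -0.15, -0.05),
                     c(-0.02, -0.01,  0.03))
# column index of the largest estimated effect in each row, mapped back to txs
best_tx <- txs[apply(est_example, 1, which.max)]
best_tx
```

Note that which.max returns the first maximum, so ties are broken in favour of the treatment listed earlier in txs.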

References

A. R. Luedtke and M. J. van der Laan, “Super-learning of an optimal dynamic treatment rule,” International Journal of Biostatistics (to appear), 2014.

Examples

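# load the package (SL.mean and SL.glm are SuperLearner wrappers)
library(sg)
library(SuperLearner)

# SuperLearner library
SL.library = c('SL.mean','SL.glm')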

# simulated data (seed set for reproducibility)
set.seed(1)
Qbar = function(a,w){plogis(a*w$W1)}
n = 1000
W = data.frame(W1=rnorm(n),W2=rnorm(n),W3=rnorm(n),W4=rnorm(n))
A = rbinom(n,1,1/2)
Y = rbinom(n,1,Qbar(A,W))
txs = c(0,1)

# sg.SL fit
out = sg.SL(W,A,Y,SL.library=SL.library,txs=txs,family=binomial())

# CATE estimates: one column per treatment in txs (column 2 is txs[2] = 1)
cate.est = out$est
plot(W$W1,cate.est[,2])

# can also call predict to get predictions
predict(out,data.frame(W1=0,W2=0,W3=0,W4=0))

# compare to the truth
EYw = 0.5*Qbar(0,W)+0.5*Qbar(1,W)
cate.truth1 = Qbar(1,W) - EYw
plot(cate.est[,2],cate.truth1)
abline(0,1)  # points near this line indicate accurate estimates

alexluedtke12/sg documentation built on May 24, 2023, 6:36 a.m.