do_crossfit: Estimation of regression functions using cross-fitting


View source: R/do_crossfit.R

Description

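do_crossfit estimates the nuisance regression functions with SuperLearner via cross-fitting: the data are split into nsplits folds, and the predictions for each fold come from models fitted on the remaining folds. Cross-fitting lets the user avoid imposing empirical process (Donsker-type) conditions on the estimators of these functions while still attaining, when possible, fast rates of convergence.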
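The fold-wise logic of cross-fitting can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: glm() stands in for SuperLearner, the outcome model is a single regression with a as a covariate, and the simulated data are made up. Each unit's predictions come only from models fitted on the other folds.

```r
# Minimal cross-fitting sketch (illustrative only): glm() stands in for SuperLearner
set.seed(42)
n <- 200
x <- data.frame(x1 = rnorm(n), x2 = runif(n))
a <- rbinom(n, 1, plogis(x$x1))
y <- 2 + x$x1 - x$x2 + a + rnorm(n)
dat <- data.frame(y = y, a = a, x)

nsplits <- 5
folds <- sample(rep(seq_len(nsplits), length.out = n))  # random, balanced fold labels

# Out-of-fold predictions: each unit is predicted by models that never saw it
mu0_hat <- mu1_hat <- pi1_hat <- numeric(n)
for (k in seq_len(nsplits)) {
  train <- dat[folds != k, ]
  test  <- dat[folds == k, ]
  out_fit   <- glm(y ~ a + x1 + x2, family = gaussian(), data = train)
  treat_fit <- glm(a ~ x1 + x2, family = binomial(), data = train)
  # Evaluate the outcome model with treatment set to 0 and to 1
  mu0_hat[folds == k] <- predict(out_fit, newdata = transform(test, a = 0))
  mu1_hat[folds == k] <- predict(out_fit, newdata = transform(test, a = 1))
  pi1_hat[folds == k] <- predict(treat_fit, newdata = test, type = "response")
}
head(cbind(mu0_hat, mu1_hat, pi0_hat = 1 - pi1_hat, pi1_hat))
```

Swapping glm() for a flexible learner changes only the two fitting lines; the fold bookkeeping stays the same.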

Usage

do_crossfit(
  y,
  a,
  x,
  ymin,
  ymax,
  nsplits,
  sl.lib,
  outfam = gaussian(),
  treatfam = binomial(),
  show_progress = FALSE,
  do_parallel = FALSE,
  ncluster = NULL
)

Arguments

y

nx1 outcome vector in [0, 1]

a

nx1 vector of treatments received

x

nxp data.frame of covariates. Variables must be named.

ymin

scalar such that P(Y >= ymin) = 1.

ymax

scalar such that P(Y <= ymax) = 1.

nsplits

number of splits (folds) used for cross-fitting.

sl.lib

character vector specifying which SuperLearner libraries (learners, e.g. "SL.glm") to use.

outfam

family specifying the error distribution for outcome regression, currently gaussian() or binomial() supported. Link should not be specified. Default is gaussian().

treatfam

family specifying the error distribution for treatment regression; currently only binomial() is supported. Link should not be specified. Default is binomial().

show_progress

boolean indicating whether a progress bar should be shown. Default is FALSE. Currently only available if do_parallel is FALSE.

do_parallel

boolean for whether parallel computing should be used. Default is FALSE.

ncluster

number of clusters to use when do_parallel = TRUE.
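To illustrate what parallelizing over folds can look like, the sketch below distributes the nsplits fold fits over ncluster workers with R's base parallel package. This is an assumption about the general approach, not a description of how do_crossfit uses do_parallel and ncluster; fit_fold is a hypothetical helper and glm() stands in for SuperLearner.

```r
# Illustrative only: fold-wise fitting in parallel with the base 'parallel' package.
# Not necessarily how do_crossfit implements do_parallel / ncluster.
library(parallel)

# Hypothetical helper: train on all folds except k, predict on fold k
fit_fold <- function(k, dat, folds) {
  fit <- glm(y ~ x1 + x2, family = gaussian(), data = dat[folds != k, ])
  predict(fit, newdata = dat[folds == k, ])
}

set.seed(1)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = runif(n))
dat$y <- 1 + dat$x1 + rnorm(n)
nsplits <- 3
folds <- sample(rep(seq_len(nsplits), length.out = n))

ncluster <- 2
cl <- makeCluster(ncluster)  # PSOCK cluster by default, works on all platforms
# Data are passed as extra arguments so the workers do not need the global environment
preds <- parLapply(cl, seq_len(nsplits), fit_fold, dat = dat, folds = folds)
stopCluster(cl)

# Reassemble the out-of-fold predictions in the original unit order
yhat <- numeric(n)
for (k in seq_len(nsplits)) yhat[folds == k] <- preds[[k]]
```

Because each fold's fit is independent of the others, the parallel result matches a serial lapply() over the same folds.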

Value

A list containing

test

an nx4 matrix containing estimates of E(Y|A = 0, X), E(Y|A = 1, X), P(A = 0|X), and P(A = a|X) evaluated at the test points X. For example, with nsplits = 3, if the functions are estimated using folds 1 and 2, the values returned are the predictions for fold 3.

train

a (n*(nsplits-1))x4 matrix containing estimates of E(Y|A = 0, X), E(Y|A = 1, X), P(A = 0|X), and P(A = a|X) evaluated at the training points X. For example, with nsplits = 3, if the functions are estimated using folds 1 and 2, the values returned are the predictions for folds 1 and 2.

order_obs

an n-dimensional vector specifying the order of the observations after cross-fitting, with units sorted by fold number. For instance, if unit 1 is in fold 3, unit 2 is in fold 1, unit 3 is in fold 1 and unit 4 is in fold 2, order_obs = c(2, 3, 4, 1).

folds

an n-dimensional vector specifying which fold each unit falls into. For instance, if unit 1 is in fold 3, unit 2 is in fold 1, unit 3 is in fold 1 and unit 4 is in fold 2, folds = c(3, 1, 1, 2).
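The relationship between folds and order_obs in the four-unit example can be checked in base R: sorting units by fold number is what order() does. The sketch also shows how fold-ordered rows could be mapped back to the original unit order, assuming (this is an assumption, not confirmed by this page) that the rows of test follow order_obs; test_stacked is a hypothetical one-column stand-in. Finally, it checks the row count of train: each unit belongs to the training set of nsplits - 1 folds, so the stacked training predictions have n * (nsplits - 1) rows.

```r
# Fold labels from the example: unit 1 in fold 3, units 2 and 3 in fold 1, unit 4 in fold 2
folds <- c(3, 1, 1, 2)
order_obs <- order(folds)  # units sorted by fold number (ties keep their original order)
order_obs
#> [1] 2 3 4 1

# Hypothetical fold-ordered predictions (rows = units 2, 3, 4, 1), one column for brevity
test_stacked <- matrix(c(0.2, 0.3, 0.4, 0.1), ncol = 1)
# order(order_obs) gives, for each original unit, its row in the fold-ordered matrix
test_original <- test_stacked[order(order_obs), , drop = FALSE]
test_original[, 1]
#> [1] 0.1 0.2 0.3 0.4

# Row count of 'train': every unit is a training point for nsplits - 1 folds
n <- 500; nsplits <- 5
n * (nsplits - 1)
#> [1] 2000
```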

Details

If SuperLearner returns an error, a GLM is fitted instead. In this case, we suggest that the user choose another estimation method and pass the resulting estimates as arguments to other functions.
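The fallback behaviour described above follows a common R error-handling pattern, sketched here with tryCatch(). This is an illustration, not the package's code: fit_flexible is a hypothetical stand-in for a SuperLearner call that always fails, and fit_with_fallback is a hypothetical wrapper that emits a warning before fitting a GLM.

```r
# Illustrative fallback pattern: try a flexible learner, fall back to a GLM on error.
# 'fit_flexible' is a hypothetical stand-in for a SuperLearner call; it always fails here.
fit_flexible <- function(formula, data, family) {
  stop("learner failed")
}

fit_with_fallback <- function(formula, data, family = gaussian()) {
  tryCatch(
    fit_flexible(formula, data, family),
    error = function(e) {
      # Report the failure, then fit a GLM with the same formula and family
      warning("flexible fit failed (", conditionMessage(e), "); fitting a GLM instead")
      glm(formula, family = family, data = data)
    }
  )
}

set.seed(3)
d <- data.frame(x1 = rnorm(100))
d$a <- rbinom(100, 1, plogis(d$x1))
fit <- suppressWarnings(fit_with_fallback(a ~ x1, d, family = binomial()))
class(fit)
#> [1] "glm" "lm"
```

Emitting a warning, rather than switching silently, makes it visible when the fallback GLM was used, which is the situation in which supplying externally computed estimates is recommended.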

References

Van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super learner. Statistical Applications in Genetics and Molecular Biology, 6(1).

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., & Newey, W. K. (2016). Double machine learning for treatment and causal parameters (No. CWP49/16). cemmap working paper.

See Also

get_muahat and get_piahat.

Examples

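set.seed(123)  # for reproducibility
n <- 500
# two named covariates
x <- data.frame(x1 = rnorm(n), x2 = runif(n))
# binary treatment whose probability depends on x1
a <- rbinom(n, 1, pnorm(x$x1))
# continuous outcome; its observed range supplies ymin and ymax
y <- 2 + x$x1 - x$x2 + rnorm(n)
# SL.gam additionally requires the 'gam' package
fits <- do_crossfit(y, a, x, min(y), max(y), outfam = gaussian(),
                    treatfam = binomial(), nsplits = 5,
                    sl.lib = c("SL.mean", "SL.glm", "SL.gam"))
head(fits$test)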

matteobonvini/sensitivitypuc documentation built on Dec. 9, 2020, 2:24 a.m.