# bartc: Causal Inference using Bayesian Additive Regression Trees In bartCause: Causal Inference using Bayesian Additive Regression Trees

## Description

Fits a collection of treatment and response models using the Bayesian Additive Regresssion Trees (BART) algorithm, producing estimates of treatment effects.

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12``` ```bartc(response, treatment, confounders, data, subset, weights, method.rsp = c("bart", "tmle", "p.weight"), method.trt = c("bart", "glm", "none"), estimand = c("ate", "att", "atc"), group.by = NULL, commonSup.rule = c("none", "sd", "chisq"), commonSup.cut = c(NA_real_, 1, 0.05), args.rsp = list(), args.trt = list(), p.scoreAsCovariate = TRUE, use.ranef = TRUE, group.effects = FALSE, crossvalidate = FALSE, keepCall = TRUE, verbose = TRUE, ...) ```

## Arguments

 `response` A vector of the outcome variable, or a reference to such in the `data` argument. Can be continuous or binary. `treatment` A vector of the binary treatment variable, or a reference to `data`. `confounders` A matrix or data frame of covariates to be used in estimating the treatment and response model. Can also be the right-hand-side of a formula (e.g. `x1 + x2 + ...`). The `data` argument will be searched if supplied. `data` An optional data frame or named list containing the `response`, `treatment`, and `confounders`. `subset` An optional vector using to subset the data. Can refer to `data` if provided. `weights` An optional vector of population weights used in model fitting and estimating the treatment effect. Can refer to `data` if provided. `method.rsp` A character string specifying which method to use when fitting the response surface and estimating the treatment effect. Options are: `"bart"` - fit the response surface with BART and take the average of the individual treatment effect estimates, `"p.weight"` - fit the resposne surface with BART but compute the treatment effect estimate by using a propensity score weighted sum of individual effects, and `"tmle"` - as above, but further adjust the individual estimates using the Targetted Minimum Loss based Estimation (TMLE) adjustment. `method.trt` A character string specifying which method to use when fitting the treatment assingment mechanism, or a vector/matrix of propensity scores. Character string options are: `"bart"` - fit BART directly to the treatment variable, `"glm"` - fit a generalized linear model with a binomial response and all confounders added linearly, and `"none"` - do no propensity score estimation. Cannot be `"none"` if the response model requires propensity scores. When supplied as a matrix, it should be of dimensions equal to the number of observations times the number of samples used in any response model. `estimand` A character string specifying which causal effect to target. Options are `"ate"` - average treatment effect, `"att"` - average treatment effect on the treated, and `"atc"` - average treatment effect on the controls. `group.by` An optional factor that, when present, causes the treatment effect estimate to be calculated within each group. `commonSup.rule` Rule for exclusion of observations lacking in common support. Options are `"none"` - no suppression, `"sd"` - exlude units whose predicted counterfactual standard deviation is extreme compared to the maximum standard deviation under those units' observed treatment condition, where extreme refers to the distribution of all standard deviations of observed treatment conditions, `"chisq"` - exclude observations according to ratio of the variance of posterior predicted counterfactual to the posterior variance of the observed condition, having a Chi Squared distribution with one degree of freedom under the null hypothesis of have equal distributions. `commonSup.cut` Cuttoffs for `commonSup.rule`. Ignored for `"none"`, when `commonSup.rule` is `"sd"`, refers to how many standard deviations of the distribution of posterior variance for counterfactuals an observation can be above the maximum of posterior variances for that treatment condition. When `commonSup.rule` is `"chisq"`, is the p value used for rejection of the hypothesis of equal variances. `p.scoreAsCovariate` A logical such that when `TRUE`, the propensity score is added to the response model as a covariate. `use.ranef` Logical specifying if `group.by` variable - when present - should be included as a "random" or "fixed" effect. If true, `rbart` will be used for BART models. Using random effects for treatment assignment mechanisms of type `"glm"` require that the `lme4` package be available. `group.effects` Logical specifying if effects should be calculated within groups if the `group.by` variable is provided. Response methods of `"tmle"` and `"p.weight"` are such that if group effects are calculated, then the population effect is not provided. `keepCall` A logical such that when `FALSE`, the call to `bartc` is not kept. This can reduce the amount of information printed by `summary` when passing in data as literals. `crossvalidate` One of `TRUE`, `FALSE`, `"trt"`, or `"rsp"`. Enables code to attempt to estimate the optimal end-node sensitivity parameter. This uses a rudimentary Bayesian optimization routine and can be extremely slow. `verbose` A logical that when `TRUE` prints information as the model is fit. `args.rsp,args.trt,...` Further arguments to the treatment and response model fitting algorithms. Arguments passed to the main function as ... will be used in both models. `args.rsp` and `args.trt` can be used to set parameters in a single fit, and will override other values. See `glm` and `bart2` for reference.

## Details

`bartc` represents a collection of methods that primarily use the Bayesian Additive Regression Trees (BART) algorithm to estimate causal treatment effects with binary treatment variables and continuous outcomes. This requires models to be fit to the response surface (distribution of the response as a function of treatment and confounders, p(Y(1), Y(0) | X) and optionally for treatment assignment mechanism (probability of receiving treatment, i.e. propensity score, Pr(Z = 1 | X)). The response surface model is used to impute counterfactuals, which may then be adjusted together with the propensity score to produce estimates of effects.

Similar to `lm`, models can be specified symbolically. When the `data` term is present, it will be added to the search path for the `response`, `treatment`, and `confounders` variables. The confounders must be specified devoid of any "left hand side", as they appear in both of the models.

Response Surface

The response surface methods included are:

• `"bart"` - use BART to fit the response surface and produce individual estimates Y(1)^hat_i and Y(0)^hat_i. Treatment effect estimates are obtained by averaging the difference of these across the population of interest.

• `"p.weight"` - individual effects are esimated as in `"bart"`, but treatment effect estimates are obtained by using a propensity score weighted average. For the average treatment effect on the treated, these weights are p(z_i | x_i) / (∑ z / n). For ATC, replace z with 1 - z. For ATE, `"p.weight"` is equal to `"bart"`.

• `"tmle"` - individual effects are esimated as in `"bart"` and a weighted average is taken as in `"p.weight"`, however the response surface estimates and propensity scores are corrected by using the Targetted Minimum Loss based Estimation method.

Treatment Assignment

The treatment assignment models are:

• `"bart"` - fit a binary BART directly to the treatment using all the confounders.

• `"none"` - no modeling is doing. Only applies when using response method `"bart"` and `p.scoreAsCovariate` is `FALSE`.

• `"glm"` - fit a binomial generalized linear model with logistic link and confounders included as linear terms.

• Finally, a vector or matrix of propensity scores can be supplied. Propensity score matrices should have a number of rows equal to the number of observations in the data and a number of columns equal to the number of posterior samples.

Generics

For a fitted model, the easiest way to analyze the resulting fit is to use the generics `fitted`, `extract`, and `predict` to analyze specific quantities and `summary` to aggregate those values into targets (e.g. ATE, ATT, or ATC).

Common Support Rules

Common support, or that the probability of receiving all treatment conditions is non-zero within every area of the covariate space (P(Z = 1 | X = x) > 0 for all x in the inferential sample), can be enforced by excluding observations with high posterior uncertainty. `bartc` supports two common support rules through `commonSup.rule` argument:

• `"sd"` - observations are cut from the inferential sample if: s_i^f(1-z) > m_z + a * sd(s_j^f(z)), where s_i^f(1-z) is the posterior standard deviation of the predicted counterfactual for observation i, s_j^f(z) is the posterior standard deviation of the prediction for the observed treatment condition of objservation j, sd(s_j^f(z)) is the empirical standard deviation of those quantities, and m_z = max_j s_j^f(z) for all j in the same treatment group, i.e. Z_j = z. a is a constant to be passed in using `commonSup.cut` and its default is 1.

• `"chisq"` - observations are cut from the inferential sample if: s_i^f(1-z) / s_i^f(z))^2 > q_α, where s_i are as above and q_α, is the upper α percentile of a χ^2 distribution with one degree of freedom, corresponding to a null hypothesis of equal variance. The default for α is 0.05, and it is specified using the `commonSup.cut` parameter.

Special Arguments

Some default arguments are unconvential or are passed in a unique fashion.

• If `n.chains` is missing, unlike in `bart2` a default of 10 is used.

• For `method.rsp == "tmle"`, a special `arg.trt` of `posteriorOfTMLE` determines if the TMLE correction should be applied to each posterior sample (`TRUE`), or just the posterior mean (`FALSE`).

Missing Data

Missingness is allowed only in the response. If some response values are `NA`, the BART models will be trained just for where data are available and those values will be used to make predictions for the missing observations. Missing observations are not used when calculating statistics for assessing common support, although they may still be excluded on those grounds. Further, missing observations may not be compatible with response method `"tmle"`.

## Value

`bartc` returns an object of class `bartcFit`. Information about the object can be derived by using methods `summary`, `plot_sigma`, `plot_est`, `plot_indiv`, and `plot_support`. Numerical quantities are recovered with the `fitted` and `extract` generics. Predictions for new observations are obtained with `predict`.

Objects of class `bartcFit` are lists containing items:

 `method.rsp` character string specifying the method used to fit the response surface `method.trt` character string specifying the method used to fit the treatment assignment mechanism `estimand` character string specifying the targetted causal effect `fit.rsp` object containing the fitted response model `data.rsp` `dbartsData` object used when fitting the response model `fit.trt` object containing the fitted treatment model `group.by` optional factor vector containing the groups in which treatment effects are estimated `est` matrix or array of posterior samples of the treatment effect estimate `p.score` the vector of propensity scores used as a covariate in the response model, when applicable `samples.p.score` matrix or array of posterior samples of the propensity score, when applicable `mu.hat.obs` samples from the posterior of the expected value for individual responses under the observed treatment regime `mu.hat.cf` samples from the posterior of the expected value for individual responses under the counterfactual treatment `name.trt` character string giving the name of the treatment variable in the data of `fit.rsp` `trt` vector of treatment assignments `call` how `bartc` was called `n.chains` number of independent posterior sampler chains in response model `commonSup.rule` common support rule used for suppressing observations `commonSup.cut` common support parameter used to set cutoff when suppression observations `sd.obs` vector of standard deviations of individual posterior predictors for observed treatment conditions `sd.cf` vector of standard deviations of individual posterior predictors for counterfactuals `commonSup.sub` logical vector expressing which observations are used when estimating treatment effects `use.ranef` logical for whether ranef models were used; only added when true `group.effects` logical for whether group-level estimates are made; only added when true `seed` a random seed for use when drawing from the posterior predictive distribution

## Author(s)

Vincent Dorie: vdorie@gmail.com.

## References

Chipman, H., George, E. and McCulloch R. (2010) BART: Bayesian additive regression trees. The Annals of Applied Statistics 4(1), 266–298. The Institute of Mathematical Statistics. https://doi.org/10.1214/09-AOAS285.

Hill, J. L. (2011) Bayesian Nonparametric Modeling for Causal Inference. Journal of Computational and Graphical Statistics 20(1), 217–240. Taylor & Francis. https://doi.org/10.1198/jcgs.2010.08162.

Hill, J. L. and Su Y. S. (2013) Assessing Lack of Common Support in Causal Inference Using Bayesian Nonparametrics: Implications for Evaluating the Effect of Breastfeeding on Children's Cognitive Outcomes The Annals of Applied Statistics 7(3), 1386–1420. The Institute of Mathematical Statistics. https://doi.org/10.1214/13-AOAS630.

Carnegie, N. B. (2019) Comment: Contributions of Model Features to BART Causal Inference Performance Using ACIC 2016 Competition Data Statistical Science 34(1), 90–93. The Institute of Mathematical Statistics. https://doi.org/10.1214/18-STS682

`bart2`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26``` ```## fit a simple linear model n <- 100L beta.z <- c(.75, -0.5, 0.25) beta.y <- c(.5, 1.0, -1.5) sigma <- 2 set.seed(725) x <- matrix(rnorm(3 * n), n, 3) tau <- rgamma(1L, 0.25 * 16 * rgamma(1L, 1 * 32, 32), 16) p.score <- pnorm(x %*% beta.z) z <- rbinom(n, 1, p.score) mu.0 <- x %*% beta.y mu.1 <- x %*% beta.y + tau y <- mu.0 * (1 - z) + mu.1 * z + rnorm(n, 0, sigma) # low parameters only for example fit <- bartc(y, z, x, n.samples = 100L, n.burn = 15L, n.chains = 2L) summary(fit) ## example to show refitting under the common support rule fit2 <- refit(fit, commonSup.rule = "sd") fit3 <- bartc(y, z, x, subset = fit2\$commonSup.sub, n.samples = 100L, n.burn = 15L, n.chains = 2L) ```