
View source: R/rfa.R

rfa    R Documentation

Estimate Average Treatment Effect via Random Forest Adjustment

Description

This function estimates the average treatment effect of an explanatory variable on a response variable using a procedure called Random Forest Adjustment (RFA). RFA uses random forest regression to partial out the variation in the response, and in the explanatory variable of interest, that can be explained by a set of covariates. The latest version relies on the 'ranger' package, which provides a fast implementation of the random forest algorithm (Breiman 2001).
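The partialling-out idea can be written compactly. In the notation below (hypothetical, not taken from the package), y_i is the response, d_i the explanatory variable of interest, x_i the covariates, and \hat{m}_y, \hat{m}_d are random forest estimates of E[y | x] and E[d | x]. It assumes the final step is an ordinary least-squares regression with an intercept of the residualized response on the residualized explanatory variable; the package's exact estimator may differ in details.

```latex
\tilde{y}_i = y_i - \hat{m}_y(x_i), \qquad \tilde{d}_i = d_i - \hat{m}_d(x_i)

\hat{\tau} = \frac{\sum_{i=1}^{n} (\tilde{d}_i - \bar{\tilde{d}})(\tilde{y}_i - \bar{\tilde{y}})}
                  {\sum_{i=1}^{n} (\tilde{d}_i - \bar{\tilde{d}})^2}
```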

Usage

rfa(
  formula,
  covariates,
  fes_and_res = NULL,
  data = NULL,
  se_type = "stata",
  clusters = NULL,
  ...
)

Arguments

formula

a formula object where the left-hand variable is the outcome and the right-hand variable is the explanatory variable of interest.

covariates

a one-sided formula (right-hand side only) specifying the covariates to be used in the random forest regressions.

fes_and_res

a one-sided formula (right-hand side only) specifying any fixed effects or random effects. For random effects, use the notation '~ (1 | id)' as in the 'lme4' package.

data

an optional data frame containing the variables used to implement the RFA routine.

se_type

specifies the type of standard errors to be returned. If 'clusters' is not specified, the options are "classical", "HC0", "stata" (equivalent to "HC1"), "HC2", and "HC3". If 'clusters' is specified, the options are "CR0", "stata" (equivalent to "CR1"), and "CR2". The default is "stata".

clusters

optional name (as a quoted string) of the variable that identifies clusters in the data.

...

additional arguments passed to 'ranger' to override its default settings for fitting the random forests. See the 'ranger' package for details.
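To illustrate how the arguments fit together, here is a hedged usage sketch with simulated data. It assumes the package has been installed from the milesdwilliams15/RFA GitHub repository and loads under the name 'RFA'; the variable names (y, d, x1, x2, cl) and the num.trees value are arbitrary choices for illustration, not package defaults.

```r
# Usage sketch (assumption: the package is installed, e.g. via
# remotes::install_github("milesdwilliams15/RFA"), and loads as "RFA").
library(RFA)

set.seed(123)
n  <- 400
df <- data.frame(
  cl = rep(1:40, each = 10),  # hypothetical cluster identifier
  x1 = rnorm(n),
  x2 = rnorm(n)
)
df$d <- 0.5 * df$x1 + rnorm(n)                    # explanatory variable, confounded by x1
df$y <- 1.5 * df$d + df$x1^2 + df$x2 + rnorm(n)   # simulated outcome

fit <- rfa(
  formula    = y ~ d,       # outcome ~ explanatory variable of interest
  covariates = ~ x1 + x2,   # one-sided formula of confounders
  data       = df,
  se_type    = "CR2",       # cluster-robust option, valid because clusters is set
  clusters   = "cl",        # quoted name of the cluster variable
  num.trees  = 200          # forwarded through ... to ranger()
)

# Inspect the top level of the returned list
str(fit, max.level = 1)
```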

Details

The function also supports fixed and random effects. Random effects are handled with the 'lmer' function from the 'lme4' package. When fixed or random effects are specified, the response, treatment, and covariates are demeaned according to those effects before random forest adjustment.
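For a single fixed effect, demeaning amounts to subtracting each group's mean from every variable (the within transformation). The base-R sketch below shows only that case; the data, the grouping variable id, and the helper demean() are hypothetical, and this is not the package's internal code, which may treat multiple effects differently and uses lmer() for random effects.

```r
set.seed(1)
df <- data.frame(
  id = rep(1:20, each = 10),  # hypothetical group identifier (one fixed effect)
  y  = rnorm(200),
  d  = rnorm(200),
  x1 = rnorm(200)
)

# Subtract each observation's group mean (ave() uses FUN = mean by default)
demean <- function(v, g) v - ave(v, g)

vars <- c("y", "d", "x1")
df[paste0(vars, "_dm")] <- lapply(df[vars], demean, g = df$id)

# Group means of the demeaned variables are zero up to floating-point error
print(tapply(df$y_dm, df$id, mean))
```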

The RFA routine estimates the marginal relationship between a predictor (binary or continuous) and a response variable, adjusting for the confounding influence of other variables. As its name implies, RFA subtracts the variation in the predictor of interest and in the response that is explained by the confounding variables, as estimated by random forest regression.
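The residualization step can be sketched directly with 'ranger' on simulated data where the true effect is known. This is an illustration of the technique, not the package's source: it assumes the forests are residualized with ranger's out-of-bag predictions and that the final step is a plain lm() slope, and the variable names are hypothetical.

```r
library(ranger)

set.seed(42)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- x1 + rnorm(n)                      # treatment is confounded by x1
y  <- 2 * d + 3 * x1 + x2^2 + rnorm(n)   # true effect of d on y is 2
df <- data.frame(y, d, x1, x2)

# Random forest regressions of the response and the treatment on the covariates
rf_y <- ranger(y ~ x1 + x2, data = df)
rf_d <- ranger(d ~ x1 + x2, data = df)

# Residualize using out-of-bag predictions (an assumption of this sketch)
y_res <- df$y - rf_y$predictions
d_res <- df$d - rf_d$predictions

# Slope of the residualized response on the residualized treatment
tau_hat <- unname(coef(lm(y_res ~ d_res))["d_res"])
naive   <- unname(coef(lm(y ~ d, data = df))["d"])  # ignores confounding
print(c(rfa_sketch = tau_hat, unadjusted = naive))
```

Replacing lm() with estimatr::lm_robust(y_res ~ d_res, se_type = "stata") would add the robust standard errors described under 'se_type'.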

Value

'rfa' returns a list containing the model object (an 'lm_robust' object from the 'estimatr' package), the data used to estimate the model, a covariates matrix, and the random forest regressions for the response and the explanatory variable ('ranger' objects from the 'ranger' package).

References

Breiman, Leo. 2001. "Random Forests." Machine Learning 45: 5-32.

The function accepts a formula object and a data frame as inputs. rfa() assumes that the first right-hand-side variable in the formula object is the explanatory variable of interest and that all other right-hand-side variables are confounders used to residualize the predictor and response before the ATE is estimated. NA values are allowed.
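A single formula written in the outcome ~ predictor + confounders convention can be split into the separate 'formula' and 'covariates' arguments listed under Usage. A minimal base-R sketch, with hypothetical variable names; it handles simple additive terms only.

```r
# Hypothetical single formula: outcome ~ explanatory variable + confounders
full <- y ~ d + x1 + x2

rhs <- attr(terms(full), "term.labels")   # right-hand-side term labels
lhs <- deparse(full[[2]])                 # name of the response

fml  <- reformulate(rhs[1], response = lhs)  # outcome ~ explanatory variable
covs <- reformulate(rhs[-1])                 # one-sided formula of confounders

print(fml)
print(covs)
```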


milesdwilliams15/RFA documentation built on Sept. 26, 2023, 4:31 a.m.