knitr::opts_chunk$set( collapse = FALSE, warning = FALSE, message = FALSE, tidy = FALSE, fig.align='center', comment = "#>", fig.path = "man/figures/README-", R.options = list(width = 200) )
lmw
computes weights implied by a linear regression model used to estimate an average treatment effect and provides diagnostics that incorporate these weights as described in Chattopadhyay & Zubizarreta (2023). The treatment effect resulting from this model can be represented as a difference between the weighted outcome means in the treatment groups, similar to inverse probability weighting.
The weights implied by the regression model have several interesting characteristics: they yield exact mean balance between the treatment groups for every covariate included in the model, they have minimum variance among all weights that do so, and they may be negative. Despite yielding exact mean balance between treatment groups, they may not yield balance between the treatment groups and the target population corresponding to the desired estimand; in addition, the negative weights they produce indicate extrapolation beyond the support of the covariates. lmw
provides tools to compute the implied weights and perform diagnostics to assess balance, extrapolation, influence, and distributional properties of the weights. In addition, lmw
provides tools to estimate average treatment effects from the specified models that correspond to selected target populations.
Below is an example of the use of lmw
to obtain and evaluate regression weights for estimating the average treatment effect on the treated (ATT) of a job training program on earnings using the Lalonde dataset (see help("lalonde", package = "lmw")
for details). Here, the treatment variable is treat
, the outcome is re78
(1978 earnings) and age
, education
, race
, married
, nodegree
, re74
, and re75
are pretreatment covariates.
#Load lmw and the data library("lmw") data("lalonde") #Estimate the weights lmw.out <- lmw(~ treat + age + education + race + married + nodegree + re74 + re75, data = lalonde, treat = "treat", estimand = "ATT", method = "URI")
lmw.out
contains the implied regression weights. We can see that the weighted outcome mean difference between the treatment groups is equal to the coefficient on the treatment in the corresponding outcome regression model:
#The weighted difference in means w <- lmw.out$weights with(lalonde, weighted.mean(re78[treat == 1], w[treat == 1]) - weighted.mean(re78[treat == 0], w[treat == 0])) #The coefficient in the corresponding outcome model fit <- lm(re78 ~ treat + age + education + race + married + nodegree + re74 + re75, data = lalonde) coef(fit)["treat"]
How well do the weights do at balancing the treatment groups to the target population (in the case, the treated sample)?
(s <- summary(lmw.out))
The SMD
column contains the standardized mean differences for each covariate between the treatment groups; after weighting, the values are all equal to zero because of the properties of the implied regression weights. However, the difference between each treatment group and target sample remain, as displayed in the TSMD Treated
and TSMD Control
columns, which contain the standardized mean differences between each treatment group and the target sample (which in this case is the treated group because the ATT was requested).
We can summarize this balance table in a plot:
plot(s)
We can examine the distribution of weights to see to what degree negative weights are present:
plot(lmw.out, type = "weights")
A red line indicates the average of the weights in that group (the weights are scaled so this is always equal to 1). Negative weights are present in the control group. Weights are also highly variable in the control group, leading to a decreased effective sample size (ESS).
We can further examine extrapolation for specific covariates:
plot(lmw.out, type = "extrapolation", variables = ~married + re75)
The $\times$ indicates the mean of the covariate in the target population (the treated group), and the vertical line indicates the mean of the covariate weighted by the implied regression weights. For these covariates, the implied regression weights yield a sample fairly representative of the target population.
We can examine how influential individual points are using their sample influence curves, which are a function of their residuals, leverages, and implied weights:
plot(lmw.out, type = "influence", outcome = re78)
Finally, we can fit the outcome model and extract the average treatment effect estimates:
lmw.fit <- lmw_est(lmw.out, outcome = re78) summary(lmw.fit)
In addition, lmw
can be used with multi-category treatments, two-stage least squares estimation of instrumental variable models, and doubly-robust estimators by interfacing with the MatchIt
and WeightIt
packages to implement these diagnostics for regression in a matched or weighted sample.
Chattopadhyay, A., & Zubizarreta, J. R. (2023). On the implied weights of linear regression for causal inference. Biometrika, 110(3), 615–629. https://doi.org/10.1093/biomet/asac058
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.