options(width = 100) options(rmarkdown.html_vignette.check_title = FALSE) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(mediationClarity) library(ggplot2)
The package mediationClarity
implements the methods covered in the manuscript @nguyen2022MediationEstimation titled Causal mediation analysis: from simple to more robust estimation strategies for marginal natural (in)direct effects. (arxiv:2102.06048).
The key documentation of the package is the paper itself, which contains all the technical details of the methods.
This vignette, which presents the functions that implement the methods, is the computing appendix to the paper.
We will briefly describe the illustrative data, which will help make things concrete, in Section 1.3.
Before getting to the estimators (what you are most likely here for), we will cover all the relevant weighting business in Section 2, as weighting is involved in multiple estimators. Sections 3-5 then track the three groups of estimators presented in the paper. For each weighting method or estimator, we present the function that implements the method and explain its outputs.
In sections 3-5, to speed up the compling of this document we put the number of bootstrap samples in each estimator function call at 99, which is clearly too small. The default in the package is 999.
Section 6 plots the estimates, and reproduces Figure 8 of the manuscript.
In @nguyen2022MediationEstimation we acknowledge the sources on which we built the technical content of this work. Here we acknowledge the most important resources that helped the construction of this R package: the R packages devtools
[@devtools], usethis
[@usethis], roxygen2
[@roxygen2], the super helpful R Packages book [@wickhamPackages], R-bloggers posts such as @pang2013NewTrickMe, many helpful posts on Stack Overflow, and our Stuart lab colleagues who kindly share their programming tips Nick Seewald, Leon Di Stefano, Yongqi Zhong and Elena Badillo Goicoechea.
We use a synthetic dataset synth
that was created based on the PAS trial [@koning2011WhyTargetEarly].
Detailed description is provided in the paper.
Let's have a quick look at the data. The variables include:
treat
(combined intervention vs. usual care)sex
, age
, edu
(binary variable for academic or vocational education track), religion
, att0
(binary, attitude against alcohol use), rul0
(binary, strict parental rules about alcohol use), sfc0
(self-control when it comes to situations with alcohol, on a 0-to-4 scale), drink0
(weekly drinking: yes, no, no response)att
, rul
, sfc
(attitude, rules and self-control at six months)drink
(weekly drinking at 22 months, binary)head(synth)
table1::table1(~ age + sex + edu + religion + drink0 + att0 + rul0 + sfc0 | factor(treat), data = synth, overall = "all")
\ \
Mediation weighting is weighting subsamples to create pseudo treated (herein p11
) and pseudo control (p00
) samples that mimic the C distribution of the full sample, and to create pseudo cross-world sample(s) (p10
and/or p01
) that mimics the full sample C distribution and the M given C distribution of the controls (if p10
) or of the treated (if p01
).
Following the paper, we show all computation just for the (NDE0, NIE1) decomposition, which requires p10
but not p01
.
Mediation weighting is used for the pure weighting estimator (wtd) and other estimators including Y2predR, psYpredMR, YpredMR, MsimYpredMR, wt-Cadj and wpMR-Cadj.
The paper mentions three formulas for the cross-world weights. We use the second formula, based on fitting models for $\mathrm{P}(A|C)$ and $\mathrm{P}(A|C,M)$.
w.med <- weights_med( data = synth, s.wt.var = NULL, # name sampling weight variable here (if needed) cross.world = "10", # default # P(A|C) and P(A|C,M) model formulas a.c.form = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)", a.cm.form = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)", plot = TRUE, # default c.std = "sfc0", # C and M vars for which want m.std = "sfc", # standardized mean diffs in balance plot )
The input arguments are available for use as needed and ignored if not, and are reused in the different functions in the package as appropriate.
s.wt.var
is available for all functions, but we mention it only once here, as synth
does not have sampling weights.
cross.world
defaults to "10" if not specified. The other options are "01" (for the other decomposition), or "both".
We ask for standardized mean differences for sfc0
and sfc
, which are continuous variables.
There are two other plot parameters c.order
and m.order
specifying the order by which C and M variables are to be plotted (if different from the order they appear in a.c.form
and a.cm.form
), which we don't need yet.
The outputs include weighted data and plots.
names(w.med)
The weighted dataset w.dat
has several new variables, whose names all start with a dot to differentiate them from the original data. Three of these variables are for weights: .s.wt
(sampling weights), .w.wt
(distribution-morphing weights) and .f.wt
(final weights, the product of sampling and distribution-morphing weights). In this example, we do not have sampling weights, so .s.wt
= 1 and .f.wt
= .w.wt
.
names(w.med$w.dat) table(w.med$w.dat$.samp) summary(w.med$w.dat[, c(".s.wt", ".w.wt", ".f.wt")])
There are two types of plots, weight-distribution plots and balance plots.
names(w.med$plots) w.med$plots$w.wt.distribution
Note that the figure above is of weights not in raw but in stabilized form, ie the weights have been normed to have mean 1 within each pseudo sample. Stabilizing puts the weights on the same scale (a meaningful one!) so they are comparable across pseudo samples.
If the original data had sampling weights, there would be two of these plots, one for .w.wt
, one for .f.wt
.
w.med$plots$balance
This is a plot of balance on means of variables. It shows C-balance between p11
, p00
, p10
and the full sample, and CM-balance between p10
and p00
.
For sfc0
and sfc
, standardized mean differences are shown; this is marked by the star on these variables in the y-axis.
For these continuous variables, we would want to also look at distributional balance. We do not implement distributional balance in the current package. Distributional balance could be implemented using packages that focus on balance visualization, eg the cobalt
package [@cobalt].
The two other relevant weighting schemes that are used for some of the estimators are: C-based inverse probability weighting (for the psYpred estimator and the wp-Cadj estimators) and CM-based odds weighting (for the Ypred estimator).
# C-based inverse probability weighting w.c.ipw <- weights_ipw( data = synth, a.form = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)", plot = TRUE, vars.std = "sfc0" )
# CM-based odds weighting w.cm.odds <- weights_odds( data = synth, cross.world = "10", a.form = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)", plot = TRUE, # default vars.std = c("sfc0", "sfc"), vars.order = c("sex", "age", "edu", "religion", "drink0", "rul0", "att0", "sfc0", "rul", "att", "sfc") )
We keep the syntax of weights_ipw()
and weights_odds()
general (without reference to C or M) so that they can be used as general purpose functions. In mediation analysis, weights_ipw()
is typically used based on C variables, and weights_odds()
based on (C,M) variables, so a.form
is implicitly a.c.form
in weights_ipw()
and a.cm.form
in weights_odds()
, and vars.
is implicitly c.
in weights_ipw()
and cm.
in weights_odds()
.
In the second code chunk here we use the plotting order argument, because our model formula mixes up the order of the covariates and mediators.
The output structure is the same as that used for the weights_med()
function. We do not show the plot details here, but will come back to them when getting to the estimators that use these weighting schemes.
names(w.c.ipw) names(w.c.ipw$plots)
names(w.cm.odds) names(w.cm.odds$plots)
\
Here we present the estimators in a different order from that in the paper. After the pure weighting estimator, we will cover first the Y2pred pair where the more robust estimator does not rely on any specific model being correct (it is multiply robust) before the psYpred, Ypred and MsimYpred pairs.
est.wtd <- estimate_wtd( data = synth, cross.world = "10", a.c.form = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)", a.cm.form = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)", y.var = "drink", # outcome variable plot = TRUE, # default c.std = "sfc0", m.std = "sfc", boot.num = 99, # default value is 999 boot.method = "cont-wt", # default boot.seed = 77777 )
The arguments are similar to the arguments of weights_med()
, with two additions:
- the outcome variable name (via y.var
)
- bootstrap inputs: Options for boot.method
are "resample" (resampling bootstrap) and "cont-wt" (continuous weights bootstrap) [@xu2020ContinuousWeightsBootstrap]. If boot.seed
is not specified, the function randomly draws a seed.
names(est.wtd) names(est.wtd$plots)
The outputs below are quite self-explanatory.
round(est.wtd$estimates, 4)
est.wtd$boot.seed
est.wtd$plots$key.balance
This balance plot is the same plot as that from weights_med()
, with two subtle differences:
key.balance
, signaling that all balance components in this plot are important to the estimator.pdf(here::here("vignettes", "results", "fig7.pdf"), width = 4, height = 3.5) est.wtd$plots$w.wt.distribution dev.off() pdf(here::here("vignettes", "results", "fig8.pdf"), width = 7.5, height = 6) est.wtd$plots$key.balance dev.off()
\
This pair includes two estimators Y2pred and Y2pred.R.
The nonrobust Y2pred is consistent if all modeling components are correctly specified, including the models for $\mathrm{E}[Y\mid C,A=1]$, $\mathrm{E}[Y\mid C,A=0]$, $\mathrm{E}[Y\mid C,M,A=1]$, and $\mathrm{E}[Y_{1M_0}\mid C]$.
The robust Y2pred.R is consistent under the following conditions
# the nonrobust Y2pred estimator est.Y2pred <- estimate_Y2pred( data = synth, cross.world = "10", a.var = "treat", # formulas for E[Y|C,A=0], E[Y|C,A=1], E[Y|C,M,A=1], E[Y(1,M0)|C] y.c0.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.c1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.cm1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)", y10.c.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.link = "logit", boot.num = 99, # default value is 999 boot.method = "cont-wt", # default boot.seed = 77777 )
# the robust Y2predR estimator est.Y2predR <- estimate_Y2predR( data = synth, cross.world = "10", # formulas for P(A|C), P(A|C,M) a.c.form = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)", a.cm.form = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)", plot = TRUE, c.std = "sfc0", m.std = "sfc", # formulas for E[Y|C,A=0], E[Y|C,A=1], E[Y|C,M,A=1], E[Y(1,M0)|C] y.c0.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.c1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.cm1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)", y10.c.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.link = "logit", boot.num = 99, # default value is 999 boot.method = "cont-wt", boot.seed = 77777 )
y.c0.form
and y.c1.form
are model formulas for $\mathrm{E}[Y|C,A=0]$ and $\mathrm{E}[Y|C,A=1]$.
y.cm1.form
and y10.c.form
are model formulas for $\mathrm{E}[Y|C,M,A=1]$ and $\mathrm{E}[Y_{1M_0}|C]$, which are required for cross.world
= "10".
For cross.world
= "01", use y.cm0.form
and y01.c.form
, for $\mathrm{E}[Y|C,M,A=0]$ and $\mathrm{E}[Y_{0M_1}|C]$.
There are times when we might want to use the same model form for several models (like what we do in the code above). Then can use shortcut arguments y.c.form
and y.cm.form
instead:
y.c.form
gets populated to y.c0.form
, y.c1.form
, y10.c.form
, y01.c.form
if any of these inputs are required but not specifiedy.cm.form
gets populated to y.cm0.form
, y.cm1.form
if either of these inputs is required but not specifiedThe nonrobust Y2pred
This estimator simply outputs the estimates and the bootstrap seed.
names(est.Y2pred)
round(est.Y2pred$estimates, 4)
The robust Y2predR
names(est.Y2predR) names(est.Y2predR$plots)
This estimator outputs estimates, bootstrap seed, and also the weight distribution and balance plot from the mediation weighting. These plots are exactly the same (and have the same name) as provided by the weights_med()
function, so they are not shown here. The estimates are:
round(est.Y2predR$estimates, 4)
\
The paper mentions two versions, here we implement the second version.
This pair includes two esitmators psYpred and psYpred.MR.
The nonrobust psYpred is consistent if all modeling components are correctly specified, including $\omega_0(C)$, $\mathrm{E}[Y\mid C,M,A=1]$, and $\mathrm{E}[Y\mid C,A=1]$.
The more robust psYpred.MR is consistent under the following conditions:
# the nonrobust psYpred estimator est.psYpred <- estimate_psYpred( data = synth, cross.world = "10", # default a.c.form = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)", plot = TRUE, # default c.std = "sfc0", y.c1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.cm1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)", y.link = "logit", boot.num = 99, # default value is 999 boot.method = "cont-wt", # default boot.seed = 77777)
# the more robust psYpredMR estimator est.psYpredMR <- estimate_psYpredMR( data = synth, cross.world = "10", # default a.c.form = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)", a.cm.form = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)", plot = TRUE, # default c.std = "sfc0", m.std = "sfc", y.c1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.cm1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)", y.link = "logit", boot.num = 99, # default value is 999 boot.method = "cont-wt", # default boot.seed = 77777)
The nonrobust psYpred
names(est.psYpred) names(est.psYpred$plots)
round(est.psYpred$estimates, 4)
gridExtra::grid.arrange( est.psYpred$plots$w.wt.distribution + ggtitle("w.wt distribution"), est.psYpred$plots$key.balance + ggtitle("key balance"), layout_matrix = rbind(c(1, 2, 2)))
\
The more robust psYpredMR
names(est.psYpredMR) names(est.psYpredMR$plots)
round(est.psYpredMR$estimates, 4)
gridExtra::grid.arrange( est.psYpredMR$plots$w.wt.distribution + ggtitle("w.wt distribution"), est.psYpredMR$plots$key.balance + ggtitle("key balance"), est.psYpredMR$plots$full.balance + ggtitle("full balance"), layout_matrix = rbind(c(1, 1, 2, 2, 2), c(1, 1, 2, 2, 2), c(3, 3, 3, 3, 3), c(3, 3, 3, 3, 3), c(3, 3, 3, 3, 3)))
\
This pair includes two estimators Ypred and Ypred.MR.
The nonrobust Ypred is consistent if all modeling components are correctly specified, including $\mathrm{E}[Y\mid C,A=1]$, $\mathrm{E}[Y\mid C,A=0]$, $o_\text{x}(C,M)$, and $\mathrm{E}[Y_{1M_0}\mid C]$.
The more robust Ypred.MR is consistent under the following conditions:
# the nonrobust Ypred estimator est.Ypred <- estimate_Ypred( data = synth, cross.world = "10", # default a.cm.form = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)", plot = TRUE, # default cm.std = c("sfc0", "sfc"), y.c0.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.c1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y10.c.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.link = "logit", boot.num = 99, # default value is 999 boot.method = "cont-wt", # default boot.seed = 77777 )
# the more robust YpredMR estimator est.YpredMR <- estimate_YpredMR( data = synth, cross.world = "10", # default a.c.form = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)", a.cm.form = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)", plot = TRUE, # default c.std = "sfc0", m.std = "sfc", y.c0.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.c1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y10.c.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.link = "logit", boot.num = 99, # default value is 999 boot.method = "cont-wt", # default boot.seed = 77777 )
Here if the same model form is used for the different outcome given covariates models, the shortcut argument y.c.form
can be used instead of y.c0.form
, y.c1.form
, y10.c.form
.
The nonrobust Ypred
names(est.Ypred) names(est.Ypred$plots)
round(est.Ypred$estimates, 4)
gridExtra::grid.arrange( est.Ypred$plots$w.wt.distribution + ggtitle("w.wt distribution"), est.Ypred$plots$key.balance + ggtitle("key balance"), layout_matrix = rbind(c(1, 2, 2)))
\
The more robust YpredMR
names(est.YpredMR) names(est.YpredMR$plots)
round(est.YpredMR$estimates, 4)
gridExtra::grid.arrange( est.YpredMR$plots$w.wt.distribution + ggtitle("w.wt distribution"), est.YpredMR$plots$key.balance + ggtitle("key balance"), est.YpredMR$plots$full.balance + ggtitle("full balance"), layout_matrix = rbind(c(1, 1, 2, 2, 2), c(1, 1, 2, 2, 2), c(3, 3, 3, 3, 3), c(3, 3, 3, 3, 3), c(3, 3, 3, 3, 3)))
\
\
The paper mentioned two versions. Here we implement the first version.
This pair includes the two estimators MsimYpred and MsimYpred.MR.
The nonrobust MsimYpred is consistent if all modeling components are consistent, including $\mathrm{P}(M\mid C,A=0)$, $\mathrm{E}[Y\mid C,M,A=1]$, $\mathrm{E}[Y\mid C,A=1]$, and $\mathrm{E}[Y\mid C,A=0]$.
The more robust MsimYpred.MR is consistent under the following conditions:
# the nonrobust MsimYpred estimator est.MsimYpred <- estimate_MsimYpred( data = synth, cross.world = "10", a.var = 'treat', y.c0.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.c1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.link = "logit", m.c0.form = list("att ~ age + sex + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0,3)", "rul ~ age + sex + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0,3) + att", "sfc ~ age + sex + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0,3) + att + rul"), m.dist = list("binary", "binary", "normal"), y.cm1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)", boot.num = 99, # default value is 999 boot.method = "cont-wt", # default boot.seed = 77777, point.reps = 500 )
# the more robust MsimYpredMR estimator est.MsimYpredMR <- estimate_MsimYpredMR( data = synth, cross.world = "10", a.c.form = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)", a.cm.form = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)", y.c0.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.c1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.link = "logit", m.c0.form = list("att ~ age + sex + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0,3)", "rul ~ age + sex + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0,3) + att", "sfc ~ age + sex + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0,3) + att + rul"), m.dist = list("binary", "binary", "normal"), y.cm1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)", boot.num = 99, # default value is 999 boot.method = "cont-wt", # default boot.seed = 77777, point.reps = 500 )
est.MsimYpred
names(est.MsimYpredMR) names(est.MsimYpredMR$plots)
est.MsimYpredMR$estimates
\
This pair includes two estimators NDEpred and NDEpred.R.
The nonrobust NDEpred is consistent if all modeling components are correctly specified, including the models for $\mathrm{E}[Y\mid C,A=1]$, $\mathrm{E}[Y\mid C,A=0]$, $\mathrm{E}[Y\mid C,M,A=1]$, and $\mathrm{E}[{NDE}_0\mid C]$.
The robust NDEpred.R is consistent under the following conditions
# the nonrobust NDEpred estimator est.NDEpred <- estimate_NDEpred( data = synth, cross.world = "10", # default a.var = "treat", y.c1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.c0.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.cm1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)", nde0.c.form = "effect ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.link = "logit", boot.num = 99, # default value is 999 boot.method = "cont-wt", # default boot.seed = 77777 )
# the robust NDEpredR estimator est.NDEpredR <- estimate_NDEpredR( data = synth, cross.world = "10", # default a.c.form = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)", a.cm.form = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)", plot = TRUE, # default c.std = "sfc0", m.std = "sfc", y.c1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.c0.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.cm1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)", nde0.c.form = "effect ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)", y.link = "logit", boot.num = 99, # default value is 999 boot.method = "cont-wt", # default boot.seed = 77777 )
The nonrobust NDEpred
names(est.NDEpred)
round(est.NDEpred$estimates, 4)
The robust NDEpredR
names(est.NDEpredR) names(est.NDEpredR$plots)
round(est.NDEpredR$estimates, 4)
\
For this estimation strategy, the paper mentions two versions. We implement the version where the two effects are estimated using the same model.
est.wtCadj <- estimate_wtCadj( data = synth, cross.world = "10", a.c.form = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)", a.cm.form = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)", plot = TRUE, # default c.std = "sfc0", m.std = "sfc", y.var = "drink", y.link = "logit", boot.num = 99, boot.method = "cont-wt", boot.seed = 77777 )
names(est.wtCadj) names(est.wtCadj$plots)
est.wtCadj$estimates
est.wtCadj$plots$w.wt.distribution
est.wtCadj$plots$key.balance
These weight distribution and key balance plots are exactly the same as what we have for the pure weighting estimator.
\
The paper mentions two versions. We implement the version where the two effects are estimated separately.
This pair includes two estimators wp-Cadj and wp.MR-Cadj.
The nonrobust wp-Cadj is consistent if all modeling components except the working regression model are correctly specified, including $\omega_0(C)$, $\omega_1(C)$, and $\mathrm{E}[Y\mid C,M,A=1]$.
The (slightly) more robust wp.MR-Cadj is consistent if $\omega_0(C)$ and $\omega_1(C)$ are correct, and either $\omega_\text{x}(C,M)$ or $\mathrm{E}[Y\mid C,M,A=1]$ is correct.
# the nonrobust wpCadj estimator est.wpCadj <- estimate_wpCadj( data = synth, cross.world = "10", a.c.form = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)", plot = TRUE, c.std = "sfc0", y.cm1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)", y.cm0.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)", y.link = "logit", boot.num = 99, boot.method = "cont-wt", boot.seed = 77777 )
# the more robust wpMRCadj estimator est.wpMRCadj <- estimate_wpMRCadj( data = synth, cross.world = "10", a.c.form = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)", a.cm.form = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)", plot = TRUE, c.std = "sfc0", m.std = "sfc", y.cm1.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)", y.cm0.form = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)", y.link = "logit", boot.num = 99, boot.method = "cont-wt", boot.seed = 77777 )
names(est.wpCadj) names(est.wpCadj$plots)
est.wpCadj$estimates
est.wpCadj$plots$w.wt.distribution
est.wpCadj$plots$key.balance
names(est.wpMRCadj) names(est.wpMRCadj$plots)
est.wpMRCadj$estimates
gridExtra::grid.arrange( est.wpMRCadj$plots$w.wt.distribution + ggtitle("w.wt distribution"), est.wpMRCadj$plots$key.balance + ggtitle("key balance"), est.wpMRCadj$plots$full.balance + ggtitle("full balance"), layout_matrix = rbind(c(1, 1, 1, NA, NA), c(1, 1, 1, NA, NA), c(2, 2, 2, 2, 2), c(2, 2, 2, 2, 2), c(3, 3, 3, 3, 3), c(3, 3, 3, 3, 3), c(3, 3, 3, 3, 3)))
\
Here we want to show all the estimates from all the estimators. Let's collect all the results from the different estimators into a list.
est.names <- c("wtd", "Y2pred", "Y2predR", "psYpred", "psYpredMR", "Ypred", "YpredMR", "MsimYpred", "MsimYpredMR", "NDEpred", "NDEpredR", "wtCadj", "wpCadj", "wpMRCadj") estimates <- mget(paste0("est.", est.names)) names(estimates) <- est.names
# This is a boot.num workaround. # to obtain useful confidence intervals, # change boot.num to 999 and rerun all estimation chunks plus the above chunk, # then run this to save the results. # attn: you don't need to do those, because this has previously been done, # so the file here::here("vignettes", "results", "estimates.rds") has already been saved. saveRDS(estimates, here::here("vignettes", "results", "estimates.rds"))
# grab the results with the confidence intervals based on boot.num = 999 estimates <- readRDS(here::here("vignettes", "results", "estimates.rds"))
We process the list into a data frame with relevant information.
estimates <- lapply(est.names, function(z) { tmp <- estimates[[z]]$estimates tmp <- tmp[c("NDE0", "NIE1"), c("estimate", "2.5%", "97.5%")] colnames(tmp) <- c("estimate", "lb", "ub") est <- as.data.frame(tmp, row.names = FALSE) est$effect <- rownames(tmp) est$estimator <- z est }) estimates <- do.call(rbind, estimates) estimates$estimator <- factor(estimates$estimator, levels = est.names) estimates$properties <- ifelse(estimates$estimator %in% c("wtd", "Y2pred", "psYpred", "Ypred", "MsimYpred", "NDEpred", "wtCadj", "wpCadj"), "non-robust", ifelse(estimates$estimator %in% c("psYpredMR", "YpredMR", "MsimYpredMR", "wpMRCadj"), "more robust", "robust")) estimates$properties = factor(estimates$properties, levels = c("non-robust", "more robust", "robust")) head(estimates)
Now we are ready to plot.
ggplot(data = estimates, aes(x = estimator, # flip the sign of y, as we will present effects as reductions y = -estimate, ymin = -ub, ymax = -lb, color = properties, shape = properties)) + geom_hline(yintercept = 0, color = "gray") + geom_linerange() + geom_point(aes(size = properties)) + scale_color_manual(values = c("gray50", "blue", "purple")) + scale_shape_manual(values = c(17, 18, 19)) + scale_size_manual(values = c(3, 4.5, 3.7)) + scale_x_discrete(limits = rev, position = "top") + coord_flip() + labs(x = "", y = "estimate and 95% confidence interval") + facet_wrap(~ effect) + theme_bw() + theme(legend.position = "left")
This plot shows, for all the estimators, the $\text{NDE}_0$ and $\text{NIE}_1$ point estimates and with 95\% confidence intervals. These effects are in terms of reductions in weekly drinking prevalence at 22 months.
But wait! Should we trust these confidence intervals with boot.num = 99
?
Well, actually, the results here are based on boot.num = 999
. (This is the plot in Figure 9 in the manuscript.) We had swapped out the toy results from previous sections for the serious results here using some hidden code right before making the plot. The code is shown in the .Rmd file used to compile this vignette.
pdf(here::here("vignettes", "results", "fig9.pdf"), width = 7, height = 4.5) ggplot(data = estimates, aes(x = estimator, # flip the sign of y, as we will present effects as reductions y = -estimate, ymin = -ub, ymax = -lb, color = properties, shape = properties)) + geom_hline(yintercept = 0, color = "gray") + geom_linerange() + geom_point(aes(size = properties)) + scale_color_manual(values = c("gray50", "blue", "purple")) + scale_shape_manual(values = c(17, 18, 19)) + scale_size_manual(values = c(3, 4.5, 3.7)) + scale_x_discrete(limits = rev, position = "top") + coord_flip() + labs(x = "", y = "estimate and 95% confidence interval") + facet_wrap(~ effect) + theme_bw() + theme(legend.position = "left") dev.off()
\
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.