options(width = 100)
options(rmarkdown.html_vignette.check_title = FALSE)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(mediationClarity)
library(ggplot2)

First things first

The package mediationClarity implements the methods covered in the manuscript @nguyen2022MediationEstimation titled Causal mediation analysis: from simple to more robust estimation strategies for marginal natural (in)direct effects. (arxiv:2102.06048).

The key documentation of the package is the paper itself, which contains all the technical details of the methods.

This vignette, which presents the functions that implement the methods, is the computing appendix to the paper.

Vignette structure

We will briefly describe the illustrative data, which will help make things concrete, in Section 1.3.

Before getting to the estimators (what you are most likely here for), we will cover all the relevant weighting business in Section 2, as weighting is involved in multiple estimators. Sections 3-5 then track the three groups of estimators presented in the paper. For each weighting method or estimator, we present the function that implements the method and explain its outputs.

In sections 3-5, to speed up the compling of this document we put the number of bootstrap samples in each estimator function call at 99, which is clearly too small. The default in the package is 999.

Section 6 plots the estimates, and reproduces Figure 8 of the manuscript.

Acknowledgements

In @nguyen2022MediationEstimation we acknowledge the sources on which we built the technical content of this work. Here we acknowledge the most important resources that helped the construction of this R package: the R packages devtools [@devtools], usethis [@usethis], roxygen2 [@roxygen2], the super helpful R Packages book [@wickhamPackages], R-bloggers posts such as @pang2013NewTrickMe, many helpful posts on Stack Overflow, and our Stuart lab colleagues who kindly share their programming tips Nick Seewald, Leon Di Stefano, Yongqi Zhong and Elena Badillo Goicoechea.

The data example

We use a synthetic dataset synth that was created based on the PAS trial [@koning2011WhyTargetEarly]. Detailed description is provided in the paper.

Let's have a quick look at the data. The variables include:

head(synth)
table1::table1(~ age + sex + edu + religion + drink0 + att0 + rul0 + sfc0 | factor(treat), 
               data = synth, overall = "all")

\ \

All the weighting

Mediation weighting

Mediation weighting is weighting subsamples to create pseudo treated (herein p11) and pseudo control (p00) samples that mimic the C distribution of the full sample, and to create pseudo cross-world sample(s) (p10 and/or p01) that mimics the full sample C distribution and the M given C distribution of the controls (if p10) or of the treated (if p01).

Following the paper, we show all computation just for the (NDE0, NIE1) decomposition, which requires p10 but not p01.

Mediation weighting is used for the pure weighting estimator (wtd) and other estimators including Y2predR, psYpredMR, YpredMR, MsimYpredMR, wt-Cadj and wpMR-Cadj.

The paper mentions three formulas for the cross-world weights. We use the second formula, based on fitting models for $\mathrm{P}(A|C)$ and $\mathrm{P}(A|C,M)$.

Function

w.med <- weights_med(
    data        = synth,
    s.wt.var    = NULL,   # name sampling weight variable here (if needed)

    cross.world = "10",   # default

    # P(A|C) and P(A|C,M) model formulas
    a.c.form    = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)",
    a.cm.form   = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)",

    plot        = TRUE,   # default
    c.std  = "sfc0", # C and M vars for which want 
    m.std  = "sfc",  # standardized mean diffs in balance plot
)

The input arguments are available for use as needed and ignored if not, and are reused in the different functions in the package as appropriate.

s.wt.var is available for all functions, but we mention it only once here, as synth does not have sampling weights.

cross.world defaults to "10" if not specified. The other options are "01" (for the other decomposition), or "both".

We ask for standardized mean differences for sfc0 and sfc, which are continuous variables.

There are two other plot parameters c.order and m.order specifying the order by which C and M variables are to be plotted (if different from the order they appear in a.c.form and a.cm.form), which we don't need yet.

Outputs

The outputs include weighted data and plots.

names(w.med)

The weighted dataset w.dat has several new variables, whose names all start with a dot to differentiate them from the original data. Three of these variables are for weights: .s.wt (sampling weights), .w.wt (distribution-morphing weights) and .f.wt (final weights, the product of sampling and distribution-morphing weights). In this example, we do not have sampling weights, so .s.wt = 1 and .f.wt = .w.wt.

names(w.med$w.dat)
table(w.med$w.dat$.samp)
summary(w.med$w.dat[, c(".s.wt", ".w.wt", ".f.wt")])

There are two types of plots, weight-distribution plots and balance plots.

names(w.med$plots)
w.med$plots$w.wt.distribution

Note that the figure above is of weights not in raw but in stabilized form, ie the weights have been normed to have mean 1 within each pseudo sample. Stabilizing puts the weights on the same scale (a meaningful one!) so they are comparable across pseudo samples.

If the original data had sampling weights, there would be two of these plots, one for .w.wt, one for .f.wt.

w.med$plots$balance

This is a plot of balance on means of variables. It shows C-balance between p11, p00, p10 and the full sample, and CM-balance between p10 and p00.

For sfc0 and sfc, standardized mean differences are shown; this is marked by the star on these variables in the y-axis.

For these continuous variables, we would want to also look at distributional balance. We do not implement distributional balance in the current package. Distributional balance could be implemented using packages that focus on balance visualization, eg the cobalt package [@cobalt].

Other relevant weighting

The two other relevant weighting schemes that are used for some of the estimators are: C-based inverse probability weighting (for the psYpred estimator and the wp-Cadj estimators) and CM-based odds weighting (for the Ypred estimator).

Functions

# C-based inverse probability weighting
w.c.ipw <- weights_ipw(
    data     = synth,

    a.form   = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)",

    plot     = TRUE,
    vars.std = "sfc0"
)
# CM-based odds weighting
w.cm.odds <- weights_odds(
    data        = synth,

    cross.world = "10",

    a.form      = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)",

    plot        = TRUE,  # default
    vars.std    = c("sfc0", "sfc"),
    vars.order  = c("sex", "age", "edu", "religion", "drink0", "rul0", "att0", "sfc0", "rul", "att", "sfc")
)

We keep the syntax of weights_ipw() and weights_odds() general (without reference to C or M) so that they can be used as general purpose functions. In mediation analysis, weights_ipw() is typically used based on C variables, and weights_odds() based on (C,M) variables, so a.form is implicitly a.c.form in weights_ipw() and a.cm.form in weights_odds(), and vars. is implicitly c. in weights_ipw() and cm. in weights_odds().

In the second code chunk here we use the plotting order argument, because our model formula mixes up the order of the covariates and mediators.

Outputs

The output structure is the same as that used for the weights_med() function. We do not show the plot details here, but will come back to them when getting to the estimators that use these weighting schemes.

names(w.c.ipw)
names(w.c.ipw$plots)
names(w.cm.odds)
names(w.cm.odds$plots)

\

Estimators -- based on potential outcome means

Here we present the estimators in a different order from that in the paper. After the pure weighting estimator, we will cover first the Y2pred pair where the more robust estimator does not rely on any specific model being correct (it is multiply robust) before the psYpred, Ypred and MsimYpred pairs.

Pure weighting

Function

est.wtd <- estimate_wtd(
    data        = synth,
    cross.world = "10",

    a.c.form    = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)",
    a.cm.form   = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)",

    y.var       = "drink", # outcome variable

    plot        = TRUE,   # default
    c.std       = "sfc0",
    m.std       = "sfc",

    boot.num    = 99,        # default value is 999
    boot.method = "cont-wt", # default
    boot.seed   = 77777
)

The arguments are similar to the arguments of weights_med(), with two additions: - the outcome variable name (via y.var) - bootstrap inputs: Options for boot.method are "resample" (resampling bootstrap) and "cont-wt" (continuous weights bootstrap) [@xu2020ContinuousWeightsBootstrap]. If boot.seed is not specified, the function randomly draws a seed.

Outputs

names(est.wtd)
names(est.wtd$plots)

The outputs below are quite self-explanatory.

round(est.wtd$estimates, 4)
est.wtd$boot.seed
est.wtd$plots$key.balance

This balance plot is the same plot as that from weights_med(), with two subtle differences:

pdf(here::here("vignettes", "results", "fig7.pdf"), width = 4, height = 3.5)
est.wtd$plots$w.wt.distribution
dev.off()

pdf(here::here("vignettes", "results", "fig8.pdf"), width = 7.5, height = 6)
est.wtd$plots$key.balance
dev.off()

\

Y2pred pair

This pair includes two estimators Y2pred and Y2pred.R.

The nonrobust Y2pred is consistent if all modeling components are correctly specified, including the models for $\mathrm{E}[Y\mid C,A=1]$, $\mathrm{E}[Y\mid C,A=0]$, $\mathrm{E}[Y\mid C,M,A=1]$, and $\mathrm{E}[Y_{1M_0}\mid C]$.

The robust Y2pred.R is consistent under the following conditions

Functions

# the nonrobust Y2pred estimator
est.Y2pred <- estimate_Y2pred(
    data        = synth,
    cross.world = "10",

    a.var       = "treat",

    # formulas for E[Y|C,A=0], E[Y|C,A=1], E[Y|C,M,A=1], E[Y(1,M0)|C]
    y.c0.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.c1.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.cm1.form  = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)",
    y10.c.form  = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.link      = "logit",

    boot.num    = 99,         # default value is 999
    boot.method = "cont-wt",  # default
    boot.seed   = 77777
)
# the robust Y2predR estimator
est.Y2predR <- estimate_Y2predR(
    data        = synth,
    cross.world = "10",

    # formulas for P(A|C), P(A|C,M)
    a.c.form    = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)",
    a.cm.form   = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)",

    plot        = TRUE,
    c.std       = "sfc0",
    m.std       = "sfc",

    # formulas for E[Y|C,A=0], E[Y|C,A=1], E[Y|C,M,A=1], E[Y(1,M0)|C]
    y.c0.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.c1.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.cm1.form  = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)",
    y10.c.form  = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.link      = "logit",

    boot.num    = 99,         # default value is 999
    boot.method = "cont-wt",
    boot.seed   = 77777
)

y.c0.form and y.c1.form are model formulas for $\mathrm{E}[Y|C,A=0]$ and $\mathrm{E}[Y|C,A=1]$.

y.cm1.form and y10.c.form are model formulas for $\mathrm{E}[Y|C,M,A=1]$ and $\mathrm{E}[Y_{1M_0}|C]$, which are required for cross.world = "10".

For cross.world = "01", use y.cm0.form and y01.c.form, for $\mathrm{E}[Y|C,M,A=0]$ and $\mathrm{E}[Y_{0M_1}|C]$.

There are times when we might want to use the same model form for several models (like what we do in the code above). Then can use shortcut arguments y.c.form and y.cm.form instead:

Outputs

The nonrobust Y2pred

This estimator simply outputs the estimates and the bootstrap seed.

names(est.Y2pred)
round(est.Y2pred$estimates, 4)

The robust Y2predR

names(est.Y2predR)
names(est.Y2predR$plots)

This estimator outputs estimates, bootstrap seed, and also the weight distribution and balance plot from the mediation weighting. These plots are exactly the same (and have the same name) as provided by the weights_med() function, so they are not shown here. The estimates are:

round(est.Y2predR$estimates, 4)

\

psYpred pair

The paper mentions two versions, here we implement the second version.

This pair includes two esitmators psYpred and psYpred.MR.

The nonrobust psYpred is consistent if all modeling components are correctly specified, including $\omega_0(C)$, $\mathrm{E}[Y\mid C,M,A=1]$, and $\mathrm{E}[Y\mid C,A=1]$.

The more robust psYpred.MR is consistent under the following conditions:

Functions

# the nonrobust psYpred estimator
est.psYpred <- estimate_psYpred(
    data        = synth,
    cross.world = "10",  # default

    a.c.form    = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)",

    plot        = TRUE,  # default
    c.std  = "sfc0",

    y.c1.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.cm1.form  = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)",
    y.link      = "logit",

    boot.num    = 99,         # default value is 999
    boot.method = "cont-wt",  # default
    boot.seed   = 77777)
# the more robust psYpredMR estimator
est.psYpredMR <- estimate_psYpredMR(
    data        = synth,
    cross.world = "10",  # default

    a.c.form    = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)",
    a.cm.form   = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)",

    plot        = TRUE,  # default
    c.std  = "sfc0",
    m.std  = "sfc",

    y.c1.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.cm1.form  = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)",
    y.link = "logit",

    boot.num    = 99,         # default value is 999
    boot.method = "cont-wt",  # default
    boot.seed   = 77777)

Outputs

The nonrobust psYpred

names(est.psYpred)
names(est.psYpred$plots)
round(est.psYpred$estimates, 4)
gridExtra::grid.arrange(
    est.psYpred$plots$w.wt.distribution + ggtitle("w.wt distribution"),
    est.psYpred$plots$key.balance       + ggtitle("key balance"),
    layout_matrix = rbind(c(1, 2, 2)))

\

The more robust psYpredMR

names(est.psYpredMR)
names(est.psYpredMR$plots)
round(est.psYpredMR$estimates, 4)
gridExtra::grid.arrange(
    est.psYpredMR$plots$w.wt.distribution + ggtitle("w.wt distribution"),
    est.psYpredMR$plots$key.balance       + ggtitle("key balance"),
    est.psYpredMR$plots$full.balance      + ggtitle("full balance"),
    layout_matrix = rbind(c(1, 1, 2, 2, 2),
                          c(1, 1, 2, 2, 2),
                          c(3, 3, 3, 3, 3),
                          c(3, 3, 3, 3, 3),
                          c(3, 3, 3, 3, 3)))

\

Ypred pair

This pair includes two estimators Ypred and Ypred.MR.

The nonrobust Ypred is consistent if all modeling components are correctly specified, including $\mathrm{E}[Y\mid C,A=1]$, $\mathrm{E}[Y\mid C,A=0]$, $o_\text{x}(C,M)$, and $\mathrm{E}[Y_{1M_0}\mid C]$.

The more robust Ypred.MR is consistent under the following conditions:

Functions

# the nonrobust Ypred estimator
est.Ypred <- estimate_Ypred(
    data        = synth,
    cross.world = "10",  # default

    a.cm.form   = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)",

    plot        = TRUE,  # default
    cm.std = c("sfc0", "sfc"),

    y.c0.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.c1.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y10.c.form  = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.link      = "logit",

    boot.num    = 99,         # default value is 999
    boot.method = "cont-wt",  # default
    boot.seed   = 77777
)
# the more robust YpredMR estimator
est.YpredMR <- estimate_YpredMR(
    data        = synth,
    cross.world = "10",  # default

    a.c.form    = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)",
    a.cm.form   = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)",

    plot        = TRUE,  # default
    c.std  = "sfc0",
    m.std  = "sfc",

    y.c0.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.c1.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y10.c.form  = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.link      = "logit",

    boot.num    = 99,         # default value is 999
    boot.method = "cont-wt",  # default
    boot.seed   = 77777
)

Here if the same model form is used for the different outcome given covariates models, the shortcut argument y.c.form can be used instead of y.c0.form, y.c1.form, y10.c.form.

Outputs

The nonrobust Ypred

names(est.Ypred)
names(est.Ypred$plots)
round(est.Ypred$estimates, 4)
gridExtra::grid.arrange(
    est.Ypred$plots$w.wt.distribution + ggtitle("w.wt distribution"),
    est.Ypred$plots$key.balance       + ggtitle("key balance"),
    layout_matrix = rbind(c(1, 2, 2)))

\

The more robust YpredMR

names(est.YpredMR)
names(est.YpredMR$plots)
round(est.YpredMR$estimates, 4)
gridExtra::grid.arrange(
    est.YpredMR$plots$w.wt.distribution + ggtitle("w.wt distribution"),
    est.YpredMR$plots$key.balance       + ggtitle("key balance"),
    est.YpredMR$plots$full.balance      + ggtitle("full balance"),
    layout_matrix = rbind(c(1, 1, 2, 2, 2),
                          c(1, 1, 2, 2, 2),
                          c(3, 3, 3, 3, 3),
                          c(3, 3, 3, 3, 3),
                          c(3, 3, 3, 3, 3)))

\

\

MsimYpred pair

The paper mentioned two versions. Here we implement the first version.

This pair includes the two estimators MsimYpred and MsimYpred.MR.

The nonrobust MsimYpred is consistent if all modeling components are consistent, including $\mathrm{P}(M\mid C,A=0)$, $\mathrm{E}[Y\mid C,M,A=1]$, $\mathrm{E}[Y\mid C,A=1]$, and $\mathrm{E}[Y\mid C,A=0]$.

The more robust MsimYpred.MR is consistent under the following conditions:

Functions

# the nonrobust MsimYpred estimator
est.MsimYpred <- estimate_MsimYpred(
    data        = synth,
    cross.world = "10",

    a.var = 'treat',

    y.c0.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.c1.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.link      = "logit",

    m.c0.form   = list("att ~ age + sex + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0,3)",
                       "rul ~ age + sex + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0,3) + att",
                       "sfc ~ age + sex + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0,3) + att + rul"),
    m.dist      = list("binary", "binary", "normal"),

    y.cm1.form  = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)",

    boot.num    = 99,         # default value is 999
    boot.method = "cont-wt",  # default
    boot.seed   = 77777,

    point.reps  = 500
)
# the more robust MsimYpredMR estimator
est.MsimYpredMR <- estimate_MsimYpredMR(
    data        = synth,
    cross.world = "10",

    a.c.form    = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)",
    a.cm.form   = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)",

    y.c0.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.c1.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.link      = "logit",

    m.c0.form   = list("att ~ age + sex + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0,3)",
                       "rul ~ age + sex + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0,3) + att",
                       "sfc ~ age + sex + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0,3) + att + rul"),
    m.dist      = list("binary", "binary", "normal"),

    y.cm1.form  = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)",

    boot.num    = 99,         # default value is 999
    boot.method = "cont-wt",  # default
    boot.seed   = 77777,

    point.reps  = 500
)

Outputs

The nonrobust MsimYpred

est.MsimYpred

The more robust MsimYpredMR

names(est.MsimYpredMR)
names(est.MsimYpredMR$plots)
est.MsimYpredMR$estimates

\

Estimators -- for additive effects only

NDEpred pair

This pair includes two estimators NDEpred and NDEpred.R.

The nonrobust NDEpred is consistent if all modeling components are correctly specified, including the models for $\mathrm{E}[Y\mid C,A=1]$, $\mathrm{E}[Y\mid C,A=0]$, $\mathrm{E}[Y\mid C,M,A=1]$, and $\mathrm{E}[{NDE}_0\mid C]$.

The robust NDEpred.R is consistent under the following conditions

Functions

# the nonrobust NDEpred estimator
est.NDEpred <- estimate_NDEpred(
    data        = synth,
    cross.world = "10",  # default

    a.var       = "treat",

    y.c1.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.c0.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",

    y.cm1.form  = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)",
    nde0.c.form = "effect ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",

    y.link      = "logit",

    boot.num    = 99,         # default value is 999
    boot.method = "cont-wt",  # default
    boot.seed   = 77777
)
# the robust NDEpredR estimator
est.NDEpredR <- estimate_NDEpredR(
    data        = synth,
    cross.world = "10",  # default

    a.c.form    = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)",
    a.cm.form   = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)",

    plot        = TRUE,    # default
    c.std  = "sfc0",
    m.std  = "sfc",

    y.c1.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",
    y.c0.form   = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",

    y.cm1.form  = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)",
    nde0.c.form = "effect ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3)",

    y.link      = "logit",

    boot.num    = 99,         # default value is 999
    boot.method = "cont-wt",  # default
    boot.seed   = 77777
)

Outputs

The nonrobust NDEpred

names(est.NDEpred)
round(est.NDEpred$estimates, 4)

The robust NDEpredR

names(est.NDEpredR)
names(est.NDEpredR$plots)
round(est.NDEpredR$estimates, 4)

\

Estimators -- using covariates to improve precision (not robustness)

wt-Cadj

For this estimation strategy, the paper mentions two versions. We implement the version where the two effects are estimated using the same model.

Function

est.wtCadj <- estimate_wtCadj(
    data        = synth,
    cross.world = "10",

    a.c.form    = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)",
    a.cm.form   = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)",

    plot        = TRUE,  # default
    c.std  = "sfc0",
    m.std  = "sfc",

    y.var       = "drink",
    y.link      = "logit",

    boot.num    = 99,
    boot.method = "cont-wt",
    boot.seed   = 77777
) 

Outputs

names(est.wtCadj)
names(est.wtCadj$plots)
est.wtCadj$estimates
est.wtCadj$plots$w.wt.distribution
est.wtCadj$plots$key.balance

These weight distribution and key balance plots are exactly the same as what we have for the pure weighting estimator.

\

wp-Cadj pair

The paper mentions two versions. We implement the version where the two effects are estimated separately.

This pair includes two estimators wp-Cadj and wp.MR-Cadj.

The nonrobust wp-Cadj is consistent if all modeling components except the working regression model are correctly specified, including $\omega_0(C)$, $\omega_1(C)$, and $\mathrm{E}[Y\mid C,M,A=1]$.

The (slightly) more robust wp.MR-Cadj is consistent if $\omega_0(C)$ and $\omega_1(C)$ are correct, and either $\omega_\text{x}(C,M)$ or $\mathrm{E}[Y\mid C,M,A=1]$ is correct.

Functions

# the nonrobust wpCadj estimator
est.wpCadj <- estimate_wpCadj(
    data        = synth,
    cross.world = "10",

    a.c.form    = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)",

    plot        = TRUE,
    c.std       = "sfc0",

    y.cm1.form  = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)",
    y.cm0.form  = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)",
    y.link      = "logit",

    boot.num    = 99,
    boot.method = "cont-wt",
    boot.seed   = 77777
)
# the more robust wpMRCadj estimator
est.wpMRCadj <- estimate_wpMRCadj(
    data        = synth,
    cross.world = "10",

    a.c.form    = "treat ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 4)",
    a.cm.form   = "treat ~ sex + age + edu + religion + drink0 + att0 * att + rul0 * rul + splines::ns(sfc0, 4) * splines::ns(sfc, 4)",

    plot        = TRUE,
    c.std       = "sfc0",
    m.std       = "sfc",

    y.cm1.form  = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)",
    y.cm0.form  = "drink ~ sex + age + edu + religion + drink0 + att0 + rul0 + splines::ns(sfc0, 3) + att + rul + splines::ns(sfc, 3)",
    y.link      = "logit",

    boot.num    = 99,
    boot.method = "cont-wt",
    boot.seed   = 77777
)

Outputs

The nonrobust wpCadj

names(est.wpCadj)
names(est.wpCadj$plots)
est.wpCadj$estimates
est.wpCadj$plots$w.wt.distribution
est.wpCadj$plots$key.balance

The more robust wpMRCadj

names(est.wpMRCadj)
names(est.wpMRCadj$plots)
est.wpMRCadj$estimates
gridExtra::grid.arrange(
    est.wpMRCadj$plots$w.wt.distribution + ggtitle("w.wt distribution"),
    est.wpMRCadj$plots$key.balance       + ggtitle("key balance"),
    est.wpMRCadj$plots$full.balance      + ggtitle("full balance"),
    layout_matrix = rbind(c(1, 1, 1, NA, NA),
                          c(1, 1, 1, NA, NA),
                          c(2, 2, 2, 2, 2),
                          c(2, 2, 2, 2, 2),
                          c(3, 3, 3, 3, 3),
                          c(3, 3, 3, 3, 3),
                          c(3, 3, 3, 3, 3)))

\

All estimates

Here we want to show all the estimates from all the estimators. Let's collect all the results from the different estimators into a list.

est.names <- c("wtd", 
               "Y2pred", "Y2predR",
               "psYpred", "psYpredMR",
               "Ypred", "YpredMR",
               "MsimYpred", "MsimYpredMR",
               "NDEpred", "NDEpredR",
               "wtCadj",
               "wpCadj", "wpMRCadj")
estimates <- mget(paste0("est.", est.names))
names(estimates) <- est.names
# This is a boot.num workaround.
# to obtain useful confidence intervals, 
# change boot.num to 999 and rerun all estimation chunks plus the above chunk,
# then run this to save the results.
# attn: you don't need to do those, because this has previously been done,
# so the file here::here("vignettes", "results", "estimates.rds") has already been saved.

saveRDS(estimates, here::here("vignettes", "results", "estimates.rds"))
# grab the results with the confidence intervals based on boot.num = 999
estimates <- readRDS(here::here("vignettes", "results", "estimates.rds"))

We process the list into a data frame with relevant information.

estimates <- lapply(est.names, function(z) {
    tmp <- estimates[[z]]$estimates
    tmp <- tmp[c("NDE0", "NIE1"), c("estimate", "2.5%", "97.5%")]
    colnames(tmp) <- c("estimate", "lb", "ub")
    est <- as.data.frame(tmp, row.names = FALSE)
    est$effect <- rownames(tmp)
    est$estimator <- z
    est
})

estimates <- do.call(rbind, estimates)

estimates$estimator <- factor(estimates$estimator, levels = est.names)

estimates$properties <- 
    ifelse(estimates$estimator %in% c("wtd", 
                                      "Y2pred", "psYpred", "Ypred", "MsimYpred", 
                                      "NDEpred", 
                                      "wtCadj", "wpCadj"),
           "non-robust",
           ifelse(estimates$estimator %in% c("psYpredMR", "YpredMR", "MsimYpredMR", 
                                             "wpMRCadj"),
                  "more robust",
                  "robust"))
estimates$properties = factor(estimates$properties, 
                              levels = c("non-robust", "more robust", "robust"))

head(estimates)

Now we are ready to plot.

ggplot(data = estimates, 
       aes(x = estimator, 
           # flip the sign of y, as we will present effects as reductions
           y = -estimate, ymin = -ub, ymax = -lb, 
           color = properties, shape = properties)) +
    geom_hline(yintercept = 0, color = "gray") +
    geom_linerange() +
    geom_point(aes(size = properties)) +
    scale_color_manual(values = c("gray50", "blue", "purple")) +
    scale_shape_manual(values = c(17, 18, 19)) +
    scale_size_manual(values = c(3, 4.5, 3.7)) +
    scale_x_discrete(limits = rev, position = "top") +
    coord_flip() +
    labs(x = "", y = "estimate and 95% confidence interval") +
    facet_wrap(~ effect) +
    theme_bw() +
    theme(legend.position = "left")

This plot shows, for all the estimators, the $\text{NDE}_0$ and $\text{NIE}_1$ point estimates and with 95\% confidence intervals. These effects are in terms of reductions in weekly drinking prevalence at 22 months.

But wait! Should we trust these confidence intervals with boot.num = 99?

Well, actually, the results here are based on boot.num = 999. (This is the plot in Figure 9 in the manuscript.) We had swapped out the toy results from previous sections for the serious results here using some hidden code right before making the plot. The code is shown in the .Rmd file used to compile this vignette.

pdf(here::here("vignettes", "results", "fig9.pdf"), width = 7, height = 4.5)
ggplot(data = estimates, 
       aes(x = estimator, 
           # flip the sign of y, as we will present effects as reductions
           y = -estimate, ymin = -ub, ymax = -lb, 
           color = properties, shape = properties)) +
    geom_hline(yintercept = 0, color = "gray") +
    geom_linerange() +
    geom_point(aes(size = properties)) +
    scale_color_manual(values = c("gray50", "blue", "purple")) +
    scale_shape_manual(values = c(17, 18, 19)) +
    scale_size_manual(values = c(3, 4.5, 3.7)) +
    scale_x_discrete(limits = rev, position = "top") +
    coord_flip() +
    labs(x = "", y = "estimate and 95% confidence interval") +
    facet_wrap(~ effect) +
    theme_bw() +
    theme(legend.position = "left")
dev.off()

\

References {-}



trangnguyen74/mediationClarity documentation built on May 4, 2023, 6:22 a.m.