knitr::opts_chunk$set(message = FALSE, fig.width=5) if (any(!sapply(c("WeightIt", "twang"), requireNamespace, quietly = TRUE))) knitr::opts_chunk$set(eval = FALSE)
This is an introduction to the use of
cobalt with longitudinal treatments. These occur when there are multiple treatment periods spaced over time, with the potential for time-dependent confounding to occur. A common way to estimate treatment effects in these scenarios is to use marginal structural models (MSM), weighted by balancing weights. The goal of applying weights is to simulate a sequential randomization design, where the probability of being assigned to treatment at each time point is independent of each unit's prior covariate and treatment history. For introduction to MSMs in general, see @thoemmesPrimerInverseProbability2016, @vanderweeleCausalInferenceLongitudinal2016, @coleConstructingInverseProbability2008, or @robinsMarginalStructuralModels2000. The key issue addressed by this guide and
cobalt in general is assessing balance before each treatment period to ensure the removal of confounding.
In preprocessing for MSMs, three types of variables are relevant: baseline covariates, treatments, and intermediate outcomes/time-varying covariates. The goal of balance assessment is to assess whether after preprocessing, the resulting sample is one in which each treatment is independent of baseline covariates, treatment history, and time-varying covariates. The tools in
cobalt have been developed to satisfy these goals.
The next section describe how to use
cobalt's tools to assess balance with longitudinal treatments. First, we'll examine an example data set and identify some tools that can be used to generate weights for MSMs. Next we'll use
love.plot() to assess and present balance.
We're going to use the
iptwExWide data set in the
library("cobalt") data("iptwExWide", package = "twang") head(iptwExWide)
We have the variables
outcome, which is the outcome,
use0, which are the baseline covariates,
use2, which are time-varying covariates measured after treatment periods 1 and 2, and
tx3, which are the treatments at each of the three treatment periods.
The goal of balance assessment in this scenario is to ensure the following:
tx1 is independent from
tx2 is independent from
tx3 is independent from
To estimate the weights, we'll use
WeightIt to fit a series of logistic regressions that generate the weights. See the
WeightIt documentation for more information on how to use
WeightIt with longitudinal treatments.
Wmsm <- WeightIt::weightitMSM( list(tx1 ~ use0 + gender + age, tx2 ~ use0 + gender + age + use1 + tx1, tx3 ~ use0 + gender + age + use1 + tx1 + use2 + tx2), data = iptwExWide, method = "ps")
Next we'll use
bal.tab() to examine balance before and after applying the weights.
To examine balance on the original data, we can specify the treatment-covariate relationship we want to assess by using either the formula or data frame interfaces to
bal.tab(). The formula interface requires a list of formulas, one for each treatment, and a data set containing the relevant variables. The data set must be in the "wide" setup, where each time point receives its own columns and each unit has exactly one row of data. The formula interface is similar to the
WeightIt input seen above. The data frame interface requires a list of treatment values for each time point and a data frame or list of covariates for each time point. We'll use the data frame interface here.
bal.tab(list(iptwExWide[c("use0", "gender", "age")], iptwExWide[c("use0", "gender", "age", "use1", "tx1")], iptwExWide[c("use0", "gender", "age", "use1", "tx1", "use2", "tx2")]), treat.list = iptwExWide[c("tx1", "tx2", "tx3")])
Here we see a summary of balance across all time points. This displays each variable, how many times it appears in balance tables, its type, and the greatest imbalance for that variable across all time points. Below this is a summary of sample sizes across time points. To request balance on individual time points, we can use the
which.time argument, which can be set to one or more numbers or
.none (the default). Below we'll request balance on all time points by setting
which.time = .all. Doing so hides the balance summary across time points, but this can be requested again by setting
msm.summary = TRUE.
bal.tab(list(iptwExWide[c("use0", "gender", "age")], iptwExWide[c("use0", "gender", "age", "use1", "tx1")], iptwExWide[c("use0", "gender", "age", "use1", "tx1", "use2", "tx2")]), treat.list = iptwExWide[c("tx1", "tx2", "tx3")], which.time = .all)
Here we see balance by time point. At each time point, a
bal.tab object is produced for that time point. These function just like regular
This output will appear no matter what the treatment types are (i.e., binary, continuous, multi-category), but for multi-category treatments or when the treatment types vary or for multiply imputed data, no balance summary will be computed or displayed.
We can use
bal.tab() with the
weightitMSM object generated above. Setting
un = TRUE would produce balance statistics before adjustment, like we did before. We'll set
which.time = .all and
msm.summary = TRUE to see balance for each time point and across time points.
bal.tab(Wmsm, un = TRUE, which.time = .all, msm.summary = TRUE)
Note that to add covariates, we must use
addl.list (which can be abbreviated as
addl), which functions like
addl in point treatments. The input to
addl.list must be a list of covariates for each time point, or a single data data frame of variables to be assessed at all time points. The same goes for adding distance variables, which must be done with
distance.list (which can be abbreviated as
Next we'll use
bal.plot() to more finely examine covariate balance.
We can compare distributions of covariates across treatment groups for each time point using
bal.plot(), just as we could with point treatments.
bal.plot(Wmsm, var.name = "age", which = "both")
Balance for variables that only appear in certain time points will only be displayed at those time points:
bal.plot(Wmsm, var.name = "tx1", which = "both")
which.time can be specified to limit output to chosen time points.
Finally, we'll examine using
love.plot() with longitudinal treatments to display balance for presentation.
love.plot() works with longitudinal treatments just as it does with point treatments, except that the user can choose whether to display separate plots for each time point or one plot with the summary across time points. As with
bal.tab(), the user can set
which.time to display only certain time points. When set to
.none, the summary across time points is displayed. The
agg.fun argument is set to
"max" by default.
love.plot(Wmsm, abs = TRUE)
love.plot(Wmsm, which.time = .none)
Here we used
WeightIt to generate our MSM weights, but
cobalt is compatible with other packages for longitudinal treatments as well.
CBMSM objects from the
CBPS package and
iptw objects from the
twang package can be used in place of the
weightitMSM object in the above examples. In addition, users who have generated balancing weights outside any of these package can specify an argument to
bal.tab() with the formula or data frame methods to assess balance using those weights, or they can use the default method of
bal.tab() to supply an object containing any of the objects required for balance assessment (output from
optweight is particularly well suited for this).
CBPS estimates and assesses balance on MSM weights differently from
cobalt. Its focus is on ensuring balance across all treatment history permutations, whereas
cobalt focuses on evaluating the similarity to sequential randomization. For this reason, it may appear that
CBMSM objects have different balance qualities as measured by the two packages.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.