Nothing
#' Generate Balancing Weights for Longitudinal Treatments
#'
#' @description
#' `weightitMSM()` allows for the easy generation of balancing
#' weights for marginal structural models for time-varying treatments using a
#' variety of available methods for binary, continuous, and multi-category
#' treatments. Many of these methods exist in other packages, which [weightit()]
#' calls; these packages must be installed to use the desired method.
#'
#' @inheritParams weightit
#' @param formula.list a list of formulas corresponding to each time point with
#' the time-specific treatment variable on the left hand side and
#' pre-treatment covariates to be balanced on the right hand side. The
#' formulas must be in temporal order, and must contain all covariates to be
#' balanced at that time point (i.e., treatments and covariates featured in
#' early formulas should appear in later ones). Interactions and functions of
#' covariates are allowed.
#' @param data an optional data set in the form of a data frame that contains
#' the variables in the formulas in `formula.list`. This must be a wide data
#' set with exactly one row per unit.
#' @param method a string of length 1 containing the name of the method that
#' will be used to estimate weights. See [weightit()] for allowable options.
#' The default is `"glm"`, which estimates the weights using generalized
#' linear models.
#' @param stabilize `logical`; whether or not to stabilize the weights.
#' Stabilizing the weights involves fitting a model predicting treatment at
#' each time point from treatment status at prior time points. If `TRUE`, a
#' fully saturated model will be fit (i.e., all interactions between all
#' treatments up to each time point), essentially using the observed treatment
#' probabilities in the numerator (for binary and multi-category treatments).
#' This may yield an error if some combinations are not observed. Default is
#' `FALSE`. To manually specify stabilization model formulas, e.g., to specify
#' non-saturated models, use `num.formula`. With many time points, saturated
#' models may be time-consuming or impossible to fit.
#' @param num.formula optional; a one-sided formula with the stabilization
#' factors (other than the previous treatments) on the right hand side, which
#' adds, for each time point, the stabilization factors to a model saturated
#' with previous treatments. See Cole & Hernán (2008) for a discussion of how
#' to specify this model; including stabilization factors can change the
#' estimand without proper adjustment, and should be done with caution. Can
#' also be a list of one-sided formulas, one for each time point. Unless you
#' know what you are doing, we recommend setting `stabilize = TRUE` and
#' ignoring `num.formula`.
#' @param include.obj `logical`; whether to include in the output a list of the fit objects
#' created in the process of estimating the weights at each time point. For
#' example, with `method = "glm"`, a list of the `glm` objects containing the
#' propensity score models at each time point will be included. See the help
#' pages for each method for information on what object will be included if
#' `TRUE`.
#' @param is.MSM.method `logical`; whether the method estimates weights for multiple time
#' points all at once (`TRUE`) or by estimating weights at each time point and
#' then multiplying them together (`FALSE`). This is only relevant for
#' user-specified functions.
#' @param weightit.force `logical`; several methods are not valid for estimating weights
#' with longitudinal treatments, and will produce an error message if
#' attempted. Set to `TRUE` to bypass this error message.
#' @param ... other arguments for functions called by `weightit()` that control
#' aspects of fitting that are not covered by the above arguments. See Details
#' at [weightit()].
#'
#' @returns
#' A `weightitMSM` object with the following elements:
#' \item{weights}{The estimated weights, one for each unit.}
#' \item{treat.list}{A list of the values of the time-varying treatment variables.}
#' \item{covs.list}{A list of the covariates used in the fitting at each time point. Only includes the raw covariates, which may have been altered in the fitting process.}
#' \item{data}{The data.frame originally entered to `weightitMSM()`.}
#' \item{estimand}{"ATE", currently the only estimand for MSMs with binary or multi-category treatments.}
#' \item{method}{The weight estimation method specified.}
#' \item{ps.list}{A list of the estimated propensity scores (if any) at each time point.}
#' \item{s.weights}{The provided sampling weights.}
#' \item{by}{A data.frame containing the `by` variable when specified.}
#' \item{stabilization}{The stabilization factors, if any.}
#'
#' When `keep.mparts` is `TRUE` (the default) and the chosen method is
#' compatible with M-estimation, the components related to M-estimation for use
#' in [glm_weightit()] are stored in the `"Mparts.list"` attribute. When `by` is
#' specified, `keep.mparts` is set to `FALSE`.
#'
#' @details
#' Currently only "wide" data sets, where each row corresponds to a
#' unit's entire variable history, are supported. You can use [reshape()] or
#' other functions to transform your data into this format; see example below.
#'
#' In general, `weightitMSM()` works by separating the estimation of weights
#' into separate procedures for each time period based on the formulas provided.
#' For each formula, `weightitMSM()` simply calls `weightit()` to that formula,
#' collects the weights for each time period, and multiplies them together to
#' arrive at longitudinal balancing weights.
#'
#' Each formula should contain all the covariates to be balanced on. For
#' example, the formula corresponding to the second time period should contain
#' all the baseline covariates, the treatment variable at the first time period,
#' and the time-varying covariates that took on values after the first treatment
#' and before the second. Currently, only wide data sets are supported, where
#' each unit is represented by exactly one row that contains the covariate and
#' treatment history encoded in separate variables.
#'
#' The `"cbps"` method, which calls `CBPS()` in \pkg{CBPS}, will yield different
#' results from `CBMSM()` in \pkg{CBPS} because `CBMSM()` takes a different
#' approach to generating weights than simply estimating several time-specific
#' models.
#'
#' @seealso
#' [weightit()] for information on the allowable methods
#'
#' [summary.weightitMSM()] for summarizing the weights
#'
#' @references
#' Cole, S. R., & Hernán, M. A. (2008). Constructing Inverse Probability Weights for Marginal Structural Models. *American Journal of Epidemiology*, 168(6), 656–664. \doi{10.1093/aje/kwn164}
#'
#' @examples
#' data("msmdata")
#' (W1 <- weightitMSM(list(A_1 ~ X1_0 + X2_0,
#' A_2 ~ X1_1 + X2_1 +
#' A_1 + X1_0 + X2_0,
#' A_3 ~ X1_2 + X2_2 +
#' A_2 + X1_1 + X2_1 +
#' A_1 + X1_0 + X2_0),
#' data = msmdata,
#' method = "glm"))
#' summary(W1)
#' cobalt::bal.tab(W1)
#'
#' # Using stabilization factors
#' W2 <- weightitMSM(list(A_1 ~ X1_0 + X2_0,
#' A_2 ~ X1_1 + X2_1 +
#' A_1 + X1_0 + X2_0,
#' A_3 ~ X1_2 + X2_2 +
#' A_2 + X1_1 + X2_1 +
#' A_1 + X1_0 + X2_0),
#' data = msmdata,
#' method = "glm",
#' stabilize = TRUE,
#' num.formula = list(~ 1,
#' ~ A_1,
#' ~ A_1 + A_2))
#'
#' # Same as above but with fully saturated stabilization factors
#' # (i.e., making the last entry in 'num.formula' A_1*A_2)
#' W3 <- weightitMSM(list(A_1 ~ X1_0 + X2_0,
#' A_2 ~ X1_1 + X2_1 +
#' A_1 + X1_0 + X2_0,
#' A_3 ~ X1_2 + X2_2 +
#' A_2 + X1_1 + X2_1 +
#' A_1 + X1_0 + X2_0),
#' data = msmdata,
#' method = "glm",
#' stabilize = TRUE)
#' @export
weightitMSM <- function(formula.list, data = NULL, method = "glm",
stabilize = FALSE, by = NULL,
s.weights = NULL, num.formula = NULL, missing = NULL, verbose = FALSE,
include.obj = FALSE, keep.mparts = TRUE,
is.MSM.method, weightit.force = FALSE, ...) {
A <- list(...)
call <- match.call()
## Checks and processing ----
#Checks
##Process method
.check_acceptable_method(method, msm = TRUE, force = weightit.force)
if (is_null(method)) {
method <- NULL
is.MSM.method <- TRUE
}
else if (is.character(method)) {
method <- .method_to_proper_method(method)
attr(method, "name") <- method
if (missing(is.MSM.method)) is.MSM.method <- NULL
is.MSM.method <- .process_MSM_method(is.MSM.method, method)
}
else { #function
method.name <- paste(deparse(substitute(method)))
.check_user_method(method)
if (missing(is.MSM.method)) is.MSM.method <- NULL
is.MSM.method <- .process_MSM_method(is.MSM.method, method)
attr(method, "name") <- method.name
}
##Process by
if (is_not_null(A[["exact"]])) {
.wrn("`by` has replaced `exact` in the `weightit()` syntax, but `exact` will always work")
by <- A[["exact"]]
by.arg <- "exact"
}
else {
by.arg <- "by"
}
reported.covs.list <- simple.covs.list <- covs.list <- treat.list <- w.list <- ps.list <-
stabout <- sw.list <- Mparts.list <- stab.Mparts.list <- make_list(length(formula.list))
if (is_null(formula.list) || !is.list(formula.list) ||
!all_apply(formula.list, rlang::is_formula, lhs = TRUE)) {
.err("`formula.list` must be a list of formulas")
}
for (i in seq_along(formula.list)) {
#Process treat and covs from formula and data
t.c <- get_covs_and_treat_from_formula2(formula.list[[i]], data)
simple.covs.list[[i]] <- t.c[["simple.covs"]]
reported.covs.list[[i]] <- t.c[["reported.covs"]]
covs.list[[i]] <- t.c[["model.covs"]]
treat.list[[i]] <- t.c[["treat"]]
if (is_null(covs.list[[i]])) {
.err(sprintf("no covariates were specified in the %s formula", ordinal(i)))
}
if (is_null(treat.list[[i]])) {
.err(sprintf("no treatment variable was specified in the %s formula", ordinal(i)))
}
n <- length(treat.list[[i]])
if (nrow(covs.list[[i]]) != n) {
.err("treatment and covariates must have the same number of units")
}
treat.list[[i]] <- as.treat(treat.list[[i]], process = TRUE)
treat.name <- .attr(treat.list[[i]], "treat.name")
if (anyNA(treat.list[[i]]) || !all(is.finite(treat.list[[i]]))) {
.err(sprintf("no missing or non-finite values are allowed in the treatment variable. Missing or non-finite values found in %s",
treat.name))
}
names(treat.list)[i] <- treat.name
names(reported.covs.list)[i] <- treat.name
if (!is.MSM.method) {
.check_method_treat.type(method, get_treat_type(treat.list[[i]]))
}
#By is processed each for each time to check, but only last time is used for by.factor.
processed.by <- .process_by(by, data = data,
treat = treat.list[[i]],
treat.name = treat.name,
by.arg = by.arg)
}
#Process missing
missing <- {
if (is_null(method) || !any_apply(reported.covs.list, anyNA)) ""
else .process_missing(missing, method)
}
#Process s.weights
s.weights <- .process.s.weights(s.weights, data)
if (is_null(s.weights)) s.weights <- rep.int(1, n)
else .check_method_s.weights(method, s.weights)
if (is_null(method)) {
num.formula <- NULL
stabilize <- FALSE
}
else if (is_not_null(num.formula)) {
if (!isTRUE(stabilize)) {
.msg("setting `stabilize` to `TRUE` based on `num.formula` input")
}
stabilize <- TRUE
}
if (stabilize) {
if (!is.function(method) && !.weightit_methods[[method]]$stabilize_ok) {
.wrn(sprintf("`stabilize` cannot be used with %s and will be ignored",
.method_to_phrase(method)))
stabilize <- FALSE
num.formula <- NULL
}
else if (is_not_null(num.formula)) {
.check_num.formula(num.formula, data, env = parent.frame(),
formula.list = formula.list)
}
}
#Process moments and int
m.i.q <- .process_moments_int_quantile(method = method, ...)
A["s.weights"] <- list(s.weights)
A["by.factor"] <- list(.attr(processed.by, "by.factor"))
A["method"] <- list(method)
A[c("moments", "int", "quantile")] <- m.i.q[c("moments", "int", "quantile")]
A["subclass"] <- list(numeric())
A["missing"] <- list(missing)
A["verbose"] <- list(verbose)
A["include.obj"] <- list(include.obj)
if (is.MSM.method) {
#Returns weights (w)
A["covs.list"] <- list(covs.list)
A["treat.list"] <- list(treat.list)
A["stabilize"] <- list(stabilize)
obj <- do.call("weightitMSM.fit", A)
w <- obj[["weights"]]
stabout <- NULL
obj.list <- obj[["fit.obj"]]
Mparts.list <- .attr(obj, "Mparts")
}
else {
if (is_not_null(A[["link"]])) {
if (length(A[["link"]]) == 1L) {
A[["link"]] <- rep.int(A[["link"]], length(formula.list))
}
else if (length(A[["link"]]) != length(formula.list)) {
.err(sprintf("the argument to `link` must have length 1 or %s", length(formula.list)))
}
}
obj.list <- make_list(length(formula.list))
A["estimand"] <- list("ATE")
A["focal"] <- list(character())
A["stabilize"] <- list(FALSE)
A["ps"] <- list(numeric())
A["is.MSM.method"] <- list(FALSE)
for (i in seq_along(formula.list)) {
A_i <- A
if (length(A[["link"]]) == length(formula.list)) {
A_i["link"] <- list(A[["link"]][[i]])
}
A_i["covs"] <- list(covs.list[[i]])
A_i["treat"] <- list(treat.list[[i]])
A_i["treat.type"] <- list(get_treat_type(treat.list[[i]]))
A_i[".data"] <- list(data)
A_i[".covs"] <- list(reported.covs.list[[i]])
## Running models ----
#Returns weights (w) and propensity score (ps)
obj <- do.call("weightit.fit", A_i)
w.list[i] <- list(obj[["weights"]])
ps.list[i] <- list(obj[["ps"]])
obj.list[i] <- list(obj[["fit.obj"]])
Mparts.list[i] <- list(.attr(obj, "Mparts"))
if (stabilize) {
#Process stabilization formulas and get stab weights
if (rlang::is_formula(num.formula)) {
if (i == 1L) {
stab.f <- update(formula.list[[i]], num.formula)
# stab.f <- update.formula(as.formula(paste(names(treat.list)[i], "~ 1")),
# as.formula(paste(paste(num.formula, collapse = ""), "+ .")))
}
else {
stab.f <- update.formula(as.formula(paste(names(treat.list)[i], "~",
paste(names(treat.list)[seq_along(names(treat.list)) < i],
collapse = " * "))),
as.formula(paste(num.formula, "+ .")))
}
}
else if (is.list(num.formula)) {
stab.f <- update(formula.list[[i]], num.formula[[i]])
# stab.f <- update.formula(as.formula(paste(names(treat.list)[i], "~ 1")),
# as.formula(paste(paste(num.formula[[i]], collapse = ""), "+ .")))
}
else {
if (i == 1L) {
stab.f <- update(formula.list[[i]], ". ~ 1")
}
else {
stab.f <- update(formula.list[[i]],
sprintf(". ~ %s", paste(names(treat.list)[seq_along(treat.list) < i],
collapse = " * ")))
}
}
stab.t.c_i <- get_covs_and_treat_from_formula2(stab.f, data)
A_i["covs"] <- stab.t.c_i["model.covs"]
A_i["method"] <- list("glm")
A_i["moments"] <- list(integer())
A_i["int"] <- list(FALSE)
A_i["quantile"] <- list(list())
sw_obj <- do.call("weightit.fit", A_i)
sw.list[[i]] <- 1 / sw_obj[["weights"]]
stabout[[i]] <- stab.f[-2L]
stab.Mparts.list[i] <- list(.attr(sw_obj, "Mparts"))
if (is_not_null(stab.Mparts.list[[i]])) {
#Invert wfun and compute derivative of inverted wfun
.wfun <- stab.Mparts.list[[i]]$wfun
stab.Mparts.list[[i]]$wfun <- Invert(.wfun)
if (is_not_null(stab.Mparts.list[[i]]$dw_dBtreat)) {
.dw_dBtreat <- stab.Mparts.list[[i]]$dw_dBtreat
stab.Mparts.list[[i]]$dw_dBtreat <- function(Btreat, Xtreat, A, SW) {
-.dw_dBtreat(Btreat, Xtreat, A, SW) / .wfun(Btreat, Xtreat, A)^2
}
}
}
}
}
w <- Reduce("*", w.list, init = 1)
if (stabilize) {
w <- Reduce("*", sw.list, init = w)
unique.stabout <- unique(stabout)
if (length(unique.stabout) <= 1L) {
stabout <- unique.stabout
}
}
else {
stabout <- NULL
}
if (include.obj) {
names(obj.list) <- names(treat.list)
}
}
if (is_not_null(method) && all_the_same(w)) {
.wrn(sprintf("all weights are %s, possibly indicating an estimation failure", w[1L]))
}
## Assemble output object----
out <- list(weights = w,
treat.list = treat.list,
covs.list = simple.covs.list,
estimand = "ATE",
method = method,
s.weights = s.weights,
by = processed.by,
call = call,
formula.list = formula.list,
stabilization = stabout,
missing = if (nzchar(missing)) missing else NULL,
env = parent.frame(),
obj = obj.list
)
out <- clear_null(out)
if (keep.mparts && all(lengths(Mparts.list) > 0L)) {
attr(out, "Mparts.list") <- clear_null(c(Mparts.list, stab.Mparts.list))
}
class(out) <- c("weightitMSM", "weightit")
out
}
#' @exportS3Method print weightitMSM
print.weightitMSM <- function(x, ...) {
treat.types <- vapply(x[["treat.list"]], get_treat_type, character(1L))
cat(sprintf("A %s object\n", .it(class(x)[1L])))
if (is_not_null(x[["method"]])) {
method_name <- {
if (is_not_null(.attr(x[["method"]], "name"))) add_quotes(.attr(x[["method"]], "name"))
else if (is.character(x[["method"]])) add_quotes(x[["method"]])
else "user-defined"
}
method_note <- {
if (is_not_null(.attr(x[["method"]], "package")))
sprintf(" (converted from %s)", .it(.attr(x[["method"]], "package")))
else if (is_not_null(x[["method"]]))
sprintf(" (%s)", .method_to_phrase(x[["method"]]))
else
""
}
cat(sprintf(" - method: %s%s\n",
method_name,
method_note))
}
else if (all_the_same(x[["weights"]])) {
cat(" - method: no weighting\n")
}
cat(sprintf(" - number of obs.: %s\n",
nobs(x)))
cat(sprintf(" - sampling weights: %s\n",
if (is_null(x[["s.weights"]]) || all_the_same(x[["s.weights"]])) "none" else "present"))
cat(sprintf(" - number of time points: %s (%s)\n",
length(x[["treat.list"]]),
word_list(names(x[["treat.list"]]), and.or = FALSE)))
cat(" - treatment:\n")
for (i in seq_along(x[["treat.list"]])) {
cat(sprintf(" + time %s: %s\n",
i,
switch(treat.types[i],
continuous = "continuous",
`multi-category` =,
multinomial = sprintf("%s-category (%s)",
nunique(x[["treat.list"]][[i]]),
word_list(levels(x[["treat.list"]][[i]]), and.or = FALSE)),
binary = "2-category")))
}
if (is_not_null(x[["covs.list"]])) {
cat(" - covariates:\n")
for (i in seq_along(x[["covs.list"]])) {
if (i == 1L) {
cat(sprintf(" + baseline: %s\n",
if (is_null(x$covs.list[[i]])) "(none)"
else word_list(names(x$covs.list[[i]]), and.or = FALSE)))
}
else {
cat(sprintf(" + after time %s: %s\n",
i - 1L,
word_list(names(x$covs.list[[i]]), and.or = FALSE)))
}
}
}
if (is_not_null(x[["missing"]]) && !identical(x[["missing"]], "")) {
cat(sprintf(" - missingness method: %s\n",
.missing_to_phrase(x[["missing"]])))
}
if (is_not_null(x[["by"]])) {
cat(sprintf(" - by: %s\n",
word_list(names(x[["by"]]), and.or = FALSE)))
}
if (is_not_null(x$stabilization)) {
cat(" - stabilized")
if (any_apply(x$stabilization, function(s) is_not_null(all.vars(s)))) {
cat(paste0("; stabilization factors:\n",
if (length(x$stabilization) == 1L) {
sprintf(" %s", word_list(.attr(terms(x[["stabilization"]][[1L]]), "term.labels"),
and.or = FALSE))
}
else {
paste(vapply(seq_along(x$stabilization), function(i) {
if (i == 1L) {
sprintf(" + baseline: %s",
if (is_null(.attr(terms(x[["stabilization"]][[i]]), "term.labels"))) "(none)"
else word_list(.attr(terms(x[["stabilization"]][[i]]), "term.labels"), and.or = FALSE))
}
else {
sprintf(" + after time %s: %s",
i - 1L,
word_list(.attr(terms(x[["stabilization"]][[i]]), "term.labels"), and.or = FALSE))
}
}, character(1L)), collapse = "\n")
}))
}
}
trim <- .attr(x[["weights"]], "trim")
if (is_not_null(trim)) {
if (trim < 1) {
if (.attr(x[["weights"]], "trim.lower")) {
trim <- c(1 - trim, trim)
}
cat(sprintf(" - weights trimmed at %s\n",
word_list(paste0(round(100 * trim, 2L), "%"))))
}
else {
cat(sprintf(" - weights trimmed at the %s %s\n",
if (.attr(x[["weights"]], "trim.lower")) "top and bottom"
else "top",
trim))
}
}
invisible(x)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.