knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = rlang::is_installed("emmeans") ) library(brms.mmrm) library(dplyr) library(tidyr) library(zoo)
This vignette explains how brms.mmrm
conducts posterior inference on a fitted MMRM model using estimated marginal means.
Throughout this vignette, we use the mmrm
package's fev_data
dataset, a simulation of a clinical trial in which chronic obstructive pulmonary disease (COPD) patients (variable USUBJID
) were randomized to different treatment groups (variable ARMCD
) and measured across four discrete time points (variable AVISIT
). The given response variable is forced expired volume in one second (FEV1
), and we are interested in the FEV1
change from baseline to each time point (derived variable FEV_CHG
). For this vignette, we impute missing responses in order to simplify the discussion.
library(brms.mmrm) library(dplyr) library(tidyr) data(fev_data, package = "mmrm") data <- fev_data |> group_by(USUBJID) |> complete(AVISIT) |> arrange(AVISIT) |> fill( any_of(c("ARMCD", "FEV1_BL", "RACE", "SEX", "WEIGHT")), .direction = "downup" ) |> mutate(FEV1 = na.locf(FEV1, na.rm = FALSE)) |> mutate(FEV1 = na.locf(FEV1, na.rm = FALSE, fromLast = TRUE)) |> ungroup() |> filter(!is.na(FEV1)) |> mutate(FEV1_CHG = FEV1 - FEV1_BL, USUBJID = as.character(USUBJID)) |> select(-FEV1) |> as_tibble() |> arrange(USUBJID, AVISIT) |> brm_data( outcome = "FEV1_CHG", baseline = "FEV1_BL", group = "ARMCD", patient = "USUBJID", time = "AVISIT", covariates = c("RACE", "SEX", "WEIGHT"), reference_group = "PBO", reference_time = "VIS1" )
data
According to @lenth2016, marginal means (formerly "least-squares means") are predictions (usually averaged predictions) at each point in a reference grid. The reference grid declares combinations of levels of factors of interest. In a clinical trial with repeated measures, we are often interested in the mean response at each combination of treatment group and discrete time point. For our FEV1 dataset, we are interested in the mean of FEV1_CHG
and its standard error for each combination of treatment group and time point.^[For a subgroup analysis where the subgroup factor is identified in advance, we would be interested in each combination of treatment group, subgroup level, and time point.] In other words, we want to estimate the mean FEV1_CHG
for group "TRT"
time "VIS1"
, the mean FEV1_CHG
for group "TRT"
time "VIS2"
, and so on.^[Other quantities of interest are downstream of these marginal means. For example, the difference between TRT
and PBO
at time point VIS4
, and the difference between VIS4
and VIS4
for the TRT
group, are simple contrasts of the upstream marginal means.] We represent our goals in a reference with one row per marginal mean of interest and columns with the levels of the factors of interest.
reference_grid <- distinct(data, ARMCD, AVISIT) reference_grid
It is seldom trivial to estimate marginal means. For example, the following parameterization includes an intercept term, additive terms for each level of each factor, interactions to capture non-additive relationships among factors, continuous covariates, and different FEV1_BL
slopes for different time points. Here, there is no model coefficient that directly corresponds to a marginal mean of interest. Even terms like AVISITVIS2:ARMCDTRT
implicitly condition on a subset of the data because of the other variables involved.
brms_mmrm_formula <- brm_formula(data, correlation = "diagonal") base_formula <- as.formula(brms_mmrm_formula[[1]]) attr(base_formula, "nl") <- NULL attr(base_formula, "loop") <- NULL
base_formula
colnames(model.matrix(object = base_formula, data = data))
To accomplish our goals, we need to carefully construct a linear transformation that maps these model coefficients to the marginal means of interest. The transformation should evaluate contrasts on the interesting parameters and average out the uninteresting parameters.
brms.mmrm::brm_model()
returns a fitted brms
model, and brms
already has tools for posterior inference. Through a combination of native functions and S3 methods, brms
integrates not only with posterior
and loo
, but also emmeans
for the estimation of marginal means and downstream contrasts.
Despite the existing features in brms
, brms.mmrm
implements custom code to transform model coefficients into marginal means. This is because the reference grids in emmeans
can only condition on factors explicitly declared in the model formula supplied to brms
, whereas brms.mmrm
needs more flexibility in order to support informative prior archetypes (@bedrick1996, @bedrick1997, @christensen2010, @rosner2021).
brms.mmrm
estimates marginal meansTo estimate marginal means, brms.mmrm::brm_transform_marginal()
creates a special matrix.
transform <- brm_transform_marginal(data = data, formula = brms_mmrm_formula)
dim(transform) transform[, 1:4]
This special matrix encodes the equations below which map model coefficients to marginal means.^[summary()
also invisibly returns a simple character vector with the equations below.]
summary(transform)
Multiplying the matrix by a set of model coefficients is the same as plugging the coefficients into the equations above. Both produce estimates of marginal means.
model <- lm(formula = base_formula, data = data) marginals_custom <- transform %*% coef(model) marginals_custom
This technique is similar to emmeans::emmeans(weights = "proportional")
^[Unlike emmeans
, brms.mmrm
completes the grid of patient visits to add rows for implicitly missing responses. Completing the grid of patient visits ensures all patients are represented equally when averaging over baseline covariates, regardless of patients who drop out early.] (@lenth2016, @searle1979) and produces similar estimates.
library(emmeans) marginals_emmeans <- emmeans( object = model, specs = ~ARMCD:AVISIT, weights = "proportional", nuisance = c("USUBJID", "RACE", "SEX") ) |> as.data.frame() |> as_tibble() |> select(ARMCD, AVISIT, emmean) |> arrange(ARMCD, AVISIT)
marginals_emmeans
marginals_custom - marginals_emmeans$emmean
For our Bayesian MMRMs in brms.mmrm
, the transformation from brm_transform_marginal()
operates on each individual draw from the joint posterior distribution. The transformation matrix produced by brm_transform_marginal()
is the value of the transform
argument of brm_marginal_draws()
. That way, brm_marginal_draws()
produces an entire estimated posterior of each marginal mean, rather than point estimates that assume a normal or Student-t distribution.^[You can then estimate the posterior of any function of marginal means by simply applying that function to the individual posterior draws from brm_marginal_draws()
. brm_marginal_grid()
helps identify column names for this kind of custom inference/post-processing.]
brm_marginal_draws()
worksLet us take a closer look at the equations that map model parameters to marginal means.
summary(transform)
These equations include terms not only for the fixed effects of interest, but also for nuisance variables FEV1_BL
, SEX
, RACE
, and WEIGHT
. These nuisance variables were originally part of the model formula, which means each marginal mean can only be interpreted relative to a fixed value of FEV1_BL
, a fixed proportion of female patients, etc. For example, if we dropped 40.13*b_FEV1_BL
, then PBO:VIS1
would be the placebo mean at visit 1 for patients with FEV1_BL = 0
: in other words, patients who cannot breathe out any air from their lungs at the beginning of the study. Similarly, if we dropped 0.53*b_SEXFemale
, then we would have to interpret PBO:VIS1
as the visit 1 placebo mean for male patients only. Fixed values 40.13
and 0.53
are averages over the data to ensure our marginal means apply to the entire patient population as a whole.
The major challenge of brm_transform_marginal()
is to condition on nuisance values that represent appropriate averages over the data. To calculate these nuisance values, brm_transform_marginal()
uses a technique similar to weights = "proportional"
in emmeans::emmeans()
.
To replicate brm_transform_marginal()
, we first create a reference grid to define the factor levels of interest and the means of continuous variables to condition on.
grid <- data |> mutate(FEV1_BL = mean(FEV1_BL), WEIGHT = mean(WEIGHT)) |> distinct(ARMCD, AVISIT, FEV1_BL, WEIGHT) |> arrange(ARMCD, AVISIT) grid
We use the grid to construct a model matrix with the desired interactions between continuous variables and factors of interest. Each column represents a model coefficient, and each row represents a marginal mean of interest.
transform <- model.matrix( object = ~ FEV1_BL * AVISIT + ARMCD * AVISIT + WEIGHT, data = grid ) rownames(transform) <- paste(grid$ARMCD, grid$AVISIT) transform
We want to predict at the "average" of SEX
and RACE
across all the data. Since SEX
and RACE
are factors, we cannot simply take the means of the variables themselves. Rather, we construct a model matrix to turn each factor level into a dummy variable, and then average those dummy variables across the entire dataset. This process accounts for the observed frequencies of these levels in the data (ideal for passive variables that the experiment does not directly control), while guarding against hidden confounding with the factors of interest (which can lead to Simpson's paradox).^[For more context, please refer to the emmeans
basics vignette, as well as discussion forums here and here.]
proportional_factors <- data |> model.matrix(object = ~ 0 + SEX + RACE) |> colMeans() |> t() proportional_factors
transform <- transform |> bind_cols(proportional_factors) |> as.matrix() transform <- transform[, names(coef(model))] rownames(transform) <- paste(grid$ARMCD, grid$AVISIT) transform
Finally, we use this transformation matrix to map estimated model coefficients to estimated marginal means.
marginals_custom <- transform %*% coef(model) marginals_custom
These results are extremely close to the estimated marginal means from emmeans
.
marginals_emmeans |> bind_cols(custom = as.numeric(marginals_custom)) |> mutate(difference = custom - emmean)
brms.mmrm
follows the procedure above, but in a Bayesian context. The brm_transform_marginal()
creates the matrix above, and brm_marginal_draws()
uses it to transform posterior draws of brms
model coefficients into posterior draws of marginal means. These posterior draws of marginal means then support estimation of treatment effects (via brm_marginal_draws()
and brm_marginal_summaries()
) and posterior probabilities on those treatment effects (via brm_marginal_probabilities()
). To fine-tune the marginal mean estimation procedure for niche use cases, you can modify the transformation returned from brm_transform_marginal()
and then supply it to the transform
argument of brm_marginal_draws()
.
Subgroup analysis raises important questions about how nuisance variables are averaged, and you as the user are responsible for choosing the approach that best suits the situation. To illustrate, suppose SEX
is a pre-specified subgroup. When estimating marginal means, we now wish to condition on "Female"
vs "Male"
while averaging over RACE
across the whole dataset. In emmeans
, this is similar to how we calculated marginals_emmeans
above, but we now move SEX
from nuisance
to specs
:
emmeans( object = model, specs = ~SEX:ARMCD:AVISIT, weights = "proportional", nuisance = c("USUBJID", "RACE") )
This may be reasonable in some cases, and it mitigates the kind of hidden confounding between the subgroup and other variables which may otherwise cause Simpson's paradox. However, for subgroup-specific marginal means, it may not be realistic to condition on a single point estimate for all levels of the reference grid. For example, if the model were to regress on a pregnancy
variable, then the marginal means for SEX = "Male"
should always condition on pregnancy = 0
instead of mean(data$pregnancy)
. And in general, it may be more reasonable to condition on subgroup-specific averages of nuisance variables. However, if you do this, it is your responsibility to investigate and understand the hidden interactions and confounding in your dataset. https://cran.r-project.org/package=emmeans/vignettes/interactions.html is an edifying vignette on this topic.
To opt into subgroup-specific averages of nuisance variables in brms.mmrm
, set average_within_subgroup = TRUE
in brm_transform_marginal()
, then supply the output to the transform
argument of brm_marginal_draws()
.
To replicate brm_transform_marginal(average_within_subgroup = TRUE)
from scratch, first create a reference grid which includes subgroup levels.
grid <- data |> distinct(ARMCD, SEX, AVISIT) |> arrange(ARMCD, SEX, AVISIT) grid
For each continuous variable, append the corresponding subgroup-specific averages to the grid.
means <- data |> group_by(SEX) |> summarize(FEV1_BL = mean(FEV1_BL), WEIGHT = mean(WEIGHT), .groups = "drop") grid <- left_join(x = grid, y = means, by = "SEX") grid
Begin creating the variable transformation matrix using this new grid. Be sure to include the subgroup in the formula below exactly as it appears in the formula used to fit the model.
transform <- model.matrix( object = ~ FEV1_BL * AVISIT + ARMCD * AVISIT + SEX + WEIGHT, data = grid )
Append subgroup-specific averages of the levels of nuisance factors (in this case, just RACE
).
proportions <- data |> model.matrix(object = ~ 0 + RACE) |> as.data.frame() |> mutate(SEX = data$SEX) |> group_by(SEX) |> summarize(across(everything(), mean), .groups = "drop") transform <- transform |> as.data.frame() |> mutate(SEX = grid$SEX) |> left_join(y = proportions, by = "SEX") |> select(-SEX) |> as.matrix()
Complete the transformation matrix by assigning the correct row names and aligning the column order with that of the model coefficients.
rownames(transform) <- paste(grid$ARMCD, grid$SEX, grid$AVISIT) transform <- transform[, names(coef(model))] transform
Finally, use the custom transform
matrix to estimate subgroup-specific marginal means. Because we averaged FEV1_BL
, WEIGHT
, and RACE
within subgroup levels, the results will differ from those of emmeans
.
transform %*% coef(model)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.