The qgcompint package: g-computation with statistical interaction
In qgcompint: Quantile G-Computation Extensions for Effect Measure Modification

knitr::opts_chunk$set(echo = TRUE)

Introduction
Basics: fitting a model with a modifier
Basics: getting bounds for pointwise comparisons
Basics: plotting weights
Basics: bootstrapping
- Plotting bootstrapped fits (plotting predictions)
- Overlay plots of predictions at multiple modifier levels
Basics: Estimating equations
- Plotting predictions from estimating equation fits (plotting predictions)
- Overlay plots of predictions at multiple modifier levels
Basics: categorical modifier, binary outcome
Basics: Non-numerical factor modifiers
Basics: Continuous modifiers
- Pointwise comparisons at specific values of a continuous confounder)
- Pointwise comparisons in bootstrapped fits (work in progress)
Non-linear fits
Survival analysis
Frequently asked questions
Acknowledgments

Introduction

Quantile g-computation (qgcomp) is a special case of g-computation used for estimating joint exposure response curves for a set of continuous exposures. The base package qgcomp allows one to estimate conditional or marginal joint-exposure response curves. Because this approach developed within the field of "exposure mixtures" the set of exposures of interest are referred to here as "the mixture." qgcompint builds on qgcomp by incorporating statistical interaction (product terms) between binary, categorical, or continuous covariates and the mixture.

The model

Say we have an outcome $Y$, some exposures $\mathbb{X}$, a "modifier" or a covariate for which we wish to assess statistical interaction with $\mathbb{X}$, denoted by $M$ and possibly some other covariates (e.g. potential confounders) denoted by $\mathbb{Z}$.

The basic model of quantile g-computation is a joint marginal structural model given by

$$ \mathbb{E}(Y^{\mathbf{X}_q} || M, \mathbf{Z, \psi, \eta}) = g(\psi_0 + \psi_1 S_q + \psi_2 M + \psi_3 M\times S_q + \mathbf{\eta Z}) $$

where $g(\cdot)$ is a link function in a generalized linear model (e.g. the inverse logit function in the case of a logistic model for the probability that $Y=1$), $\psi_0$ is the model intercept, $\mathbf{\eta}$ is a set of model coefficients for the covariates and $S_q$ is an "index" that represents a joint value of exposures. The joint exposure has a "main effect" at the referent value of $M$ given by $\psi_1$, $\psi_2$ represents the association (or set of associations for categorical $M$) between the modifier and the outcome, and $\psi_3$ is a product terms (or set of product terms for categorical $M$) that represent the deviation of the exposure response for $S_q$ from the main effect for each one unit increase in $M$. The magnitude of $\psi_3$ can be used to estimate the extent of statistical interaction on the model scale, sometimes referred to as effect measure modification.

Quantile g-computation (by default) transforms all exposures $\mathbf{X}$ into $\mathbf{X}_q$, which are "scores" taking on discrete values 0, 1, 2, etc. representing a categorical "bin" of exposure. By default, there are four bins with evenly spaced quantile cutpoints for each exposure, so ${X}_q=0$ means that $X$ was below the observed 25th percentile for that exposure. The index $S_q$ represents all exposures being set to the same value (again, by default, discrete values 0, 1, 2, 3). Thus, *the parameter $\psi_1$ quantifies the expected change in the outcome, given a one quantile increase in all exposures simultaneously, * possibly adjusted for $\mathbf{Z}$.

There are nuances to this particular model form that are available in the qgcompint package which will be explored below. There exists one special case of quantile g-computation that leads to fast fitting: linear/additive exposure effects. Here we simulate "pre-quantized" data where the exposures $X_1, X_2, ..., X_7$ can only take on values of 0, 1, 2, 3 in equal proportions. The model underlying the outcomes is given by the linear regression:

$$ \mathbb{E}(Y || \mathbf{X}, M, \beta, \psi, \eta) = \beta_0 + \beta_1 X_1 + ... + \beta_7 X_7 + \psi_2 M +\eta_9 X_1\times M, ..., +\eta_{15} X_7\times M $$

with the true values of $\beta$ given by:

bn = c(
  paste0("psi_", 0),
  paste0("beta_", 1:7),
  paste0("psi_", 2),
  paste0("eta_", 1:7))
bv = c(0,
  c(0.8, 0.6, 0.3, -0.3, -0.3, -0.3, 0),
  0,
  c(1.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2))

dt = data.frame(value=bv, row.names = bn)
print(dt)

In this example $X_1$ is positively correlated with $X_2-X_4$ ($\rho=0.8, 0.6, 0.3$) and negatively correlated with $X_5-X_7$ ($\rho=-0.3-0.3, -0.3$). In this setting, the parameter $\psi_1$ will equal the sum of the $\beta_1-\beta_7$ coefficients (0.8), $\psi_2$ is given directly by the data generation model (0.0), and $\psi_3$ will equal the sum of the $\eta$ coefficients (1.6). Simulating data to fit this model is available within a the simdata_quantized_emm function in the qgcompint package. Here, we simulate data using a binary modifier and inspect the correlation matrix to see that the estimated correlation matrix is approximately the same as the correlation of the data generation mechanism. These will converge in large sample sizes, but for a sample size of 200, the estimated coefficients will differ from those we simulate under due to random variation (set the sample size to 10000 in this example to confirm).

 library(qgcompint)
 set.seed(42)
 dat1 <- simdata_quantized_emm(
  outcometype="continuous",
# sample size
  n = 300,
# correlation between x1 and x2, x3, ...
  corr=c(0.8, 0.6, 0.3, -0.3, -0.3, -0.3),
# model intercept
  b0=0,
# linear model coefficients for x1, x2, ... at referent level of interacting variable
  mainterms=c(0.3, -0.1, 0.1, 0.0, 0.3, 0.1, 0.1),
# linear model coefficients for product terms between x1, x2, ... and interacting variable  
  prodterms = c(1.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2),
# type of interacting variable
  ztype = "binary",
# number of levels of exposure
  q = 4,
# residual variance of y
  yscale = 2.0                            
)
names(dat1)[which(names(dat1)=="z")] = "M"

## data
head(dat1)
## modifier
table(dat1$M)
## outcome
summary(dat1$y)
## exposure correlation
cor(dat1[, paste0("x", 1:7)])

Basics: fitting a model with a modifier

Here we see that qgcomp (via the function qgcomp.emm.glm.noboot) estimates a $\psi_1$ fairly close to 0.8 (estimate = 0.7) (again, as we increase sample size, the estimated value will be expected to become increasingly close to the true value). The product term 'M:mixture' is the $\psi_3$ parameter noted above, which is also fairly close to the true value of 1.6 (estimate = 1.7).

For binary modifiers, qgcomp.emm.glm.noboot will also estimate the joint effect of the mixture in strata of the modifier. Here, the effect of the mixture at $M=0$ is given by $\psi_1$, whereas the effect of the mixture at $M=1$ is estimated below the coefficient table (and is here given by $\psi_1+\psi_3$ = 2.4).

qfit1 <- qgcomp.emm.glm.noboot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat1,
  expnms = paste0("x", 1:7),
  emmvar = "M",
  q = 4)
qfit1

Basics: getting bounds for pointwise comparisons

As in qgcomp you can estimate pointwise comparisons along the joint regression line. Here we estimate them at both values of $M$ (via the emmval parameter).

pointwisebound(qfit1, emmval=0)
pointwisebound(qfit1, emmval=1)

Basics: plotting weights (weights are at referent level of modifier)

For qgcomp.emm.glm.noboot fits, a set of "weights" will be given that are interpreted as the proportion of a "partial" effect for each variable. That is, $\psi_1$ will represent the joint effect of multiple exposures, some of which will have independent effects that are positive, and some will have negative independent effects. For example, the "negative partial effect" is simply the sum of all of the negative independent effects (this is only given for a model in which all exposures are included via linear terms and no interactions among exposures occur). These weights are conditional on the fitted model, and so are not "estimates" per se and will not have associated confidence intervals. Nonetheless, the weights are useful for interpretation of the joint effect.

Notably, with product terms in the model for the joint effect, a different set of weights will be generated at every value of the modifier. Here, we can plot the weights at M=0 and M=1.

plot(qfit1, emmval=0)
plot(qfit1, emmval=1)

Basics: bootstrapping

For non-linear qgcomp fits, or to get marginal estimates of the exposure-response curve (i.e. conditional only on the modifier), we can use the qgcomp.emm.boot function. Here we just repeat the original fit, which yields similar evidence that there is substantial statistical interaction on the additive scale, so that we would expect the joint exposure effect estimate to be greater for M=1 than for M=0, which corroborates the fit above (the point estimates are identical, as expected in this case due to no non-modifier covariates, so this is not a surprise).

qfit1b <- qgcomp.emm.boot(y~x1+x2+x3+x4+x5+x6+x7,
  data=dat1,
  expnms = paste0("x", 1:7),
  emmvar = "M",
  q = 4)
qfit1b

Plotting bootstrapped fits (plotting predictions)

plot(qfit1b, emmval=0)
plot(qfit1b, emmval=1)

Overlay plots of predictions at multiple modifier levels

Visually, it's easier to compare two plots with the same y-axis. Here we can see that the joint regression curve is steeper at M=1 than it is at M=0. You will get some notes from ggplot2 about replacing scales, which can be ignored or suppressed using suppressMessages(print( <ggplot statements > )). Visually, you can see substantial evidence for interaction due to the diverging slopes and non-overlapping confidence intervals.

library(ggplot2) # need to explicitly call ggplot
pp0b = plot(qfit1b, emmval=0, suppressprint=TRUE, geom_only=TRUE, 
             pointwisebars=TRUE, modelfitline=FALSE, modelband=FALSE)
pp1b = plot(qfit1b, emmval=1, suppressprint=TRUE, geom_only=TRUE, 
             pointwisebars=TRUE, modelfitline=FALSE, modelband=FALSE)
# relabel the plots by directly modifying the ggplot2 geometries.
# "col" is a column that sets the color
# "point" and "errorbar" are names given to ggplot2::geom_point and 
# ggplot2::geom_error bar geometries
pp0b$point$data$col = "M=0"
pp1b$point$data$col = "M=1"
pp0b$errorbar$data$col = "M=0"
pp1b$errorbar$data$col = "M=1"
# jitter along the x axis so the points do not overlap
pp0b$point$data$x = pp0b$point$data$x - 0.01 
pp1b$point$data$x = pp1b$point$data$x + 0.01 
pp0b$errorbar$data$x = pp0b$errorbar$data$x - 0.01 
pp1b$errorbar$data$x = pp1b$errorbar$data$x + 0.01 



suppressMessages(print(
  ggplot() + pp0b + pp1b + scale_colour_viridis_d(name="") + 
    labs(title="Bootstrap approach")
))

Basics: Estimating equations

Another approach to non-linear qgcomp fits, or to get marginal estimates of the exposure-response curve (i.e. conditional only on the modifier), we can use the qgcomp.emm.glm.ee function. This approach uses estimating equation methodology for fitting. Estimating equation methodology also underlies GEE (generalized estimating equation) methods, which are frequently employed in longitudinal data analysis because they can yield appropriate standard errors for clustered data. In qgcomp.emm.glm.ee, the standard errors are also cluster robust (as long as the id variable is specified), but it can also be used to obtain non-linear fits in the same way that the bootstrapped estimator is above. Relative to the bootstrapped estimator, the estimating equations estimator will be less computationally intensive and will yield standard errors that are not subject to simulation error (and thus will be more repeatable). In general, the estimating equation method should be preferable for many of the cases in which one might be considering the bootstrapped estimator. Because standard errors are estimated differently, you will see differences in confidence intervals between qgcomp.emm.glm.ee and qgcomp.emm.glm.noboot even when the models are otherwise identical.

Here we just repeat the analysis with the bootstrapped fit, which yields similar evidence that there is substantial statistical interaction on the additive scale, so that we would expect the joint exposure effect estimate to be greater for M=1 than for M=0, which corroborates the fit above (the point estimates are identical, as expected in this case due to no non-modifier covariates, so this is not a surprise).

qfit1ee <- qgcomp.emm.glm.ee(y~x1+x2+x3+x4+x5+x6+x7,
  data=dat1,
  expnms = paste0("x", 1:7),
  emmvar = "M",
  q = 4)
qfit1ee

Plotting predictions from estimating equation fits (plotting predictions)

Here we skip to overlaying the two plots

pp0ee = plot(qfit1b, emmval=0, suppressprint=TRUE, geom_only=TRUE, 
             pointwisebars=TRUE, modelfitline=FALSE, modelband=FALSE)
pp1ee = plot(qfit1b, emmval=1, suppressprint=TRUE, geom_only=TRUE, 
             pointwisebars=TRUE, modelfitline=FALSE, modelband=FALSE)
# relabel the plots by directly modifying the ggplot2 geometries.
# "col" is a column that sets the color
# "point" and "errorbar" are names given to ggplot2::geom_point and 
# ggplot2::geom_error bar geometries
pp0ee$point$data$col = "M=0"
pp1ee$point$data$col = "M=1"
pp0ee$errorbar$data$col = "M=0"
pp1ee$errorbar$data$col = "M=1"
# jitter along the x axis so the points do not overlap
pp0ee$point$data$x = pp0ee$point$data$x - 0.01 
pp1ee$point$data$x = pp1ee$point$data$x + 0.01 
pp0ee$errorbar$data$x = pp0ee$errorbar$data$x - 0.01 
pp1ee$errorbar$data$x = pp1ee$errorbar$data$x + 0.01 



suppressMessages(print(
  ggplot() + pp0ee + pp1ee + scale_colour_viridis_d(name="") + 
    labs(title="Estimating equations approach")
))

Basics: categorical modifier, binary outcome

Now we can simulate data under a categorical modifier, and the simdata_quantized_emm function will pick some convenient defaults. We will also use a binary outcome with N=300 to allow that use of a categorical modifier with a rare binary outcome will be subject to low power and potential for bootstrapping to fail due to empty strata in some bootstrap samples.

Simulated data defaults

Here is a good place to note that simdata_quantized_emm is provided mainly as a learning tool. While it could potentially be used for simulations in a scientific publication, it likely has too many defaults that are not user controllable that may be useful to be able to change. Some of the underlying code that may serve as inspiration for more comprehensive simulations can be explored via the "hidden" (non-exported) functions qgcompint:::.quantized_design_emm, qgcompint:::.dgm_quantized_linear_emm, qgcompint:::.dgm_quantized_logistic_emm, and qgcompint:::.dgm_quantized_survival_emm.

 set.seed(23)
 dat2 <- simdata_quantized_emm(
  outcometype="logistic",
# sample size
  n = 300,
# correlation between x1 and x2, x3, ...
  corr=c(0.6, 0.5, 0.3, -0.3, -0.3, 0.0),
# model intercept
  b0=-2,
# linear model coefficients for x1, x2, ... at referent level of interacting variable
  mainterms=c(0.1, -0.1, 0.1, 0.0, 0.1, 0.1, 0.1),
# linear model coefficients for product terms between x1, x2, ... and interacting variable  
  prodterms = c(0.2, 0.0, 0.0, 0.0, 0.2, -0.2, 0.2),
# type of interacting variable
  ztype = "categorical",
# number of levels of exposure
  q = 4,
# residual variance of y
  yscale = 2.0                            
)

## data
head(dat2)
## modifier
table(dat2$z)
## outcome
table(dat2$y)
## exposure correlation
cor(dat2[, paste0("x", 1:7)])

The wrong way to include categorical modifiers

Below is one way to fit qgcomp.emm.glm.noboot with a categorical modifier that exactly follows the previous code. This approach to categorical modifiers is incorrect, in this case, due to the format of these data. Note if you fit the model like this, where your categorical modifier is not the proper data type, qgcomp.emm.glm.noboot will assume you have a continuous modifier.

qfit.wrong <- qgcomp.emm.glm.noboot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat2,
  expnms = paste0("x", 1:7),
  emmvar = "z",
  q = 4, family=binomial())
qfit.wrong

The right way to include categorical modifiers (use `as.factor()`)

Instead, you should convert each categorical modifier to a "factor" prior to fitting the model. Here you can see the output for both the non-bootstrapped fit and the fit with bootstrapped confidence intervals (since there are no other covariates in the model, these two approaches estimate the same marginal effect and parameter log-odds ratio estimates will be identical). Here we see that we get a unique interaction term and main effect for each level of the modifier z.

dat2$zfactor = as.factor(dat2$z)
# using asymptotic-based confidence intervals
qfit2 <- qgcomp.emm.glm.noboot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat2,
  expnms = paste0("x", 1:7),
  emmvar = "zfactor",
  q = 4, family=binomial())
# using bootstrap based confidence intervals (estimate a)
set.seed(12312)
qfit2b <- qgcomp.emm.boot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat2,
  expnms = paste0("x", 1:7),
  emmvar = "zfactor",
  B = 10, # set higher when using in real data
  q = 4, family=binomial(), rr = FALSE)
# using estimating equation based confidence intervals (estimate ee)
set.seed(12312)
qfit2ee <- qgcomp.emm.ee(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat2,
  expnms = paste0("x", 1:7),
  emmvar = "zfactor",
  q = 4, family=binomial(), rr = FALSE)
qfit2
qfit2b
qfit2ee

Bounds for pointwise comparisons with categorical modifiers

Here are some miscellaneous functions for getting point estimates and bounds for various comparisons at specific values of the modifier.

## output the weights at Z=0
getstratweights(qfit2, emmval=0)
## output pointwise comparisons at Z=0
pointwisebound(qfit2, emmval=0)
## plot weights at Z=0
plot(qfit2, emmval=0)

## output stratum specific joint effect estimate for the mixture at Z=2
print(getstrateffects(qfit2, emmval=2))
## output the weights at Z=2
print(getstratweights(qfit2, emmval=2))
## output pointwise comparisons at Z=2
pointwisebound(qfit2, emmval=2)
plot(qfit2, emmval=2)

## output stratum specific joint effect estimate for the mixture at Z=2 from bootstrapped fit
print(getstrateffects(qfit2b, emmval=2))
## output pointwise comparisons at Z=2 from bootstrapped fit
print(pointwisebound(qfit2b, emmval=2))
## output modelwise confidence bounds at Z=2 from bootstrapped fit
print(modelbound(qfit2b, emmval=2))

## Plot pointwise comparisons at Z=2 from bootstrapped fit
plot(qfit2b, emmval=2)

Overlay plots of predictions at multiple modifier levels

Visually, it's easier to compare predictions across all levels of the EMM variable on the same plot. This plot looks at bounds on the marginal structural regression model line.

library(ggplot2) # need to explicitly call ggplot
pp0ee = plot(qfit2ee, emmval=0, suppressprint=TRUE, geom_only=TRUE, 
             modelband=TRUE, pointwisebars=FALSE, modelfitline=TRUE)
pp1ee = plot(qfit2ee, emmval=1, suppressprint=TRUE, geom_only=TRUE, 
             modelband=TRUE, pointwisebars=FALSE, modelfitline=TRUE)
pp2ee = plot(qfit2ee, emmval=2, suppressprint=TRUE, geom_only=TRUE, 
             modelband=TRUE, pointwisebars=FALSE, modelfitline=TRUE)
# relabel the plots uisn
pp0ee$line$data$col = "Z=0"
pp1ee$line$data$col = "Z=1"
pp2ee$line$data$col = "Z=2"
pp0ee$ribbon$data$col = "Z=0"
pp1ee$ribbon$data$col = "Z=1"
pp2ee$ribbon$data$col = "Z=2"
# jitter along the x axis so the points do not overlap
pp0ee$line$data$x = pp0ee$line$data$x - 0.01 
pp2ee$line$data$x = pp2ee$line$data$x + 0.01 
pp0ee$ribbon$data$x = pp0ee$ribbon$data$x - 0.01 
pp2ee$ribbon$data$x = pp2ee$ribbon$data$x + 0.01 

ggplot() + pp0ee + pp1ee + pp2ee + scale_color_viridis_d(name="") + scale_fill_viridis_d(name="", alpha=0.25)

Global tests of interaction for categorical modifiers

It is possible to obtain a global test of (no) interaction for a multi-level modifier using the qgcomp package and the anova function. You need to fit a model without interaction (via qgcomp) and then with interaction. The test is done on the "underlying" conditional model corresponding to the original data, rather than the marginal structural model. The methods for qgcomp.glm.[no]boot and qgcomp.emm.glm.[no]boot utilize the anova.glm function from the stats package (note the slightly different specification), while the methods for qgcomp.glm.ee and qgcomp.emm.glm.ee use anova.geeglm function methods from the geepack R package. Both tests return similar (not identical) results.

library("qgcomp")
dat2$zfactor = as.factor(dat2$z)
catfitoverall <- qgcomp.glm.noboot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat2,
  expnms = paste0("x", 1:7),
  q = 4, family=binomial())
catfitemm <- qgcomp.emm.glm.noboot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat2,
  expnms = paste0("x", 1:7),
  emmvar = "zfactor",
  q = 4, family=binomial())

catfitoverallee <- qgcomp.glm.ee(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat2,
  expnms = paste0("x", 1:7),
  q = 4, family=binomial())
catfitemmee <- qgcomp.emm.glm.ee(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat2,
  expnms = paste0("x", 1:7),
  emmvar = "zfactor",
  q = 4, family=binomial(), rr=FALSE)


anova(catfitoverall$fit, catfitemm$fit, test = "Chisq")
anova(catfitemmee, catfitoverallee)

Basics: Non-numerical factor modifiers

As above, non-numerical modifiers must be coded as factors. These are straightforward, and this example follows with no commentary.

 set.seed(23)
 dat3 <- simdata_quantized_emm(
  outcometype="continuous",
  n = 300,
  corr=c(0.6, 0.5, 0.3, -0.3, -0.3, 0.0),
  b0=-2,
  mainterms=c(0.1, -0.1, 0.1, 0.0, 0.1, 0.1, 0.1),
  prodterms = c(0.2, 0.0, 0.0, 0.0, 0.2, -0.2, 0.2),
  ztype = "categorical",
  q = 4,
  yscale = 2.0                            
)

dat3$zletter = as.factor(with(dat3, ifelse(z==0, "a", ifelse(z==1, "b", "c"))))
with(dat3, table(z, zletter))

qfit_letter <- qgcomp.emm.glm.noboot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat3,
  expnms = paste0("x", 1:7),
  emmvar = "zletter",
  q = 4, family=gaussian())

qfit_letteree <- qgcomp.emm.glm.ee(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat3,
  expnms = paste0("x", 1:7),
  emmvar = "zletter",
  q = 4, family=gaussian())

print(qfit_letter)
print(qfit_letteree)
## output stratum specific joint effect estimate for the mixture at Zletter="a"
print(getstrateffects(qfit_letter, emmval="a"))
print(getstrateffects(qfit_letteree, emmval="a"))
## output the weights at Z=2
print(getstratweights(qfit_letter, emmval="a"))

## output predictions at Zletter="b"
print(pointwisebound(qfit_letter, emmval="b", pointwiseref = 1, alpha=0.05))
print(pointwisebound(qfit_letteree, emmval="b", pointwiseref = 1, alpha=0.05))

Basics: Continuous modifiers

Here we simulate some data, similar to prior datasets, where we use a continuous modifier.

 set.seed(23)
 dat3 <- simdata_quantized_emm(
  outcometype="continuous",
# sample size
  n = 100,
# correlation between x1 and x2, x3, ...
  corr=c(0.8, 0.6, 0.3, -0.3, -0.3, -0.3),
# model intercept
  b0=-2,
# linear model coefficients for x1, x2, ... at referent level of interacting variable
  mainterms=c(0.3, -0.1, 0.1, 0.0, 0.3, 0.1, 0.1),
# linear model coefficients for product terms between x1, x2, ... and interacting variable  
  prodterms = c(1.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2),
# type of interacting variable
  ztype = "continuous",
# number of levels of exposure
  q = 4,
# residual variance of y
  yscale = 2.0                            
)
names(dat3)[which(names(dat3)=="z")] = "CoM"

head(dat3)
summary(dat3$CoM)
summary(dat3$y)
cor(dat3[, paste0("x", 1:7)])

Fits are done as in the previous example (here they are shown without any further commentary)

qfit3 <- qgcomp.emm.glm.noboot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat3,
  expnms = paste0("x", 1:7),
  emmvar = "CoM",
  q = 4)
qfit3
qfit3b <- qgcomp.emm.boot(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat3,
  expnms = paste0("x", 1:7),
  emmvar = "CoM",
  B=10,
  q = 4)
qfit3b
qfit3ee <- qgcomp.emm.ee(y~x1+x2+x3+x4+x5+x6+x7,
  data = dat3,
  expnms = paste0("x", 1:7),
  emmvar = "CoM",
  q = 4)
qfit3ee

Stratified effects, weights, and pointwise effect comparisons at specific values of a continuous confounder

Weights can be obtained for any valid value of the modifier and are available for non-bootstrapped fits.
Stratified effects can be obtained for any valid value of the modifier and are available for any fit
Point-wise comparisons are also available for any fit. These estimate the predicted outcome as well as confidence intervals for a contrast with a specified level of the joint exposure (here the second lowest quartile group)

## output/plot the weights at CoM=0")
getstratweights(qfit3, emmval=0)
plot(qfit3, emmval=0)

## output stratum specific joint effect estimate for the mixture at CoM=0
print(getstrateffects(qfit3, emmval=0))

## output pointwise comparisons at CoM=0
print(pointwisebound(qfit3, emmval=0))


## output/plot the weights at the 80%ile of CoM
getstratweights(qfit3, emmval=quantile(dat3$CoM, .8))
plot(qfit3, emmval=quantile(dat3$CoM, .8))


## output stratum specific joint effect estimate for the mixture at the 80%ile of CoM
print(getstrateffects(qfit3, emmval=quantile(dat3$CoM, .8)))

## output pointwise comparisons at at the 80%ile of CoM
print(pointwisebound(qfit3, emmval=quantile(dat3$CoM, .8), pointwiseref = 2))

Stratified effects and pointwise comparisons can also be made with bootstrapped/estimating equation fits.

## output stratum specific joint effect estimate for the mixture at CoM=0
#print(getstrateffects(qfit3b, emmval=0)) # commmented out to reduce output
print(getstrateffects(qfit3ee, emmval=0))

## output pointwise comparisons at CoM=0
#print(pointwisebound(qfit3b, emmval=0)) # commmented out to reduce output
print(pointwisebound(qfit3ee, emmval=0))

## output stratum specific joint effect estimate for the mixture at the 80%ile of CoM
#print(getstrateffects(qfit3b, emmval=quantile(dat3$CoM, .8))) # commmented out to reduce output
print(getstrateffects(qfit3ee, emmval=quantile(dat3$CoM, .8)))

## output pointwise comparisons at at the 80%ile of CoM
#print(pointwisebound(qfit3b, emmval=quantile(dat3$CoM, .8))) # commmented out to reduce output
print(pointwisebound(qfit3ee, emmval=quantile(dat3$CoM, .8)))

## plot the pointwise effects at CoM=0 (compare bootstrap with estimating equations approach)
## note that the bootstrapped version is not valid because of only using 10 bootstrap iterations
ppb <- plot(qfit3b, emmval=0, suppressprint = TRUE, geom_only = TRUE)
ppee <- plot(qfit3ee, emmval=0, suppressprint = TRUE, geom_only = TRUE)

ppb$point$data$col = "Bootstrap"
ppb$errorbar$data$col = "Bootstrap"
ppee$point$data$col = "EE"
ppee$errorbar$data$col = "EE"

ppee$point$data$x = ppee$point$data$x + 0.01
ppee$errorbar$data$x = ppee$errorbar$data$x + 0.01


ggplot() + ppb + ppee + scale_color_grey(name="Pointwise estimates, 95% CI")

Non-linear fits

Non-linear joint effects of a mixture will tend to occur when independent effects of individual exposures are non-linear or when there is interaction on the model scale between exposures. Here is a toy example of allowing a quadratic overall joint effect with an underlying set of interaction terms between x1 and all other exposures. q is set to 8 for this example for reasons described in the qgcomp package vignette. This approach adds an interaction term for both the "main effect" of the mixture and the squared term of the mixture. The coefficients can be intererpreted as any other quadratic/interaction terms (which is to say, with some difficulty). Informally, the CoM:mixture^2 term can be interpreted as the magnitude of the change in the non-linearity of the slope due to a one unit increase in the continuous modifier. Both the bootstrapped and estimating equations formulations can be used to fit non-linear models, with the estimating equations approaches being preferred for computational efficiency.

qfit3bnl <- qgcomp.emm.glm.boot(y~x1+x2+x3+x4+x5+x6+x7 + x1*(x2 + x3 + x4 + x5 + x6 +x7),
  data = dat3,
  expnms = paste0("x", 1:7),
  emmvar = "CoM",
  B = 10, # set higher in practice
  q = 8, degree= 2)

qfit3bnlee <- qgcomp.emm.glm.ee(y~x1+x2+x3+x4+x5+x6+x7 + x1*(x2 + x3 + x4 + x5 + x6 +x7),
  data = dat3,
  expnms = paste0("x", 1:7),
  emmvar = "CoM",
  q = 8, degree= 2)

qfit3bnl
qfit3bnlee

Some effect estimation tools are not all enabled for non-linear fits and will produce an error. However, as with previous fits, we can estimate pointwise differences along the quantiles, and plot the marginal structural model regression line at various values of the joint quantized exposures. Here you can see that the regression line is expected to be steeper at higher levels of the modifier, as in previous fits. We can explictly ask for model confidence bands (which are given by the modelbound function), which pull in confidence limits for the regression line based on the bootstrap distribution of estimates.

print(pointwisebound(qfit3bnlee, emmval=-1))
print(pointwisebound(qfit3bnlee, emmval=-1))
print(pointwisebound(qfit3bnlee, emmval=1))
plot(qfit3bnlee, emmval=-1, modelband=TRUE, pointwiseref=4)
plot(qfit3bnlee, emmval=1, modelband=TRUE, pointwiseref=4)

We can look at this fit another way, too, by plotting predictions from the marginal structural model at all observed values of the modifier, which gives a more complete picture of the model than the plots at single values of the modifier. We can create smoothed scatter plot lines (LOESS) at binned values of the modifier to informally look at how the regression line might change over values of the modifier. We can approximate the effect (point estimate only) at a specific value of z by plotting a smooth fit limited a narrow range of the modifier (< -1 or > 1). Here we see that the joint effect of exposures does appear to differ across values of the modifier, but there is little suggestion of non-linearity at either low or high values of the modifier. This informal assessment agrees with intuition based on the estimated coefficients and the standard plots from the qgcompint package.

library(ggplot2)
plotdata = data=data.frame(q=qfit3bnlee$index, ey=qfit3bnlee$y.expected, modifier=qfit3bnlee$emmvar.msm)
ggplot() + 
         geom_point(aes(x=q, y=ey, color=modifier), data=plotdata) + 
         geom_point(aes(x=q, y=ey), color="purple", data=plotdata[plotdata$modifier>1, ], pch=1, cex=3) + 
         geom_smooth(aes(x=q, y=ey), se=FALSE, color="purple", data=plotdata[plotdata$modifier>1, ], method = 'loess', formula='y ~ x') + 
         geom_smooth(aes(x=q, y=ey), se=FALSE, color="red", data=plotdata[plotdata$modifier < -1, ], method = 'loess', formula='y ~ x') + 
         geom_point(aes(x=q, y=ey), color="red", data=plotdata[plotdata$modifier < -1, ], pch=1, cex=3) + 
  theme_classic() + 
  labs(y="Expected outcome", x="Quantile score value (0 to q-1)") + 
  scale_color_continuous(name="Value\nof\nmodifier")

Survival analysis

As with standard qgcomp, the qgcompint package allows assessment of effect measure modification for a Cox proportional hazards model. The simdata_quantized_emm function allows simulation of right censored survival data.

 set.seed(23)
 dat4 <- simdata_quantized_emm(
  outcometype="survival",
# sample size
  n = 200,
# correlation between x1 and x2, x3, ...
  corr=c(0.8, 0.6, 0.3, -0.3, -0.3, -0.3),
# model intercept
  b0=-2,
# linear model coefficients for x1, x2, ... at referent level of interacting variable
  mainterms=c(0.0, -0.1, 0.1, 0.0, 0.3, 0.1, 0.1),
# linear model coefficients for product terms between x1, x2, ... and interacting variable  
  prodterms = c(0.1, 0.0, 0.0, 0.0, -0.2, -0.2, -0.2),
# type of interacting variable
  ztype = "categorical",
# number of levels of exposure
  q = 4,
# residual variance of y
  yscale = 2.0                            
)
dat4$zfactor = as.factor(dat4$z)
head(dat4)
summary(dat4$zfactor)
summary(dat4$time)
table(dat4$d) # 30 censored

cor(dat4[, paste0("x", 1:7)])

Fitting a Cox model with a fully linear/additive specification is very similar to other qgcomp "noboot" models, and the same plots/weight estimation/effect estimation functions work on these objects. For now, bootstrapped versions of this model are not available.

qfit4 <- qgcomp.emm.cox.noboot(survival::Surv(time, d)~x1+x2+x3+x4+x5+x6+x7,
  data = dat4,
  expnms = paste0("x", 1:7),
  emmvar = "zfactor",
  q = 4)

qfit4
plot(qfit4, emmval=0)
getstratweights(qfit4, emmval=2)
getstrateffects(qfit4, emmval=2)
pointwisebound(qfit4, emmval=1)

Frequently asked questions

How does qgcomp/qgcompint address collinearity

In general, t does so by asking a question that is typically not adversely impacted by high correlation/near/full non-collinearity. When possible, some methods allow the option "Bayes=TRUE," which does perform some very light regularization and can overcome perfect collinearity in a way that allows estimation of the overall effect of exposure, but the independent effects (scaled effects, or weights) will still be subject to issues like variance inflation.

Is there an upper limit in terms of the number of variables I can include?

It depends. Qgcomp/qgcompint is not generally suited for problems when the number of predictors exceeds the sample size. If you try the method, it works, and you obtain reasonable narrow confidence intervals, then the method has been able to handle that number of exposures. Package authors would advocate that if you are working with a problem where the number of predictors exceeds the sample size and are wondering about this package, that you seek out a statistician with experience in those problems that can point out the challenges in such analyses.

Why are estimating equation methods and bootstrapping both used? Is one preferable?

Anything in this package that is available via bootstrapping is also available for the (newer) estimating equation approaches in this package. If your goal is just to estimate parameters, then the estimating equation approaches may be preferable because they are less computationally burdensome and prone to fewer errors like simulation error that may not be in every researcher's area of expertise for diagnosis.

Why isn't available in this package even though it is in the `qgcomp` package?

Developer time. If you see a method in the qgcomp package that you would like to see in this package and could make a good use-case for. Please contact the package developer/maintainer.

How do I use multiple modifiers?

There is currently no simple way to implement multiple, simultaneous modifiers in the qgcompint package. For binary/categorical modifiers, it is straightforward to create a single modifier with distinct value for every unique combination of the modifiers.

References

Original quantile g-computation paper

Alexander P. Keil, Jessie P. Buckley, Katie M. O'Brien, Kelly K. Ferguson, Shanshan Zhao,Alexandra J. White. A quantile-based g-computation approach to addressing the effects of exposure mixtures. https://doi.org/10.1289/EHP5838

First paper to use qgcompint

Stevens, D.R., Bommarito, P.A., Keil, A.P., McElrath, T.F., Trasande, L., Barrett, E.S., Bush, N.R., Nguyen, R.H., Sathyanarayana, S., Swan, S. and Ferguson, K.K., 2022. Urinary phthalate metabolite mixtures in pregnancy and fetal growth: findings from the infant development and the environment study. Environment international, 163, p.107235.

Acknowledgments

The development of this package was supported by NIH Grant RO1ES02953101 and the NIH intramural research program. Invaluable code testing has been performed by Danielle Stephens and Juwel Rana, among others.

Any scripts or data that you put into this service are public.

qgcompint documentation built on Aug. 9, 2025, 1:08 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

qgcompint
Quantile G-Computation Extensions for Effect Measure Modification

The qgcompint package: g-computation with statistical interaction
In qgcompint: Quantile G-Computation Extensions for Effect Measure Modification

Table of Contents

Introduction

The model

Basics: fitting a model with a modifier

Basics: getting bounds for pointwise comparisons

Basics: plotting weights (weights are at referent level of modifier)

Basics: bootstrapping

Plotting bootstrapped fits (plotting predictions)

Overlay plots of predictions at multiple modifier levels

Basics: Estimating equations

Plotting predictions from estimating equation fits (plotting predictions)

Basics: categorical modifier, binary outcome

Simulated data defaults

The wrong way to include categorical modifiers

The right way to include categorical modifiers (use `as.factor()`)

Bounds for pointwise comparisons with categorical modifiers

Overlay plots of predictions at multiple modifier levels

Global tests of interaction for categorical modifiers

Basics: Non-numerical factor modifiers

Basics: Continuous modifiers

Stratified effects, weights, and pointwise effect comparisons at specific values of a continuous confounder

Non-linear fits

Survival analysis

Frequently asked questions

How does qgcomp/qgcompint address collinearity

Is there an upper limit in terms of the number of variables I can include?

Why are estimating equation methods and bootstrapping both used? Is one preferable?

Why isn't available in this package even though it is in the `qgcomp` package?

How do I use multiple modifiers?

See also

References

Original quantile g-computation paper

First paper to use qgcompint

Acknowledgments

Try the qgcompint package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

qgcompint Quantile G-Computation Extensions for Effect Measure Modification

The qgcompint package: g-computation with statistical interaction In qgcompint: Quantile G-Computation Extensions for Effect Measure Modification

Table of Contents

Introduction

The model

Basics: fitting a model with a modifier

Basics: getting bounds for pointwise comparisons

Basics: plotting weights (weights are at referent level of modifier)

Basics: bootstrapping

Plotting bootstrapped fits (plotting predictions)

Overlay plots of predictions at multiple modifier levels

Basics: Estimating equations

Plotting predictions from estimating equation fits (plotting predictions)

Basics: categorical modifier, binary outcome

Simulated data defaults

The wrong way to include categorical modifiers

The right way to include categorical modifiers (use as.factor())

Bounds for pointwise comparisons with categorical modifiers

Overlay plots of predictions at multiple modifier levels

Global tests of interaction for categorical modifiers

Basics: Non-numerical factor modifiers

Basics: Continuous modifiers

Stratified effects, weights, and pointwise effect comparisons at specific values of a continuous confounder

Non-linear fits

Survival analysis

Frequently asked questions

How does qgcomp/qgcompint address collinearity

Is there an upper limit in terms of the number of variables I can include?

Why are estimating equation methods and bootstrapping both used? Is one preferable?

Why isn't available in this package even though it is in the qgcomp package?

How do I use multiple modifiers?

See also

References

Original quantile g-computation paper

First paper to use qgcompint

Acknowledgments

Try the qgcompint package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

qgcompint
Quantile G-Computation Extensions for Effect Measure Modification

The qgcompint package: g-computation with statistical interaction
In qgcompint: Quantile G-Computation Extensions for Effect Measure Modification

The right way to include categorical modifiers (use `as.factor()`)

Why isn't available in this package even though it is in the `qgcomp` package?