The ExactMed functions
In ExactMed: Exact Mediation Analysis for Binary Outcomes

knitr::opts_chunk$set(
  collapse = TRUE,
  prompt = TRUE,
  comment = " "
)

Introduction

This document illustrates the usage and behavior of the functions exactmed(), exactmed_c() and exactmed_cat() through a series of examples. All three functions compute natural direct and indirect effects, as well as controlled direct effects, for a binary outcome. However, each function handles a specific type of mediator: exactmed() accommodates a binary mediator, exactmed_c() a continuous mediator and exactmed_cat() a categorical mediator. Details on the use of exactmed() are provided next. The usage of exactmed_c() and exactmed_cat() is similar to that of exactmed(), but differs in some respects that are described thereafter.

In exactmed(), the user can specify the high levels of the mediator and outcome variables using the input parameters hvalue_m and hvalue_y, respectively (see the function help). Controlled direct effects are obtained for both possible mediator values ($m=0$ and $m=1$). Natural and controlled effects can either be unadjusted (crude) or adjusted for covariates (that is, conditional effects). By default, adjusted effect estimates are obtained for covariates fixed at their sample-specific mean values (numerical covariates directly, categorical covariates through their associated dummy variables). Alternatively, adjusted effect estimates can be obtained for specific, user-provided values of the covariates. Also, by default, exactmed() incorporates a mediator-exposure interaction term in the outcome model, which can be removed by setting interaction=FALSE. Concerning interval estimates, exactmed() generates, by default, $95\%$ confidence intervals obtained by the delta method. Percentile bootstrap confidence intervals can be obtained instead by specifying boot=TRUE in the function call. In this case, 1000 bootstrap data sets are generated by default.
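To make the delta-method option more concrete, here is a minimal base-R sketch that builds a $95\%$ delta-method interval for a simple odds ratio from a logistic regression. It is not ExactMed's internal code: the simulated data and the names sim, or_hat, se_or and ci_delta are purely illustrative, and the mediation effects computed by the package are more involved functions of the model coefficients.

```r
# Illustrative only: delta-method confidence interval for an odds ratio.
set.seed(1)
n <- 500
sim <- data.frame(x = rbinom(n, 1, 0.5))
sim$y <- rbinom(n, 1, plogis(-1 + 0.7 * sim$x))

fit <- glm(y ~ x, family = binomial, data = sim)
beta_hat <- unname(coef(fit)["x"])
var_beta <- vcov(fit)["x", "x"]

# g(beta) = exp(beta), so g'(beta) = exp(beta) and
# Var(g(beta_hat)) is approximated by g'(beta_hat)^2 * Var(beta_hat).
or_hat <- exp(beta_hat)
se_or <- or_hat * sqrt(var_beta)

confcoef <- 0.95
z <- qnorm(1 - (1 - confcoef) / 2)
ci_delta <- c(lower = or_hat - z * se_or, upper = or_hat + z * se_or)
ci_delta
```

The same recipe (gradient of the target quantity combined with the estimated covariance matrix of the coefficients) applies to any smooth function of the regression coefficients.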

In exactmed_c() and exactmed_cat(), only the high level of the outcome variable can be specified (using the input parameter hvalue_y). Moreover, for each scale, the controlled direct effect is computed at a mediator value or level that the user specifies through the parameter mf. By default, this parameter is fixed at the sample-specific mean of the mediator in exactmed_c(), whereas it is fixed at the reference level of the mediator in exactmed_cat(). In order to use exactmed_cat(), the mediator must be coded as a factor variable in the data set. By default, the reference level of the mediator is the first level of the corresponding factor variable. The extra input parameter blevel_m of the exactmed_cat() function allows the user to change the default reference level to any other level. Note that, among the mediation effects, blevel_m can only affect the value of the controlled direct effect (not the natural direct and indirect effects).

Because exactmed(), exactmed_c() and exactmed_cat() are similar in terms of use and options offered to the user, most examples are presented with the exactmed() function. All the exactmed() examples below use the data set datamed, which is available after loading the ExactMed package. Some of the features of this data set can be found in its help file (help(datamed)). A user interested in the exactmed_c() or exactmed_cat() functions, for the continuous or categorical mediator cases, respectively, only needs to change the name of the function (and of the data set) in these example calls to understand their use. The data sets datamed_c and datamed_cat, which feature a continuous and a categorical mediator, respectively, are presented at the end of the document along with a few example calls.

Lastly, note that all ExactMed functions work only on data frames with named columns and no missing values.

library(ExactMed)

head(datamed)

The following command verifies whether the data set contains any missing values:

anyNA(datamed)
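If such a check returns TRUE, the missing values have to be dealt with before calling any ExactMed function. The sketch below uses a hypothetical toy data frame (toy is not part of the package) to locate the missing values and to build a complete-case version of the data. Dropping incomplete rows is only one possible strategy and may not be appropriate for every study.

```r
# Hypothetical toy data frame with two missing entries.
toy <- data.frame(
  X = c(0, 1, 1, NA, 0),
  M = c(1, 0, 1, 1, 0),
  Y = c(0, 0, 1, 1, NA)
)

anyNA(toy)            # TRUE: at least one value is missing
colSums(is.na(toy))   # number of missing values per column

# Complete-case version of the data (rows 4 and 5 are dropped).
toy_cc <- toy[complete.cases(toy), ]
anyNA(toy_cc)         # FALSE
nrow(toy_cc)          # 3
```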

Basic examples

Suppose that one wishes to obtain unadjusted (crude) mediation effect estimates for a change in exposure from $0$ to $1$, assuming there is no exposure-mediator interaction and using the delta method to construct $95\%$ confidence intervals.

In this case, a valid call to exactmed() would be:

results1 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', 
  a1 = 1, a0 = 0, interaction = FALSE
  )  

results1

Mediation effect estimates adjusted for covariates are obtained through the character vectors m_cov and y_cov, which contain the names of the covariates to be adjusted for in the mediator and outcome models, respectively. The following call to exactmed() incorporates covariates C1 and C2 in both the mediator and outcome models:

results2 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0,  
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  interaction = FALSE
  )

results2

The exactmed() function also allows for the specification of two different sets of covariates in the mediator and outcome models. For example, the following specification of m_cov and y_cov means that the mediator model is adjusted for C1 and C2, while the outcome model is adjusted for C1 only.

However, we advise against this practice unless it is known that the excluded covariates are independent of the dependent variable being modeled (mediator or outcome) given the remaining covariates.

results3 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0,  
  m_cov = c('C1', 'C2'), y_cov = c('C1'), 
  interaction = FALSE
  )

results3

By default, the adjusted parameter is TRUE. If it is set to FALSE, exactmed() ignores the values of the vectors m_cov and y_cov and computes unadjusted (crude) effect estimates, as in the first example above:

results4 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1'), 
  adjusted = FALSE, interaction = FALSE
  )

results4

To perform an adjusted mediation analysis allowing for exposure-mediator interaction (by default, the interaction parameter is TRUE) and using bootstrap based on $100$ resamples with initial random seed $= 1991$ to construct $97\%$ confidence intervals, one should call exactmed() as follows:

results5 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  boot = TRUE, nboot = 100, bootseed = 1991, confcoef = 0.97
  )

results5
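For readers unfamiliar with percentile bootstrap intervals, the following base-R sketch shows the general idea on a simple odds ratio, using the same number of resamples, seed and confidence coefficient as in the call above. It is only an illustration under simulated data: the names sim, or_stat, boot_or and ci_boot are hypothetical, and the way exactmed() resamples internally may differ in its details.

```r
# Illustrative only: percentile bootstrap interval for an odds ratio.
set.seed(1991)
n <- 300
sim <- data.frame(x = rbinom(n, 1, 0.5))
sim$y <- rbinom(n, 1, plogis(-1 + 0.7 * sim$x))

# Statistic of interest: odds ratio for x from a logistic regression.
or_stat <- function(d) {
  unname(exp(coef(glm(y ~ x, family = binomial, data = d))["x"]))
}

nboot <- 100
confcoef <- 0.97
boot_or <- replicate(nboot, {
  idx <- sample.int(nrow(sim), replace = TRUE)  # resample rows with replacement
  or_stat(sim[idx, , drop = FALSE])
})

# Percentile interval: empirical quantiles of the bootstrap replicates.
alpha <- 1 - confcoef
ci_boot <- quantile(boot_or, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
ci_boot
```

With only $100$ resamples the tail quantiles are estimated rather coarsely, which is why a larger number of resamples is usually preferred outside of quick illustrations.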

Firth's penalization

When separation or quasi-separation is suspected, Firth's penalization can be used by setting the Firth parameter to TRUE (Firth penalized mediation analysis). In that case, Firth's penalization is applied to both the mediator model and the outcome model.

Firth's penalization reduces the bias of the regression coefficient estimators under scarce or sparse data (see details in the exactmed() help page):

results6 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), Firth = TRUE, 
  boot = TRUE, nboot = 100, bootseed = 1991, confcoef = 0.97
  )

results6
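As a reminder of what separation looks like in practice, the base-R sketch below fits an ordinary (unpenalized) logistic regression to a small, perfectly separated toy data set. The data frame sep is hypothetical and is not related to datamed; the point is only to show the symptom that motivates Firth's penalization.

```r
# Illustrative only: a perfectly separated toy data set.
sep <- data.frame(
  x = c(0, 0, 0, 0, 1, 1, 1, 1),
  y = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# x predicts y perfectly: the off-diagonal cells of the 2 x 2 table are empty.
table(sep$x, sep$y)

# Ordinary maximum likelihood: the slope estimate diverges.
# suppressWarnings() hides glm's convergence warnings, if any are raised.
fit_ml <- suppressWarnings(glm(y ~ x, family = binomial, data = sep))
coef(summary(fit_ml))[, c("Estimate", "Std. Error")]
```

Very large coefficient estimates accompanied by very large standard errors are the typical signature of separation, and they make any quantity derived from the fitted model unreliable.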

Stratum-specific effects

The following call to exactmed() returns mediation effects adjusted for the covariates C1 and C2, when the values of the covariates C1 and C2 are $0.1$ and $0.4$, respectively, assuming an exposure-mediator interaction and using the delta method to construct $95\%$ confidence intervals:

results7 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  m_cov_cond = c(C1 = 0.1, C2 = 0.4), y_cov_cond = c(C1 = 0.1, C2 = 0.4)
  )

results7

Covariates common to m_cov and y_cov must be assigned the same values in m_cov_cond and y_cov_cond; otherwise, the execution of the exactmed() function is aborted and an error message is displayed in the R console. Example:

exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  m_cov_cond = c(C1 = 0.3, C2 = 0.4), y_cov_cond = c(C1 = 0.1, C2 = 0.4)
 )
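Such conflicts can be spotted before calling exactmed(). The helper below is a hypothetical convenience function written for this document (conflicting_cond() is not part of ExactMed); it simply lists the covariates that are named in both conditioning vectors but given different values.

```r
# Hypothetical helper (not part of ExactMed): lists the covariates that are
# given different values in the two conditioning vectors.
conflicting_cond <- function(m_cov_cond, y_cov_cond) {
  common <- intersect(names(m_cov_cond), names(y_cov_cond))
  differs <- vapply(
    common,
    function(v) !isTRUE(all.equal(m_cov_cond[[v]], y_cov_cond[[v]])),
    logical(1)
  )
  common[differs]
}

conflicting_cond(c(C1 = 0.3, C2 = 0.4), c(C1 = 0.1, C2 = 0.4))  # "C1"
conflicting_cond(c(C1 = 0.1, C2 = 0.4), c(C1 = 0.1, C2 = 0.4))  # character(0)
```

The helper only compares covariates that appear in both vectors, and it accepts lists as well as named atomic vectors.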

If the covariates specified in m_cov_cond (y_cov_cond) form a proper subset of m_cov (y_cov), then the remaining covariates are set to their sample-specific mean levels. Hence, the call

results8 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  m_cov_cond = c(C1 = 0.1), y_cov_cond = c(C1 = 0.1)
  )

is equivalent to:

mc2 <- mean(datamed$C2)
mc2

results9 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  m_cov_cond = c(C1 = 0.1, C2 = mc2), y_cov_cond = c(C1 = 0.1, C2 = mc2)
  )

This can be checked by comparing the two outputs:

all.equal(results8, results9)

With this in mind, the following call predictably produces an error, because C2 is set to its sample mean in the mediator model but to $0.4$ in the outcome model:

exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  m_cov_cond = c(C1 = 0.1), y_cov_cond = c(C1 = 0.1, C2 = 0.4)
  )

Categorical covariates

The exactmed() function also allows for categorical covariates. Covariates of this type must appear in the data frame as factor, character, or logical columns. To illustrate how exactmed() works with categorical covariates, we replace the covariate C1 in the data set datamed by a random factor column:

cate <- factor(sample(c("a", "b", "c"), nrow(datamed), replace = TRUE))
datamed$C1 <- cate

It is possible to estimate mediation effects at specific values of categorical covariates using the input parameters m_cov_cond and y_cov_cond. Note that if the targeted covariates are a mixture of numerical and categorical covariates, these parameters must be lists rather than atomic vectors, which suffice when the covariates are all numerical or all categorical.
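The reason a list is needed follows from how R handles atomic vectors, as this short base-R illustration shows (the object names cond_atomic and cond_list are arbitrary):

```r
# An atomic vector holds a single type: mixing a level name with a number
# silently converts the number to a character string.
cond_atomic <- c(C1 = "a", C2 = 0.4)
str(cond_atomic)

# A list keeps each element's own type.
cond_list <- list(C1 = "a", C2 = 0.4)
str(cond_list)

is.character(cond_atomic[["C2"]])  # TRUE
is.numeric(cond_list[["C2"]])      # TRUE
```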

Hence, if one wants to estimate mediation effects at level 'a' for C1 and at value $0.4$ for C2, assuming an exposure-mediator interaction and using the delta method to construct $95\%$ confidence intervals, exactmed() should be called as follows:

results10 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  m_cov_cond = list(C1 = 'a', C2 = 0.4), y_cov_cond = list(C1 = 'a', C2 = 0.4)
  )

results10

If one does not specify a value for the categorical covariate C1, exactmed() internally creates a dummy variable for each non-reference level of C1 and sets each of them to the proportion of observations in the corresponding category (which is equivalent to setting each dummy variable to its mean value):

results11 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  m_cov_cond = c(C2 = 0.4), y_cov_cond = c(C2 = 0.4)
  )

results11
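The equivalence between the mean of a dummy variable and the proportion of observations in the corresponding category can be checked directly in base R. The factor c1 below is a hypothetical stand-in for a categorical covariate and does not come from datamed.

```r
# Hypothetical three-level factor, standing in for a categorical covariate.
c1 <- factor(c("a", "b", "c", "a", "b", "a", "c", "a"))

# Dummy (indicator) columns for the non-reference levels "b" and "c".
dummies <- model.matrix(~ c1)[, -1, drop = FALSE]

# The mean of each dummy equals the sample proportion of its level.
colMeans(dummies)
prop.table(table(c1))[c("b", "c")]
```

Both lines return $0.25$ for each of the levels 'b' and 'c', since each occurs twice among the eight observations.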

Case-control data

exactmed() can also compute mediation effects with a binary outcome and a binary mediator when the data come from a classical case-control study in which the probability of being selected depends only on the outcome status. To do so, the true outcome prevalence (that is, the population prevalence $P(Y = \text{hvalue\_y})$) must be known and the yprevalence parameter set to this value. exactmed() accounts for the ascertainment in the sample by employing weighted regression techniques that use inverse-probability weighting (IPW) with robust standard errors (see details in the documentation).

The following call to exactmed() returns mediation effects supposing that the data have been obtained from a case-control study and that the true outcome prevalence is $0.1$:

results12 <- exactmed(
  data = datamed, a = 'X', m = 'M', y = 'Y', 
  a1 = 1, a0 = 0, interaction = FALSE, yprevalence = 0.1
  )

results12
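To give some intuition for the weighting, the sketch below computes one common choice of inverse-probability weights for case-control data from a known prevalence. It is an assumption-laden illustration in base R, with a hypothetical sample of 40 cases and 60 controls; the exact weights and standard errors used by exactmed() are described in the package documentation and may be computed differently.

```r
# Illustrative only: a common IPW scheme for case-control data.
# Hypothetical sample with 40 cases and 60 controls.
y <- rep(c(1, 0), times = c(40, 60))
yprevalence <- 0.1          # assumed known population prevalence of Y = 1

p_cases <- mean(y == 1)     # proportion of cases in the sample (0.4)

# Cases are over-represented, so they are down-weighted; controls are
# up-weighted. The weights sum to the sample size.
w <- ifelse(y == 1, yprevalence / p_cases, (1 - yprevalence) / (1 - p_cases))

weighted.mean(y, w)         # 0.1, the population prevalence
sum(w)                      # 100
```

After weighting, the sample mimics a population in which the outcome has the stated prevalence, which is what allows population-level effects to be estimated from case-control data.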

Of note, the same optional parameters described in the previous sections are available in the case-control study context.

Mediation analysis with a continuous mediator

As mentioned in the introduction, in the case of a continuous mediator, the ExactMed package allows the user to obtain estimates of the different mediation effects using the exactmed_c() function, which essentially offers the same options as exactmed(). The only differences are the absence of the hvalue_m parameter and the addition of the mf parameter, which sets the value of the mediator used in the calculation of the controlled direct effect (by default, the sample-specific mean of the mediator).

For illustration, the package also provides the datamed_c data set, which contains a continuous mediator variable. Some of the features of this data set can be found in its help file (help(datamed_c)). As with exactmed(), the exactmed_c() function works only on data frames with named columns and no missing values.

library(ExactMed)

head(datamed_c)

We provide below an example call to exactmed_c() that returns estimates of conditional mediation effects, assuming no exposure-mediator interaction in the outcome regression model:

results13 <- exactmed_c(
  data = datamed_c, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0,  
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  interaction = FALSE
  )

results13

To perform an adjusted mediation analysis allowing for exposure-mediator interaction, using bootstrap based on $100$ resamples with initial random seed $= 1885$ to construct $95\%$ confidence intervals and computing the controlled direct effect when the mediator is set at the value $2$, one should call exactmed_c() as follows:

results14 <- exactmed_c(
  data = datamed_c, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  boot = TRUE, nboot = 100, bootseed = 1885, confcoef = 0.95,
  mf = 2
  )

results14

Mediation analysis with a categorical mediator

As mentioned in the introduction, in the case of a categorical mediator (coded as a factor), the ExactMed package allows the user to obtain estimates of mediation effects through the exactmed_cat() function, which basically offers the same options as exactmed(). The only differences are the absence of the hvalue_m parameter and the addition of two extra parameters: blevel_m and mf. The first one sets the reference level of the mediator, which by default corresponds to the first level of the corresponding factor variable. The second one specifies the level of the mediator used in the calculation of the controlled direct effect. Parameter blevel_m thus affects the mediator regression model and its associated output by fixing the reference level of the dependent variable. It does not affect the values of the natural effects, and it affects the controlled direct effect only if the value of the parameter mf is not specified by the user. In that case, mf defaults to the value of blevel_m.

For illustration, the package also provides the datamed_cat data set, which contains a categorical mediator variable. Some of the features of this data set can be found in its help file (help(datamed_cat)). As with exactmed(), the exactmed_cat() function works only on data frames with named columns and no missing values.

head(datamed_cat)

We provide below an example call to exactmed_cat() that returns estimates of conditional mediation effects, assuming no exposure-mediator interaction in the outcome regression model:

results15 <- exactmed_cat(
  data = datamed_cat, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0,  
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  interaction = FALSE
  )

results15

To perform an adjusted mediation analysis allowing for exposure-mediator interaction, using bootstrap based on $100$ resamples with initial random seed $= 1875$ to construct $95\%$ confidence intervals and computing the controlled direct effect at the level 'c' of the mediator, one should call exactmed_cat() as follows:

results16 <- exactmed_cat(
  data = datamed_cat, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, 
  m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), 
  boot = TRUE, nboot = 100, bootseed = 1875, confcoef = 0.95,
  mf = 'c'
  )

results16

One can note from the previous output that the reference level for the mediator model is by default the first level of the mediator factor variable (blevel_m = 'a'). However, the controlled direct effect is computed at the level 'c' of the categorical mediator, as requested by the parameter mf (that is, mf = 'c').
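The notion of a reference level can be illustrated in base R with a hypothetical factor (med is not taken from datamed_cat). The sketch uses relevel() as an analogue of what changing blevel_m amounts to; whether exactmed_cat() relies on relevel() internally is not stated here and should not be assumed.

```r
# Hypothetical categorical mediator with levels "a", "b" and "c".
med <- factor(c("b", "a", "c", "a", "b", "c"))
levels(med)[1]                    # "a": default reference level

# relevel() moves the chosen level to the first position.
med_c <- relevel(med, ref = "c")
levels(med_c)                     # "c" "a" "b"

# The data themselves are unchanged; only the reference level differs.
identical(as.character(med), as.character(med_c))  # TRUE
```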