library("papaja")
# Seed for random number generation set.seed(42) knitr::opts_chunk$set(cache.extra = knitr::rand_seed)
In modeling the relation between a predictor and a dependent variable there are basically three modeling situations to consider. First, there are other variables that are correlated with the dependent variable, and possibly also with the predictor. Adding such variables, which are usually called covariates, to the regression model may change the regression coefficient of the predictor, and the standard error for this estimate. This is discussed in many textbooks about multivariate and regression analysis. Second, the mechanism explaining the effect between predictor and dependent can be examined by a mediation analysis. The mediator is a variable that partly "explains" the assumed causal process, which links the predictor to the dependent variable. Third, the effect of interest depends on another variable, which is called the moderator. A moderator influences the effect of the predictor on the dependent, implying that the effect is conditional on the value of the moderator.
When the three modeling situations are combined we obtain a general model, which is useful in many applied research. This general model is called the moderated mediation model and is earlier described in the methodological literature (e.g. @Edwards2007; @Hayes2015; @Preacher2007).
In this paper a short description of the moderated mediation model is given. Next the R package “gemm” is discussed, which is developed to analyze this model in an easy and straightforward way.
A moderated mediation model assumes that the effect of a predictor on a dependent variable is mediated by one or more mediators and that the indirect paths in this mediation are moderated. Moderation can take place on the effect of the predictor on the mediators, or on the effect of the mediators on the dependent variable. In this tutorial the function gemm()
is described for the analysis of moderated mediation models. The function is an alternative for the (SPSS or SAS) PROCESS macro developped by Hayes [@Hayes2018]. The gemm
function is in the R package gemm
. This package can be installed from Github and then loaded using the library()
function:
devtools::install_github("PeterVerboon/gemm")
library(gemm)
An illustration of the moderated mediation model, for which the function can be applied, is shown in Figure 1. The figure represents a structural model, linking the observed variables to each other. We are therefore using the SEM software to fit the moderated mediation model. To this end we use the lavaan package [@Rosseel2012]. Figure 1 shows the most general situation, with several mediators, several covariates for the mediators and several covariates for the dependent variable. There are also two moderators in this model, one for the paths from the predictor to the mediators and one for the paths from the mediators to the dependent variabele.
The model is limited in the sense that only one predictor, one dependent variable, and for each path one moderator are defined. Furthermore, serial mediation is not included in this model. The number of mediators and covariates is not limited.
The function gemm()
calls the functions buildModMedSemModel()
, which builds the requested SEM model, by using the specifications of the variables and the relations between them, which are provided by the user. In this model we assume that the covariates for the mediators (C) are correlated with each other, and furthermore that the covariates fir the dependent variable (D) correlate with each other and also with the mediators. If covariates (C) are included in the model, paths are assumed to exist from the covariates (C) to all mediators. For the function to run the model needs to have a predictor, a dependent variable and at least one mediator. All other variables are optional.
This most basic model (predictor --> mediator --> dependent) is just identified: three regression coefficients are estimated from the covariance matrix, containing three correlations. With two mediators there are six datapoints (the off-diagonal elements in the correlation matrix), and als six parameters to estimate (5 regression paths and one covariance between the mediators). With three mediators there are ten datapoints, and also ten parameters to estimate (seven regression paths and the three covariances between the mediators). It is easy to see that if we allow the mediators to correlate in the model, models with only mediators, a predictor and a dependent variable, are always just identified. The fit values for these models will therefore show perfect fit. Adding covariates or moderators will result in models in which the parameters have to be estimated iteratively, yielding model fit values that indicate how well the model fits the data.
After running the function gemm()
there are several functions available that are designed to handle the results: plotSS()
, plotIMM()
, plotIMM3d()
and print()
. These functions will be illustrated in this tutorial.
A primary statistic of interest in moderated mediation models is the index moderated mediation (Hays, 2015), which can be plotted by gemm
. For a good understanding of this index, its algebraic derivation is given in the appendix.
A quantification of the effect of X on Y as a function of the moderator has been described by Hayes (2015, 2018), who coined the term “index of moderated mediation”. The definition in this tutorial is slightly different from that given by Hayes (2105), since contrary to Hayes (2015), here it also contains the constant term $a_j b_j$. The IMM for a single path represents the slope (and intercept) of the line that relates the predictor with the dependent variable, as a function of the moderator. The index consists of a fixed part (the indirect effect) and a part that varies with the moderator. The plot of the IMM versus the moderator can be constructed using formula (3) from the appendix, which gives a straight line. This plot shows how the relation between predictor and dependent variable changes when the moderator changes. A flat horizontal line in this plot would indicate that there is no moderation: the relation between predictor and dependent variable does not change, when the moderater changes. A steep line om the other hand indicates a moderation effect. When the moderator takes the value zero (W = 0), the IMM represents the indirect (unmoderated) effect, $a_j b_j$.
The single-path IMM is defined for each mediator separately. Likewise, the IMM is defined for each moderated path separately, thus for the path from predictor to mediator (a-path), moderated by W, and also for mediator to the dependent variable (b-path), moderated by V. For reasons we refer to this IMM as a single-path IMM. So, for three mediators and two moderators (one for each path), six single-path IMMs can be computed.
Combining both moderators to obtain a double-path IMM would give a two-dimensional plane in a three-dimensional space (instead of a line), defined by the two moderators and the IMM. When one of the moderators is dichotomous, it is more convenient to plot the double-path IMM in a two dimensional space with two lines, one for each category of the moderator.
The gemm()
function knows two effect sizes for the indirect effect. First, the completely standardized effect size, and second, the effect size based on the ratio between indirect and total effect.
The completely standardized indirect effect for the $j^{th}$ mediator is defined as:
$$ a_j b_j(s_{x}/s_{y} ) $$
Here, $s_x$ represents the standard deviation of the predictor and $s_y$ the standard deviation of the dependent variable.
The effect size based on the ratio between the indirect effect and the total effect is defined as:
$$ a_j b_j/(c' + a_j b_j ) $$
These effect sizes assume a simple mediation model. This implies that they do not account for moderators and covariates.
The required parameters of the function gemm()
can be found by typing ?gemm
. This gives the help page of the function, showing which parameters the function needs. The fictious data that will be used to illustrate the model are included in the package and are in the R object cpbExample.rda
, which can be loaded by data("cpbExample")
. After loading the data, you can also learn more about the data by using the question mark.
data("cpbExample") # loads the data
The data are about the attitudes and self-reported contra-productive behaviour (CPB) of employees of an organisation that is in the middle of a reorganization. The model predicts that feelings of procedural injustice may lead to cynicism and less trust in the management. This relation may be stronger among employees who are insecure about their job continuation. Cynisicm may lead to CPB. However, strong personal norms may prevent CPB. Cynicism is expected to increase with age, and men may be more inclined towards CPB than women.
The first model that will be illustrated is a straightforward mediation model with two mediators, without moderators and without covariates. The parameter "xvar" represents the predictor, "yvar" the dependent variable and "mvars" a vector of mediators. The number of bootstrap samples is set to 500, a very small number for bootstraps, but here only used for illustration. For a real analysis you might use 5,000 boostraps; the default for nboot is 1,000. Except for nboot, these parameters are all obligatory to specify. This implies that neither a simple regression model, nor a (simple) moderation model, can be analyzed with this function. If you want to run a regression model (with moderators) it is easier to use the standard R function lm()
. Moderators can then easily be incorporated in the model by adding an interaction term, which is the product of the (centered) predictor and the (centered) moderator.
result <- gemm(dat = cpbExample, xvar = "procJustice", mvars = c("cynicism","trust"), yvar = "CPB", nboot = 100)
The result of the analysis is put in the object "result". After running the analysis, the results can be viewed by typing: print(result).
out <- print(result, silence = TRUE) papaja::apa_table(out$R2 ,placement = "htb", caption = " Estimates of explained variance") papaja::apa_table(out$totalEff, placement = "htb",caption = " Estimates of total effect", align = c("l", rep("r", 6))) papaja::apa_table(out$aPaths, placement = "htb", caption = " Estimates of a-paths", align = c("l", rep("r", 6))) papaja::apa_table(out$bPaths, placement = "htb", caption = " Estimates of b-paths", align = c("l", rep("r", 6))) papaja::apa_table(out$directEff, placement = "htb", caption = " Estimates of direct effect", align = c("l", rep("r", 6))) papaja::apa_table(out$indirectEff, placement = "htb", caption = " Estimates of indirect effects", align = c("l", rep("r", 6))) papaja::apa_table(out$csES, placement = "htb", caption = " Estimates of completely standardized effect sizes", align = c("l", rep("r", 6))) papaja::apa_table(out$ratES, placement = "htb", caption = " Estimates of effect sizes based on ratio indirect to total effect", align = c("l", rep("r", 6)))
The printed output consists of nine tables for a full model with covariates. First, the function input is shown. Here you can check whether you have specified the model that you intended to run. Then, the explained variance is shown in the first table for the dependent variable and the mediators. For the dependent variable the variance is explained by the predictor, the mediator(s), and whenever they are in the model also by the moderators and covariate(s). For the mediators the variance is explained by the predictor, and when present also by the moderator and covariate(s). The second table shows the total effect, which is simply the result of a regression of the predictor and the dependent variable. The third and fourth table show the estimates of the a- and b-paths respectively. The direct effect of the predictor on the dependent variable is table five. Table six shows the indirect effects for all mediators separately and the total indirect effect. Each row represents an indirect effect through a particular mediator. The last row is the total of all indirect effects. When there are covariates in the model, they are given in table seven. Table eigth and nine (or seven and eight when there no covariates) show two effect sizes of the indirect effects. Table eight, the completely standardized effect size, and table nine, the effect size based on the ratio between indirect and total effect. The confidence intervals around the estimates have been obtaind by using the bootstrap samples. The tables are given at the end of the document.
The results show that procedural justice is negatively associated with CPB (b = -0.73, p < .001). Less perceived procedural justice may "cause" more CPB. The estimates of the a-paths show that procedural justice is negatively associated with cynicism (b = -0.74, p < .001) and positively with trust (b = 1.10, p < .001). The estimates of the b-paths show that trust is negatively associated with CPB (b = -0.28, p < .001) and cynicism positively (b = 0.32, p < .001). The direct effect is also statistically significant (b = -0.18, p = .032). The indirect effect are both significant and negative, for cynicism (b = -0.23, p < .001) and trust (b = -.31, p <.001). The completely standardized effect sizes are respectively b = -0.18 (p <.001) for cynicism and b = -0.24 (p < .001) for trust. And finally the ratio base affect sizes are respectively b = 0.56 (p <.001) for cynicism and b = 0.63 (p < .001) for trust.
In the second example a moderated mediation model is shown with a covariate.
result <- gemm(dat = cpbExample, xvar = "procJustice", mvars = c("cynicism","trust"), yvar = "CPB", xmmod = "insecure", mymod = "gender" , cmvars = c("age"), nboot = 500)
We have added the moderator of the a-path ("gender") to the model. This variable is assumed to moderate the two effects of the two mediators on the dependent variable. This moderator is a dichotomous moderator, in R
terminology a factor with two levels (here, male and female). The assumption behind this moderator effect is that men are more inclined to show contra-productive behaviour than women. A second moderator is added: "insecure". This variable is expected to moderate the effect of procedural justice on the mediators. We assume that the effect is particularly present among persons who are insecure, becuase they use the information about justice to build their attitudes with respect to trust and cynicism. Fianlly, the variable age is used as covariate for predicting the mediators (cmvars), because we assume that the mediator cynicism may be influenced by age.
The structure of the printed output is similar to the previous analysis. Because of the moderators in the model, the results for the a- and b-paths now also contain interaction terms. Both interaction terms (procedural justice x insecure) are significant for cynicism (b = -.387, p < .001) and trust (b = -0.157, p < .001), respectively. For the b-path only the cynicism x gender interaction is significant (b -0.811, p < .001). This term actually refers to the mitigating effect of women om the cynicism to CPB association. This is better explained by simple slopes plots later on. Age is a significant covariate for cynicism (b = 0.206, p < .001) and also for trust (b = -0.045, p = 0.050).
We can ask for plots now, because the model contains at least one moderator. There actually are three plot types. The first type is the single-path index of moderated mediation (Hayes, 2018, p.425), plotted against the moderator. As explained before, the index of moderated mediation (IMM) is the slope of the line representing the effect of the mediator (c.q. predictor) on the dependent variable (c.q. mediator) that changes when the moderator changes. When there are two mediators in the model there are also two IMMs. A flat horizonal line indicates that the slope does not change as a function of the supposed moderator, in that case there is no moderation. The steeper the line the stronger the moderated mediation effect. For a dichotomous moderator only two points are shown (per mediator) with their 95% confidence interval. The command to obtain the single-path IMM is plotIMM(result)
.
plotIMM(result)
This command yields two plots: one for each moderator. First, we examine the moderator insecure. The IMM for the first mediator (cynicism) is negative and it is hardly influenced by the moderator insecure. So, mediation by cynicism is about the same for persons with low and high levels of insecurity. The IMM for the second mediator (trust) decreases as insecurity increases. This implies that the indirect effect through trust is larger (more negative) for relatively insecure persons compared to secure persons. The next figure shows the IMM for the moderator gender. For men the IMM with respect to cynicism is more negative than for women, and their confidence interval do not overlap. The indirect through cynicism for women is close to zero. For trust there is little difference between men and women. Women show larger variability, but for both men and women the confidence intervals contain zero. The second plot type is the mediated simple slopes plot. A simple slope plot shows the indirect effect of the predictor on the dependent variable through a mediator for usually two characteristic values of the moderator. Each value of the moderator is represented by a separate line. Parallel lines indicate absence of moderation and crossing lines the presence of moderation. The 95% CI interval around each line is indicated by a shaded area. For each mediator a separate subplot is given.
plotSS(result)
These plots show the effect of the predictor on the dependent variable through the mediators cynicism and trust, respectively. In the first figure the moderator insecure is a numerical variable. Because the moderator is a dichotomous variable in the second figure each plot contains two lines, one for each category of the moderator.
The first plot command provides two plots, for each moderator a separate plot of the single-path IMM, with in each plot a separate line for each mediator. In the first plot (for mod1) we see that for the second mediator the IMM becomes more negative as the value of the moderator increases. So, the effect of x1 on y1 is more negative for larger values of mod1.
The second plot command provides a simple slopes plots, one for each moderator, where each plot is split into subplots for each mediator. The lines represent the effect evaluated at the 16th and 84th percentile of the moderator, as recommended by Hayes(2018). Only for mod1 and mediator m2 there seems moderation, because we can see crossing lines. In the other three cases the lines run parallel to each other, indicating no relevant moderation effects.
plotSS(result)
The plotIMM3d(result)
command provides 3d-plots showing the double-path IMM for both moderators. Separate plots are given for each mediator. The message is the same as in the previous plots: only the following moderated mediation effect appears to be present: the indirect effect of x1 on y through the second mediator is moderated by mod1.
plotIMM3d(result)
The result
object is a list of three elements, which are named input, intermediate, and results, respectively. Input contains the original data and all variable names used in the analysis. Intermediate contains results that have been computed in the functions. For instance the object result$intermediate$model
contains the model specification that was used by lavaan, the object result$intermediate$result
contains the output of lavaan. You can inspect this output by loading the package lavaan and then using summary(result$intermediate$result)
. All other lavaan extract functions can be used on this object. The object result$intermediate$parameterEstimates
contains all estimated parameters from lavaan.
For example, when you want to see some important fit measures obtained in lavaan, you may type:
lavaan::fitmeasures(result$intermediate$result)[c("cfi","tli","rmsea","chisq", "df")]
In this way you are able to inspect all other lavaan results.
When there are no moderators or when moderators take the value zero, the indirect (unmoderated) effect is defined as: $a_j b_j$. The indirect moderated effect with the X – M path moderated by W (assuming there is no moderation through V or when V = 0) is derived as follows: \begin{equation} \begin{split} M_j = a_j X + w_j W + g_j WX\ Y = b_j M_j. \end{split} \end{equation}
Substitute $M_j$:
\begin{equation} \begin{split} Y = b_j(a_j X + w_j W + g_j WX)\ Y = b_j w_j W + (a_j b_j + b_j g_j W)X. \end{split} \end{equation}
The index of moderated mediation (IMM) is defined here as:
\begin{equation} IMM_a = a_j b_j + b_j g_j W. \end{equation}
This definition is slightly different from that given by Hayes (2105), since contrary to Hayes(2015), here it also contains the constant term $a_j b_j$. When W = 0 the IMM represents the indirect (unmoderated) effect. Combining two moderators to obtain a double-path IMM would give a two-dimensional plane in a three-dimensional space (instead of a line), defined by the two moderators and the IMM. This double-path IMM index is derived as follows. \begin{equation} \begin{split} M_j = a_j X + w_j W + g_j WX\ Y = b_j M_j + vV + h_j M_j V. \end{split} \end{equation}
Substitute M_j: \begin{equation} \begin{split} Y = b_j(a_j X + w_j W + g_j WX) + vV + h_j (a_j X + w_j W + g_j WX)V\ Y = b_j w_j W + (a_j b_j + b_j g_j W)X + vV + h_j w_j WV + (h_j a_j V + h_j g_j WV)X\ Y = b_j w_j W + vV + g_j w_j WV + (a_j b_j + b_j g_j W + h_j a_j V + h_j g_j WV)X. \end{split} \end{equation}
The double-path IMM, incorporating both moderators is: \begin{equation} IMM = a_j b_j + b_j g_j W + h_j a_j V + h_j g_j WV. \end{equation}
It is easy to see that formula (6) is a generalization of formula (3), when V = 0 (or absent), and when W = 0 (or absent), the index of moderated mediation (IMM) of the b-path is defined as: \begin{equation} IMM_b = a_j b_j + h_j a_j V. \end{equation}
Like before, this index indicates how the slope of the effect of X on Y changes as a function of the moderator, here the moderator of the b-path (V)
We used r cite_r("Refs_gemm.bib")
for all our analyses.
\newpage
r_refs(file = "Refs_gemm.bib")
\begingroup \setlength{\parindent}{-0.5in} \setlength{\leftskip}{0.5in}
\endgroup
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.