PROCESS: Model-based mediation and moderation analyses (named after...

View source: R/bruceR-stats_5_advance.R

PROCESS  R Documentation

Model-based mediation and moderation analyses (named after but distinct from SPSS PROCESS).

Description

Model-based mediation and moderation analyses (i.e., using raw regression model objects with distinct R packages, BUT NOT with the SPSS PROCESS Macro, to estimate effects in mediation/moderation models).

NOTE: PROCESS() DOES NOT use or transform any code or macro from the original SPSS PROCESS macro developed by Hayes, though its output would link model settings to a PROCESS Model ID in Hayes's numbering system.

To use PROCESS() in publications, please cite not only bruceR but also the following R packages:

  • interactions::sim_slopes() is used to estimate simple slopes (and conditional direct effects) in moderation, moderated moderation, and moderated mediation models (for PROCESS Model IDs 1, 2, 3, 5, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 58, 59, 72, 73, 75, 76).

  • mediation::mediate() is used to estimate (conditional) indirect effects in (moderated) mediation models (for PROCESS Model IDs 4, 5, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 58, 59, 72, 73, 75, 76).

  • lavaan::sem() is used to perform serial multiple mediation analysis (for PROCESS Model ID 6).

Usage

PROCESS(
  data,
  y = "",
  x = "",
  meds = c(),
  mods = c(),
  covs = c(),
  clusters = c(),
  hlm.re.m = "",
  hlm.re.y = "",
  hlm.type = c("1-1-1", "2-1-1", "2-2-1"),
  med.type = c("parallel", "serial"),
  mod.type = c("2-way", "3-way"),
  mod.path = c("x-y", "x-m", "m-y", "all"),
  cov.path = c("y", "m", "both"),
  mod1.val = NULL,
  mod2.val = NULL,
  ci = c("boot", "bc.boot", "bca.boot", "mcmc"),
  nsim = 100,
  seed = NULL,
  center = TRUE,
  std = FALSE,
  digits = 3,
  file = NULL
)

Arguments

data

Data frame.

y, x

Variable names of the outcome (Y) and the predictor (X).

  • Can be: continuous (numeric) or dichotomous (factor)

meds

Variable name(s) of mediator(s) (M). Use c() to combine multiple mediators.

  • Can be: continuous (numeric) or dichotomous (factor)

  • Allows any number of mediators in parallel or 2 to 4 mediators in serial

  • Order matters when med.type="serial" (PROCESS Model 6: serial mediation)

mods

Variable name(s) of 0 to 2 moderator(s) (W). Use c() to combine multiple moderators.

  • Can be: continuous (numeric), dichotomous (factor), or multicategorical (factor)

  • Order matters when mod.type="3-way" (PROCESS Models 3, 5.3, 11, 12, 18, 19, 72, and 73)

  • Not applicable to med.type="serial" (PROCESS Model 6)

covs

Variable name(s) of covariate(s) (i.e., control variables). Use c() to combine multiple covariates.

  • Can be any type and any number of variables

clusters

HLM (multilevel) cluster(s): e.g., "School", c("Prov", "City"), c("Sub", "Item").

hlm.re.m, hlm.re.y

HLM (multilevel) random effect term of M model and Y model. By default, it converts clusters to lme4 syntax of random intercepts: e.g., "(1 | School)" or "(1 | Sub) + (1 | Item)".

You may specify these arguments to include more complex terms: e.g., random slopes "(X | School)", or 3-level random effects "(1 | Prov/City)".
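As a minimal sketch of the default conversion described above, the following base-R snippet builds random-intercept terms from cluster names. The helper random_intercepts() is hypothetical and is not part of bruceR; it only illustrates the documented output format, not how PROCESS() is implemented internally.

```r
# Hypothetical helper (NOT a bruceR function): build lme4-style
# random-intercept terms from a character vector of cluster names.
random_intercepts <- function(clusters) {
  paste0("(1 | ", clusters, ")", collapse = " + ")
}

re_one <- random_intercepts("School")
re_two <- random_intercepts(c("Sub", "Item"))
print(re_one)
print(re_two)

# More complex terms are written by hand and passed as strings,
# e.g., hlm.re.y = "(X | School)" or hlm.re.y = "(1 | Prov/City)".
```

Because hlm.re.m and hlm.re.y are plain strings, random slopes or nested terms can be supplied for the M model and the Y model separately.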

hlm.type

HLM (multilevel) mediation type (levels of "X-M-Y"): "1-1-1" (default), "2-1-1" (in effect the same as "1-1-1" in a mixed model), or "2-2-1" (currently not fully supported, as limited by the mediation package). In most cases, there is no need to set this argument.

med.type

Type of mediator: "parallel" (default) or "serial" (only relevant to PROCESS Model 6). Partial matches with "p" or "s" also work. In most cases, there is no need to set this argument.

mod.type

Type of moderator: "2-way" (default) or "3-way" (relevant to PROCESS Models 3, 5.3, 11, 12, 18, 19, 72, and 73). Partial matches with "2" or "3" also work.

mod.path

Which path(s) do the moderator(s) influence? "x-y", "x-m", "m-y", or any combination of them (use c() to combine), or "all" (i.e., all of them). No default value.

cov.path

Which path(s) do the control variable(s) influence? "y", "m", or "both" (default).

mod1.val, mod2.val

By default (NULL), it uses Mean +/- SD of a continuous moderator (numeric) or all levels of a dichotomous/multicategorical moderator (factor) to perform simple slope analyses and/or conditional mediation analyses. You may manually specify a vector of certain values: e.g., mod1.val=c(1, 3, 5) or mod1.val=c("A", "B", "C").
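The default values for a continuous moderator can be previewed with a short base-R sketch. It assumes that the values are the mean minus one SD, the mean, and the mean plus one SD; the exact set that PROCESS() uses internally is an assumption here. The built-in airquality data serve only as an illustration.

```r
# Sketch: candidate moderator values for a continuous moderator.
# Assumption: Mean - SD, Mean, and Mean + SD (not verified against bruceR).
w <- airquality$Wind  # built-in data; Wind has no missing values

mod_vals <- c(mean(w) - sd(w), mean(w), mean(w) + sd(w))
names(mod_vals) <- c("Mean - SD", "Mean", "Mean + SD")
print(round(mod_vals, 3))
```

A vector like this, or any other set of meaningful values, can then be passed to mod1.val or mod2.val.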

ci

Method for estimating the standard error (SE) and 95% confidence interval (CI) of indirect effect(s). Defaults to "boot" for (generalized) linear models or "mcmc" for (generalized) linear mixed models (i.e., multilevel models).

  • "boot": Percentile Bootstrap

  • "bc.boot": Bias-Corrected Percentile Bootstrap

  • "bca.boot": Bias-Corrected and Accelerated (BCa) Percentile Bootstrap

  • "mcmc": Markov Chain Monte Carlo (Quasi-Bayesian)

Note that these methods never apply to the estimates of simple slopes. You should not report the 95% CIs of simple slopes as Bootstrap or Monte Carlo CIs, because they are just standard CIs without any resampling method.
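To illustrate what a percentile bootstrap CI of an indirect effect means, here is a minimal base-R sketch on simulated data. It is not the implementation used by PROCESS() or mediation::mediate(); the data, the helper indirect(), and all variable names are made up for illustration.

```r
# Sketch of a percentile bootstrap for an indirect effect (a * b).
# All data and names are simulated/hypothetical.
set.seed(1)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)            # path a: X -> M
y <- 0.4 * m + 0.2 * x + rnorm(n)  # path b: M -> Y, plus direct effect
d <- data.frame(x, m, y)

# Indirect effect = a (X -> M) times b (M -> Y, controlling for X)
indirect <- function(dat) {
  a <- coef(lm(m ~ x, data = dat))[["x"]]
  b <- coef(lm(y ~ m + x, data = dat))[["m"]]
  a * b
}

nsim <- 1000  # number of bootstrap resamples
boot_est <- replicate(nsim, indirect(d[sample(n, replace = TRUE), ]))

est <- indirect(d)                         # point estimate
ci <- quantile(boot_est, c(0.025, 0.975))  # percentile 95% CI
print(est)
print(ci)
```

The simple slopes, by contrast, come with ordinary model-based CIs, which is why they should not be reported as bootstrap or Monte Carlo CIs.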

nsim

Number of simulation samples (bootstrap resamples or Monte Carlo draws) for estimating SE and 95% CI. Defaults to 100 so that examples run faster. In formal analyses, however, nsim=1000 (or larger) is strongly suggested!

seed

Random seed for reproducible results. Defaults to NULL. Note that all mediation analyses include random processes (i.e., bootstrap resampling or Monte Carlo simulation). To reproduce results, you need to set a random seed. However, even if you set the same seed number, you are unlikely to get exactly the same results across different R packages (e.g., lavaan vs. mediation) and software (e.g., SPSS, Mplus, R, jamovi).

center

Centering numeric (continuous) predictors? Defaults to TRUE (suggested).

std

Standardizing variables to get standardized coefficients? Defaults to FALSE. If TRUE, it will standardize all numeric (continuous) variables before building regression models. However, it is not suggested to set std=TRUE for generalized linear (mixed) models.
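The idea behind std=TRUE can be sketched in base R: standardizing numeric variables before fitting a linear model yields standardized coefficients. The snippet uses the built-in mtcars data and is only an illustration; how PROCESS() standardizes variables internally may differ in detail.

```r
# Sketch: standardize variables first, then fit the model.
d <- mtcars[, c("mpg", "wt")]
z <- as.data.frame(scale(d))  # z-scores (mean 0, SD 1)

b_std <- coef(lm(mpg ~ wt, data = z))[["wt"]]  # standardized slope
r <- cor(d$mpg, d$wt)                          # Pearson correlation

# With a single predictor, the standardized slope equals the correlation.
print(c(b_std = b_std, r = r))
```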

digits

Number of decimal places of output. Defaults to 3.

file

File name of MS Word (".doc"). Currently, only regression model summary can be saved.

Value

Invisibly returns a list of results:

process.id

PROCESS Model ID (in Hayes's numbering system).

process.type

PROCESS model type.

model.m

Mediator (M) model(s) (a list of multiple models).

model.y

Outcome (Y) model.

results

Effect estimates and other results (unnamed list object).

Output

Two parts of results are printed:

  • PART 1. Regression model summary

  • PART 2. Mediation/moderation effect estimates

Disclaimer

PROCESS() DOES NOT use or transform any code or macro from the original SPSS PROCESS macro developed by Hayes, though its output would link model settings to a PROCESS Model ID in Hayes's numbering system.

DO NOT state that "the bruceR package runs the PROCESS Model Code developed by Hayes (2018)", because this is not true. The bruceR package only links results to Hayes's numbering system but never uses his code.

Software Comparison

To perform mediation, moderation, and conditional process (moderated mediation) analyses, people may use Mplus, SPSS "PROCESS" macro, or SPSS "MLmed" macro. Some R packages and functions can also perform such analyses, in a somewhat complex way, including mediation::mediate(), interactions::sim_slopes(), and lavaan::sem().

Furthermore, some other R packages or scripts/modules have been developed, including jamovi module jAMM (by Marcello Gallucci, based on the lavaan package), R package processR (by Keon-Woong Moon, not official, also based on the lavaan package), and R script file "process.R" (the official PROCESS R code by Andrew F. Hayes, but it is not yet an R package).

Distinct from these existing tools, PROCESS() provides an integrative way for performing mediation/moderation analyses in R. This function supports 24 kinds of SPSS PROCESS models numbered by Hayes (2018) (but does not use or transform his code), and also supports multilevel mediation/moderation analyses. Overall, it supports the most frequently used types of mediation, moderation, moderated moderation (3-way interaction), and moderated mediation (conditional indirect effect) analyses for (generalized) linear or linear mixed models.

Specifically, PROCESS() fits regression models based on the data, variable names, and a few other arguments that users input (with no need to specify the PROCESS Model ID or manually mean-center the variables). The function can automatically link model settings to Hayes's numbering system.

Variable Centering

PROCESS() automatically conducts grand-mean centering, using grand_mean_center(), before model building, though it can be turned off by setting center=FALSE.

The grand-mean centering is important because it:

  1. makes the results of main effects accurate for interpretation (see my commentary on this issue: Bao et al., 2022);

  2. does not change any model fit indices (it only affects the interpretation of main effects);

  3. is only conducted in "PART 1" (for an accurate estimate of main effects) but not in "PART 2" because it is more intuitive and interpretable to use the raw values of variables for the simple-slope tests in "PART 2";

  4. does not conflict with group-mean centering, because after group-mean centering the grand mean of a variable is also 0, such that the automatic grand-mean centering (with mean = 0) will not change any values of the variable.

Conduct group-mean centering, if necessary, with group_mean_center() before using PROCESS(). Remember that the automatic grand-mean centering never affects the values of a group-mean centered variable, which already has a grand mean of 0.
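Two of the points above can be checked with a small base-R sketch. It deliberately avoids grand_mean_center() and group_mean_center() and uses plain arithmetic on simulated and built-in data, so it illustrates the reasoning rather than the bruceR implementation.

```r
# (a) Grand-mean centering leaves a group-mean centered variable unchanged.
set.seed(1)
d <- data.frame(group = rep(c("A", "B", "C"), each = 4),
                x = rnorm(12, mean = 10))
d$x_gmc <- d$x - ave(d$x, d$group)      # group-mean centering
d$x_both <- d$x_gmc - mean(d$x_gmc)     # grand-mean centering on top
print(all.equal(d$x_gmc, d$x_both))     # values are unchanged

# (b) Grand-mean centering does not change model fit or the interaction
#     term; it only changes the lower-order ("main effect") coefficients.
fit_raw <- lm(mpg ~ wt * hp, data = mtcars)
mc <- transform(mtcars, wt = wt - mean(wt), hp = hp - mean(hp))
fit_cen <- lm(mpg ~ wt * hp, data = mc)

r2_raw <- summary(fit_raw)$r.squared
r2_cen <- summary(fit_cen)$r.squared
print(c(r2_raw = r2_raw, r2_cen = r2_cen))
print(rbind(raw = coef(fit_raw), centered = coef(fit_cen)))
```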

References

Hayes, A. F. (2018). Introduction to mediation, moderation, and conditional process analysis (second edition): A regression-based approach. Guilford Press.

Yzerbyt, V., Muller, D., Batailler, C., & Judd, C. M. (2018). New recommendations for testing indirect effects in mediational models: The need to report and test component paths. Journal of Personality and Social Psychology, 115(6), 929–943.

See Also

lavaan_summary()

model_summary()

med_summary()

For more details and illustrations, see PROCESS-bruceR-SPSS (PDF and Markdown files).

Examples

#### NOTE ####
## In the following examples, I set nsim=100 to save time.
## In formal analyses, nsim=1000 (or larger) is suggested!

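#### Demo Data ####
library(bruceR)  # PROCESS(), Freq(), Describe(), Corr(), GLM_summary()
# The demo data require the 'mediation' package to be installed.
# ?mediation::student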
data = mediation::student %>%
  dplyr::select(SCH_ID, free, smorale, pared, income,
                gender, work, attachment, fight, late, score)
names(data)[2:3] = c("SCH_free", "SCH_morale")
names(data)[4:7] = c("parent_edu", "family_inc", "gender", "partjob")
data$gender01 = 1 - data$gender  # 0 = female, 1 = male
# dichotomous X: as.factor()
data$gender = factor(data$gender01, levels=0:1, labels=c("Female", "Male"))
# dichotomous Y: as.factor()
data$pass = as.factor(ifelse(data$score>=50, 1, 0))

#### Descriptive Statistics and Correlation Analyses ####
Freq(data$gender)
Freq(data$pass)
Describe(data)     # file="xxx.doc"
Corr(data[,4:11])  # file="xxx.doc"

#### PROCESS Analyses ####

## Model 1 ##
PROCESS(data, y="score", x="late", mods="gender")  # continuous Y
PROCESS(data, y="pass", x="late", mods="gender")   # dichotomous Y

# (multilevel moderation)
PROCESS(data, y="score", x="late", mods="gender",  # continuous Y (LMM)
        clusters="SCH_ID")
PROCESS(data, y="pass", x="late", mods="gender",   # dichotomous Y (GLMM)
        clusters="SCH_ID")

# (Johnson-Neyman (J-N) interval and plot)
PROCESS(data, y="score", x="gender", mods="late") -> P
P$results[[1]]$jn[[1]]       # Johnson-Neyman interval
P$results[[1]]$jn[[1]]$plot  # Johnson-Neyman plot (ggplot object)
GLM_summary(P$model.y)       # detailed results of regression

# (allows multicategorical moderator)
d = airquality
d$Month = as.factor(d$Month)  # moderator: factor with levels "5"~"9"
PROCESS(d, y="Temp", x="Solar.R", mods="Month")

## Model 2 ##
PROCESS(data, y="score", x="late",
        mods=c("gender", "family_inc"),
        mod.type="2-way")  # or omit "mod.type", default is "2-way"

## Model 3 ##
PROCESS(data, y="score", x="late",
        mods=c("gender", "family_inc"),
        mod.type="3-way")
PROCESS(data, y="pass", x="gender",
        mods=c("late", "family_inc"),
        mod1.val=c(1, 3, 5),     # moderator 1: late
        mod2.val=seq(1, 15, 2),  # moderator 2: family_inc
        mod.type="3-way")

## Model 4 ##
PROCESS(data, y="score", x="parent_edu",
        meds="family_inc", covs="gender",
        ci="boot", nsim=100, seed=1)

# (allows any number of mediators in parallel)
PROCESS(data, y="score", x="parent_edu",
        meds=c("family_inc", "late"),
        covs=c("gender", "partjob"),
        ci="boot", nsim=100, seed=1)

# (multilevel mediation)
PROCESS(data, y="score", x="SCH_free",
        meds="late", clusters="SCH_ID",
        ci="mcmc", nsim=100, seed=1)

## Model 6 ##
PROCESS(data, y="score", x="parent_edu",
        meds=c("family_inc", "late"),
        covs=c("gender", "partjob"),
        med.type="serial",
        ci="boot", nsim=100, seed=1)

## Model 8 ##
PROCESS(data, y="score", x="fight",
        meds="late",
        mods="gender",
        mod.path=c("x-m", "x-y"),
        ci="boot", nsim=100, seed=1)

## For more examples and details, see:
## https://github.com/psychbruce/bruceR/tree/main/note


bruceR documentation built on Aug. 21, 2025, 5:38 p.m.