Performs a multiverse analysis for multinomial processing tree (MPT) models
across maximum-likelihood/frequentist and Bayesian estimation approaches. For
the frequentist approaches, no pooling (with and without parametric or
nonparametric bootstrap) and complete pooling are implemented using
MPTinR. For the Bayesian approaches, no pooling, complete pooling, and
three different variants of partial pooling are implemented using
TreeBUGS. The function expects data on a by-participant level with each row
corresponding to data from one participant (i.e., different response
categories correspond to different columns); the data can contain a single
between-subjects condition. Model equations need to be passed as a
.eqn model file, and the category labels (first column in the .eqn file)
need to match the column names in the data. Results are returned in one
tibble with one row per estimation method.
model: A model definition, typically the path to an .eqn model file.
This function is a fancy wrapper for packages MPTinR and
TreeBUGS, applying various frequentist and Bayesian estimation methods
to the same data set using a single MPT model and collecting the results
in one tibble where each row corresponds to one
estimation method. Note that parameter restrictions (e.g., equating
different parameters or fixing them to a constant) need to be part of the
model (i.e., the .eqn file) and cannot be passed as an argument.
The settings for the various methods are specified via function
mpt_options. The default settings use all available cores for
calculating the bootstrap distribution as well as for independent MCMC
chains and should be appropriate for most situations.
The data can have a single between-subjects condition (specified via
condition). This condition can have more than two levels. If
specified, the pairwise differences between each level, the standard errors
of the differences, and confidence intervals of the differences are
calculated for each parameter. Please note that condition is
silently converted to character in the output. Thus, a specific
ordering of the factor levels in the output cannot be guaranteed.
Parameter differences or other support for within-subjects conditions is not
provided. The best course of action for within-subjects conditions is to
simply include separate trees and separate sets of parameters for each
within-subjects condition. This at least allows comparing the estimates for
each within-subjects condition across estimation methods.
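For illustration, a hypothetical .eqn fragment implementing this strategy for a one-high-threshold model with two within-subjects conditions ("speed" and "acc"); the tree, category, and parameter names are made up, each subsequent line gives tree, category, and model equation, and MPTinR ignores the first line of an .eqn file:

```
separate trees and parameters per within-subjects condition
old_speed  hit_speed   D_speed + (1-D_speed)*g_speed
old_speed  miss_speed  (1-D_speed)*(1-g_speed)
new_speed  fa_speed    g_speed
new_speed  cr_speed    1-g_speed
old_acc    hit_acc     D_acc + (1-D_acc)*g_acc
old_acc    miss_acc    (1-D_acc)*(1-g_acc)
new_acc    fa_acc      g_acc
new_acc    cr_acc      1-g_acc
```

Each within-subjects condition gets its own trees (old_speed/new_speed vs. old_acc/new_acc) and its own parameter set (D_speed/g_speed vs. D_acc/g_acc), so the per-condition estimates can be compared across estimation methods.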
Maximum-likelihood estimation with MPTinR via fit.mpt:
"asymptotic_complete": Asymptotic ML theory, complete pooling
"asymptotic_no": Asymptotic ML theory, no pooling
"pb_no": Parametric bootstrap, no pooling
"npb_no": Nonparametric bootstrap, no pooling
Maximum-likelihood estimation with HMMTreeR:
"latent_class": Asymptotic ML theory, partial pooling, latent-class approach
Bayesian estimation with TreeBUGS:
"simple": Bayesian estimation, no pooling (C++, via simpleMPT)
"simple_pooling": Bayesian estimation, complete pooling (C++, via simpleMPT)
"trait": latent-trait model, partial pooling (JAGS, via traitMPT)
"trait_uncorrelated": latent-trait model without
correlation parameters, partial pooling (JAGS, via traitMPT)
"beta": beta-MPT model, partial pooling (JAGS, via betaMPT)
"betacpp": beta-MPT model, partial pooling (C++, via betaMPTcpp)
For the complete pooling asymptotic approach, the group-level parameter
estimates and goodness-of-fit statistics are the maximum-likelihood
estimates and G-squared values returned by MPTinR. The parameter
differences are based on these values; the standard error of the
differences is simply the pooled standard error of the individual
parameters. The overall fit (column gof) is based on an additional
fit to the completely aggregated data.
For the no pooling asymptotic approach, the individual-level
maximum-likelihood estimates are reported in columns est_indiv and
gof_indiv and provide the basis for the other results. Whether or
not an individual-level parameter estimate is judged as identifiable
(column identifiable) is based on separate fits with different
random starting values. If, in these separate fits, the same objective
criterion is reached several times (i.e., within
.01 of the best fit), but the parameter estimates differ (i.e., are not
within .01 of each other), then an estimate is flagged as
non-identifiable. If they are the same (i.e., within .01 of each other),
they are marked as identifiable. The group-level parameters are simply
the means of the identifiable individual-level parameters, the SE is the
SE of the mean for these parameters (i.e., SD/sqrt(N), where N excludes
non-identifiable parameters and those estimated as NA), and the CI is
based on mean and SE. The group-level and overall fit is the sum of the
individual G-squares, the sum of the individual-level df, and the
corresponding chi-square df. The difference between the conditions and the
corresponding statistics are based on a t-test comparing the
individual-level estimates (again, after excluding non-identifiable
estimates). The CIs of the difference are based on the SEs (which are
derived from a linear model equivalent to the t-test).
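The aggregation step just described can be sketched in a few lines of base R; the estimates below are made up for illustration:

```r
# Hypothetical individual-level estimates of one parameter;
# NA marks a non-identifiable (or failed) estimate.
est_indiv <- c(0.62, 0.58, 0.71, NA, 0.66)

n   <- sum(!is.na(est_indiv))                 # N excludes non-identifiable estimates
est <- mean(est_indiv, na.rm = TRUE)          # group-level estimate: mean
se  <- sd(est_indiv, na.rm = TRUE) / sqrt(n)  # SE of the mean: SD/sqrt(N)
ci  <- est + qnorm(c(0.025, 0.975)) * se      # CI based on mean and SE
```

This mirrors the description above: the non-identifiable estimate neither enters the mean nor the N used for the SE.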
The individual-level estimates of the bootstrap-based no-pooling
approaches are identical to the asymptotic ones. However, the SE is the
SD of the bootstrapped distribution of parameter estimates, the CIs are
the corresponding quantiles of the bootstrapped distribution, and the
p-value is obtained from the bootstrapped G-square distribution.
Identifiability of individual-level parameter estimates is also based on
the bootstrap distribution of estimates. Specifically, we calculate the
range of the CI (i.e., maximum minus minimum CI value) and flag those
parameters as non-identifiable for which the range is larger than
mpt_options()$max_ci_indiv. Thus, in the default settings, a parameter is
deemed non-identifiable if the bootstrap-based CI extends (almost) from 0
to 1. The group-level estimates are the means of the identifiable
individual-level estimates. The difference between conditions is
calculated in the same manner as for the asymptotic case, using the
identifiable individual-level parameter estimates.
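A base-R sketch of this bootstrap-based summary; the bootstrap distribution is simulated here for illustration, and 0.99 as the max_ci_indiv default is an assumption:

```r
set.seed(42)
boot_est <- rbeta(500, 8, 4)  # stand-in for a bootstrapped parameter distribution

se <- sd(boot_est)                         # SE: SD of the bootstrap distribution
ci <- quantile(boot_est, c(0.025, 0.975))  # CI: bootstrap quantiles

# flag the parameter as non-identifiable if the CI range is too wide;
# 0.99 is an assumed value for mpt_options()$max_ci_indiv
max_ci_indiv <- 0.99
non_identifiable <- diff(unname(ci)) > max_ci_indiv
```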
The latent-class approach is fitted by interfacing HMMTree,
a software package that is only available on Microsoft Windows machines.
To install this software and the necessary
R interface, use the HMMTreeR package.
It is currently not possible to estimate models that contain parameters
that are fixed to numerical values.
Multiple latent-class models with differing numbers of latent classes are
estimated. The model that obtains the lowest AIC while still being
identified is selected for extracting parameter estimates.
The returned group-level parameter estimates are calculated as the
weighted mean of the parameter estimates of the latent classes.
Corresponding SEs are given by the square root of the weighted mean of the
class-wise squared SEs. Goodness-of-fit statistics are M1, M2, S1, and S2
as described by Klauer (2006).
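The class-wise aggregation can be written down directly; the two-class solution below is hypothetical:

```r
w   <- c(0.7, 0.3)    # estimated class weights (sum to 1)
est <- c(0.80, 0.55)  # class-wise estimates of one parameter
se  <- c(0.04, 0.09)  # class-wise SEs

est_group <- sum(w * est)         # weighted mean of class estimates
se_group  <- sqrt(sum(w * se^2))  # sqrt of weighted mean of squared SEs
```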
The simple approaches fit fixed-effects MPT models.
"simple" uses no pooling and thus assumes independent uniform priors
for the individual-level parameters. Group-level means are
obtained as generated quantities by averaging the posterior samples
of the individual-level parameters. "simple_pooling" aggregates observed
frequencies across participants and assumes a uniform prior for the
group-level parameters.
The latent-trait approaches transform the individual-level
parameters to a latent probit scale using the inverse cumulative standard
normal distribution. For these probit values, a multivariate normal
distribution is assumed at the group level. Whereas "trait"
estimates the corresponding correlation matrix of the parameters
(reported in the column est_rho), "trait_uncorrelated"
assumes that the parameters are uncorrelated.
For all Bayesian methods, the posterior distribution of the parameters is
summarized by the posterior mean (in the column est), the posterior
standard deviation (in the column se), and credibility intervals.
For parameter differences (test_between) and correlations
(est_rho), Bayesian p-values are computed (column p) by
counting the relative proportion of posterior samples that are smaller
than zero. Goodness of fit is tested with the T1 statistic
(observed vs. posterior-predicted average frequencies, focus =
"mean") and the T2 statistic (observed vs. posterior-predicted
covariance of frequencies, focus = "cov").
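The Bayesian p-value amounts to a one-line computation on the posterior samples; the samples of a parameter difference below are hypothetical:

```r
# Hypothetical posterior samples of a between-condition parameter difference
post_diff <- c(-0.02, 0.05, 0.11, 0.08, -0.01, 0.09, 0.04, 0.12, 0.03, 0.07)

# Bayesian p-value: proportion of posterior samples smaller than zero
p <- mean(post_diff < 0)
p  # 0.2
```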
A tibble with one row per estimation
method and the following columns:
model: Name of model file (copied from the model argument).
dataset: Name of data set (copied from the dataset argument).
pooling: character specifying the level of pooling with
three potential values:
c("complete", "no", "partial")
package: character specifying the package used for
estimation with two potential values:
c("MPTinR", "TreeBUGS")
method: character specifying the method used with the
following potential values:
c("asymptotic", "PB/MLE", "NPB/MLE",
"simple", "trait", "trait_uncorrelated", "beta", "betacpp")
est_group: Group-level parameter estimates per condition/group.
est_indiv: Individual-level parameter estimates (if provided
by the method).
est_rho: Estimated correlation of individual-level parameters
on the probit scale (only in "trait").
test_between: Parameter differences between the levels of the
between-subjects condition (if specified).
gof: Overall goodness of fit across all individuals.
gof_group: Group-level goodness of fit.
gof_indiv: Individual-level goodness of fit.
fungibility: Posterior correlation of the group-level means
pnorm(mu) (only in "trait").
test_homogeneity: Chi-square based test of participant
homogeneity proposed by Smith and Batchelder (2008). This test is the same
for each estimation method.
convergence: Convergence information provided by the
respective estimation method. For the asymptotic frequentist methods this
is a tibble with the rank of the Fisher matrix, the number of parameters
(which should match the rank of the Fisher matrix), and the convergence
code provided by the optimization algorithm (which is
nlminb). The bootstrap methods contain an additional column,
parameter, that contains the information which (if any) parameters
are empirically non-identifiable based on the bootstrapped distribution of
parameter estimates (see above for the exact description). For the Bayesian
methods this is a tibble containing information on the posterior
distribution (i.e., mean, quantiles, SD, SE,
n.eff, and R-hat) for each parameter.
estimation: Time it took for each estimation method and group.
options: Options used for estimation. Obtained by running
mpt_options().
With the exception of the first five columns (i.e., those after
method), all columns are list columns typically holding one
tibble per cell.
The simplest way to analyze the results is separately per column using
tidyr::unnest. Examples for this are given below.
Smith, J. B., & Batchelder, W. H. (2008). Assessing individual differences in categorical data. Psychonomic Bulletin & Review, 15(4), 713-731. https://doi.org/10.3758/PBR.15.4.713
Klauer, K.C. (2006). Hierarchical multinomial processing tree models: A latent-class approach. Psychometrika, 71 (1), 7-31. https://doi.org/10.1007/s11336-004-1188-3
```r
# ------------------------------------------------------------------------------
# MPT model definition & Data

EQN_FILE  <- system.file("extdata", "prospective_memory.eqn", package = "MPTmultiverse")
DATA_FILE <- system.file("extdata", "smith_et_al_2011.csv", package = "MPTmultiverse")

### if .csv format uses semicolons ";" (e.g., German format):
# data <- read.csv2(DATA_FILE, fileEncoding = "UTF-8-BOM")
### if .csv format uses commas "," (international format):
data <- read.csv(DATA_FILE, fileEncoding = "UTF-8-BOM")
data <- data[c(1:10, 113:122), ]  ## select only subset of data for example
head(data)

COL_CONDITION <- "WM_EX"  # name of the variable encoding group membership

# experimental condition should be labeled meaningfully ----
unique(data[[COL_CONDITION]])
data[[COL_CONDITION]] <- factor(
  data[[COL_CONDITION]]
  , levels = 1:2
  , labels = c("low_WM", "high_WM")
)

# define core parameters:
CORE <- c("C1", "C2")

## Not run:
op <- mpt_options()
## to reset default options (which you would want) use:
mpt_options("default")
mpt_options()  # to see the settings
## Note: settings are also saved in the results tibble

## without specifying method, all are used per default
fit_all <- fit_mpt(
  dataset = DATA_FILE
  , data = data
  , model = EQN_FILE
  , condition = COL_CONDITION
  , core = CORE
)
mpt_options(op)  ## reset options

## End(Not run)

load(system.file("extdata", "prospective_memory_example.rda", package = "MPTmultiverse"))
# Although we requested all 10 methods, only 9 worked:
fit_all$method
# Jags variant of beta MPT is missing.

# the returned object has a plot method.
# For example, for the group-level estimates:
plot(fit_all, which = "est")

## Not run:
### Full analysis of results requires dplyr and tidyr (or just 'tidyverse')
library("dplyr")
library("tidyr")

## first few columns identify model, data, and estimation approach/method
## remaining columns are list columns containing the results for each method
## use unnest to work with each of the results columns
glimpse(fit_all)

## Let us inspect the group-level estimates
fit_all %>%
  select(method, pooling, est_group) %>%
  unnest()

## which we can plot again
plot(fit_all, which = "est")

## Next we take a look at the GoF
fit_all %>%
  select(method, pooling, gof_group) %>%
  unnest() %>%
  as.data.frame()

# Again, we can plot it as well
plot(fit_all, which = "gof2")  ## use "gof1" for overall GoF

## Finally, we take a look at the differences between conditions
fit_all %>%
  select(method, pooling, test_between) %>%
  unnest()

# and then we plot it
plot(fit_all, which = "test_between")

### Also possible to only use individual methods:
only_asymptotic <- fit_mpt(
  method = "asymptotic_no"
  , dataset = DATA_FILE
  , data = data
  , model = EQN_FILE
  , condition = COL_CONDITION
  , core = CORE
)
glimpse(only_asymptotic)

bayes_complete <- fit_mpt(
  method = c("simple_pooling")
  , dataset = DATA_FILE
  , data = data
  , model = EQN_FILE
  , condition = COL_CONDITION
  , core = CORE
)
glimpse(bayes_complete)

## End(Not run)
```