Description Usage Arguments Details Value References Examples
Performs a multiverse analysis for multinomial processing tree (MPT) models
across maximum-likelihood/frequentist and Bayesian estimation approaches. For
the frequentist approaches, no pooling (with and without parametric or
nonparametric bootstrap) and complete pooling are implemented using MPTinR.
For the Bayesian approaches, no pooling, complete pooling, and three different
variants of partial pooling are implemented using TreeBUGS. Requires data on a
by-participant level with each row corresponding to the data from one
participant (i.e., different response categories correspond to different
columns); the data can contain a single between-subjects condition. Model
equations need to be passed as a .eqn model file, and the category labels
(first column in the .eqn file) need to match the column names in data.
Results are returned in one tibble with one row per estimation method.
model
A model definition, typically the path to an .eqn model file.
dataset
Character scalar; a name for the data set (copied to the output).
data
A data.frame holding the data, one row per participant.
id
Character scalar; name of the column in data that identifies participants.
condition
Character scalar; name of the column in data encoding the optional between-subjects condition.
core
Character vector naming the core parameters of the model.
method
Character vector specifying which estimation methods to run (see Details); if omitted, all methods are used.
This function is a fancy wrapper for the packages MPTinR and TreeBUGS,
applying various frequentist and Bayesian estimation methods to the same data
set using a single MPT model and collecting the results in one tibble where
each row corresponds to one estimation method. Note that parameter
restrictions (e.g., equating different parameters or fixing them to a
constant) need to be part of the model (i.e., the .eqn file) and cannot be
passed as an argument.
The settings for the various methods are specified via function mpt_options.
The default settings use all available cores for calculating the bootstrap
distribution as well as independent MCMC chains and should be appropriate for
most situations.
The data can have a single between-subjects condition (specified via
condition). This condition can have more than two levels. If specified, the
pairwise differences between each level, the standard error of the
differences, and confidence intervals of the differences are calculated for
each parameter. Please note that condition is silently converted to character
in the output. Thus, a specific ordering of the factor levels in the output
cannot be guaranteed.
Parameter differences or other support for within-subjects conditions is not
provided. The best course of action for within-subjects conditions is to
simply include separate trees and separate sets of parameters for each
within-subjects condition. This at least allows comparing the estimates for
each within-subjects condition across estimation methods.
Maximum-likelihood estimation with MPTinR via fit.mpt:
"asymptotic_complete": Asymptotic ML theory, complete pooling
"asymptotic_no": Asymptotic ML theory, no pooling
"pb_no": Parametric bootstrap, no pooling
"npb_no": Nonparametric bootstrap, no pooling
Maximum-likelihood estimation with HMMTreeR:
"latent_class": Asymptotic ML theory, partial pooling, latent-class approach
Bayesian estimation with TreeBUGS:
"simple": Bayesian estimation, no pooling (C++, simpleMPT)
"simple_pooling": Bayesian estimation, complete pooling (C++, simpleMPT)
"trait": latent-trait model, partial pooling (JAGS, traitMPT)
"trait_uncorrelated": latent-trait model without correlation parameters, partial pooling (JAGS, traitMPT)
"beta": beta-MPT model, partial pooling (JAGS, betaMPT)
"betacpp": beta-MPT model, partial pooling (C++, betaMPTcpp)
For the complete pooling asymptotic approach, the group-level parameter
estimates and goodness-of-fit statistics are the maximum-likelihood and
G-squared values returned by MPTinR. The parameter differences are based on
these values; the standard error of a difference is simply the pooled standard
error of the individual parameters. The overall fit (column gof) is based on
an additional fit to the completely aggregated data.
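As a minimal illustration of the pooled standard error just described, the SE of a parameter difference between two conditions can be computed from the per-condition SEs. The numbers below are made up for illustration; this is a sketch, not the package's internal code.

```r
# Hypothetical SEs of one parameter in two conditions (illustration only):
se_1 <- 0.04
se_2 <- 0.05

# Pooled SE of the between-condition difference:
se_diff <- sqrt(se_1^2 + se_2^2)
se_diff
```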
For the no pooling asymptotic approach, the individual-level
maximum-likelihood estimates are reported in columns est_indiv and gof_indiv
and provide the basis for the other results. Whether or not an
individual-level parameter estimate is judged as identifiable (column
identifiable) is based on separate fits with different random starting
values. If, in these separate fits, the same objective criterion is reached
several times (i.e., log-likelihood within .01 of the best fit) but the
parameter estimates differ (i.e., are not within .01 of each other), the
estimate is flagged as non-identifiable. If they are the same (i.e., within
.01 of each other), it is marked as identifiable. The group-level parameters
are simply the means of the identifiable individual-level parameters; the SE
is the SE of the mean for these parameters (i.e., SD/sqrt(N), where N
excludes non-identifiable parameters and those estimated as NA); and the CI
is based on the mean and SE. The group-level and overall fit are the sum of
the individual G-squares, the sum of the individual-level df, and the
corresponding chi-square df. The differences between the conditions and the
corresponding statistics are based on a t-test comparing the individual-level
estimates (again, after excluding non-identifiable estimates). The CIs of the
differences are based on the SEs (which are derived from a linear model
equivalent to the t-test).
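The group-level aggregation described above can be sketched in a few lines of base R. All estimates below are made-up values (an NA stands in for a non-identifiable estimate); this is an illustration, not the package's internal code.

```r
# Hypothetical individual-level estimates of one parameter in one condition;
# NA marks an estimate excluded as non-identifiable:
est <- c(0.62, 0.58, NA, 0.71, 0.65)
est_id <- est[!is.na(est)]  # keep identifiable estimates only

group_mean <- mean(est_id)
group_se   <- sd(est_id) / sqrt(length(est_id))  # SE of the mean: SD/sqrt(N)
ci <- group_mean + c(-1, 1) * qnorm(0.975) * group_se

# Difference between two conditions via a t-test on individual estimates
# (second condition also hypothetical):
cond2 <- c(0.45, 0.52, 0.49, 0.55)
t.test(est_id, cond2)$p.value
```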
The individual-level estimates of the bootstrap-based no-pooling approaches
are identical to the asymptotic ones. However, the SE is the SD of the
bootstrapped distribution of parameter estimates, the CIs are the
corresponding quantiles of the bootstrapped distribution, and the p-value is
obtained from the bootstrapped G-square distribution. Identifiability of
individual-level parameter estimates is also based on the bootstrap
distribution of estimates. Specifically, we calculate the range of the CI
(i.e., maximum minus minimum CI value) and flag those parameters as
non-identifiable for which the range is larger than
mpt_options()$max_ci_indiv, which defaults to 0.99. Thus, with the default
settings a parameter is deemed non-identifiable if the bootstrap-based CI
extends from 0 to 1. The group-level estimates are the means of the
identifiable individual-level estimates, and the difference between
conditions is calculated in the same manner as for the asymptotic case using
the identifiable individual-level parameter estimates.
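The bootstrap-based identifiability check can be sketched as follows, using simulated draws in place of a real bootstrap distribution (illustration only, not the package's internal code; the 0.99 cutoff mirrors the default of mpt_options()$max_ci_indiv).

```r
# Simulate a pathological bootstrap distribution that piles up at 0 and 1,
# as one would see for an empirically non-identifiable parameter:
set.seed(1)
boot_est <- c(runif(500, 0, 0.01), runif(500, 0.99, 1))

se <- sd(boot_est)                                  # SE = SD of bootstrap draws
ci <- quantile(boot_est, probs = c(0.025, 0.975))   # CI = bootstrap quantiles

# Flag the parameter when the CI range exceeds the cutoff (default 0.99):
non_identifiable <- unname(diff(ci) > 0.99)
non_identifiable
```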
The latent-class approach is fitted by interfacing HMMTree, a software
package that is only available on Microsoft Windows machines. To install this
software and the necessary R interface, use
devtools::install_github("methexp/HMMTreeR").
It is currently not possible to estimate models that contain parameters that
are fixed to numerical values.
Multiple latent-class models with differing numbers of latent classes are
estimated. The model that obtains the lowest AIC while still being identified
is selected for extracting parameter estimates.
The returned group-level parameter estimates are calculated as the weighted
mean of the parameter estimates of the latent classes. Corresponding SEs are
given by the square root of the weighted mean of the class-wise squared SEs.
Goodness-of-fit statistics are M1, M2, S1, and S2 as described by Klauer
(2006).
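The weighted aggregation across latent classes described above can be illustrated with hypothetical class weights, estimates, and SEs (a sketch, not the package's internal code).

```r
# Hypothetical two-class solution for one parameter (illustration only):
w   <- c(0.6, 0.4)    # class weights (sum to 1)
est <- c(0.70, 0.50)  # class-wise parameter estimates
se  <- c(0.03, 0.05)  # class-wise SEs

est_group <- weighted.mean(est, w)         # weighted mean of class estimates
se_group  <- sqrt(weighted.mean(se^2, w))  # sqrt of weighted mean of squared SEs
c(est_group, se_group)
```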
The simple approaches fit fixed-effects MPT models. "simple" uses no pooling
and thus assumes independent uniform priors for the individual-level
parameters. Group-level means are obtained as generated quantities by
averaging the posterior samples across participants. "simple_pooling"
aggregates the observed frequencies across participants and assumes a uniform
prior for the group-level parameters.
The latent-trait approaches transform the individual-level parameters to a
latent probit scale using the inverse cumulative standard normal
distribution. For these probit values, a multivariate normal distribution is
assumed at the group level. Whereas "trait" estimates the corresponding
correlation matrix of the parameters (reported in the column est_rho),
"trait_uncorrelated" assumes that the parameters are uncorrelated.
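The probit transformation underlying the latent-trait models is the inverse cumulative standard normal, qnorm() in R, with pnorm() mapping back to the probability scale. A minimal sketch with made-up individual-level parameters:

```r
# Hypothetical individual-level parameters on the probability scale:
theta <- c(0.2, 0.5, 0.9)

# Transform to the latent probit scale (inverse cumulative standard normal);
# the group-level (multivariate) normal distribution is assumed on this scale:
probit <- qnorm(theta)

# A group-level mean on the probit scale maps back to a probability
# via pnorm() (cf. the pnorm(mu) quantities in the output):
pnorm(mean(probit))
```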
For all Bayesian methods, the posterior distribution of the parameters is
summarized by the posterior mean (in the column est), the posterior standard
deviation (se), and credibility intervals (ci_*). For parameter differences
(test_between) and correlations (est_rho), Bayesian p-values are computed
(column p) as the relative proportion of posterior samples that are smaller
than zero. Goodness of fit is tested with the T1 statistic (observed vs.
posterior-predicted average frequencies, focus = "mean") and the T2 statistic
(observed vs. posterior-predicted covariance of frequencies, focus = "cov").
A tibble with one row per estimation method and the following columns:
model: Name of the model file (copied from the model argument), character.
dataset: Name of the data set (copied from the dataset argument), character.
pooling: character specifying the level of pooling with three potential values: c("complete", "no", "partial").
package: character specifying the package used for estimation with two potential values: c("MPTinR", "TreeBUGS").
method: character specifying the method used with the following potential values: c("asymptotic", "PB/MLE", "NPB/MLE", "simple", "trait", "trait_uncorrelated", "beta", "betacpp").
est_group: Group-level parameter estimates per condition/group.
est_indiv: Individual-level parameter estimates (if provided by the method).
est_rho: Estimated correlation of the individual-level parameters on the probit scale (only in method = "trait").
test_between: Parameter differences between the levels of the between-subjects condition (if specified).
gof: Overall goodness of fit across all individuals.
gof_group: Group-level goodness of fit.
gof_indiv: Individual-level goodness of fit.
fungibility: Posterior correlation of the group-level means pnorm(mu) (only in method = "trait").
test_homogeneity: Chi-square-based test of participant homogeneity proposed by Smith and Batchelder (2008). This test is the same for each estimation method.
convergence: Convergence information provided by the respective estimation
method. For the asymptotic frequentist methods this is a tibble with the rank
of the Fisher matrix, the number of parameters (which should match the rank
of the Fisher matrix), and the convergence code provided by the optimization
algorithm (which is nlminb). The bootstrap methods contain an additional
column, parameter, that records which (if any) parameters are empirically
non-identifiable based on the bootstrapped distribution of parameter
estimates (see above for the exact description). For the Bayesian methods
this is a tibble containing information on the posterior distribution (i.e.,
mean, quantiles, SD, SE, n.eff, and Rhat) for each parameter.
estimation: Time it took for each estimation method and group.
options: Options used for estimation; obtained by running mpt_options().
With the exception of the first five columns (i.e., all columns after
method), the columns are list columns, typically holding one tibble per
cell. The simplest way to analyze the results is separately per column using
unnest. Examples for this are given below.
Smith, J. B., & Batchelder, W. H. (2008). Assessing individual differences in categorical data. Psychonomic Bulletin & Review, 15(4), 713-731. https://doi.org/10.3758/PBR.15.4.713
Klauer, K. C. (2006). Hierarchical multinomial processing tree models: A latent-class approach. Psychometrika, 71(1), 7-31. https://doi.org/10.1007/s11336-004-1188-3
# MPT model definition & Data
EQN_FILE <- system.file("extdata", "prospective_memory.eqn", package = "MPTmultiverse")
DATA_FILE <- system.file("extdata", "smith_et_al_2011.csv", package = "MPTmultiverse")
### if .csv format uses semicolons ";" (e.g., German format):
# data <- read.csv2(DATA_FILE, fileEncoding = "UTF-8-BOM")
### if .csv format uses commas "," (international format):
data <- read.csv(DATA_FILE, fileEncoding = "UTF-8-BOM")
data <- data[c(1:10, 113:122), ]  ## select only subset of data for example
head(data)
COL_CONDITION <- "WM_EX"  # name of the variable encoding group membership
# experimental condition should be labeled meaningfully
unique(data[[COL_CONDITION]])
data[[COL_CONDITION]] <- factor(
  data[[COL_CONDITION]]
  , levels = 1:2
  , labels = c("low_WM", "high_WM")
)
# define core parameters:
CORE <- c("C1", "C2")
## Not run:
op <- mpt_options()
## to reset default options (which you would want) use:
mpt_options("default")
mpt_options()  # to see the settings
## Note: settings are also saved in the results tibble
## without specifying method, all are used per default
fit_all <- fit_mpt(
  dataset = DATA_FILE
  , data = data
  , model = EQN_FILE
  , condition = COL_CONDITION
  , core = CORE
)
mpt_options(op)  ## reset options
## End(Not run)
load(system.file("extdata", "prospective_memory_example.rda", package = "MPTmultiverse"))
# Although we requested all 10 methods, only 9 worked:
fit_all$method
# The JAGS variant of the beta-MPT is missing.
# the returned object has a plot method. For example, for the group-level estimates:
plot(fit_all, which = "est")
## Not run:
### Full analysis of results requires dplyr and tidyr (or just 'tidyverse')
library("dplyr")
library("tidyr")
## first few columns identify model, data, and estimation approach/method
## remaining columns are list columns containing the results for each method
## use unnest to work with each of the results columns
glimpse(fit_all)
## Let us inspect the grouplevel estimates
fit_all %>%
select(method, pooling, est_group) %>%
unnest()
## which we can plot again
plot(fit_all, which = "est")
## Next we take a look at the GoF
fit_all %>%
select(method, pooling, gof_group) %>%
unnest() %>%
as.data.frame()
# Again, we can plot it as well
plot(fit_all, which = "gof2") ## use "gof1" for overall GoF
## Finally, we take a look at the differences between conditions
fit_all %>%
select(method, pooling, test_between) %>%
unnest()
# and then we plot it
plot(fit_all, which = "test_between")
### Also possible to only use individual methods:
only_asymptotic <- fit_mpt(
method = "asymptotic_no"
, dataset = DATA_FILE
, data = data
, model = EQN_FILE
, condition = COL_CONDITION
, core = CORE
)
glimpse(only_asymptotic)
bayes_complete <- fit_mpt(
method = c("simple_pooling")
, dataset = DATA_FILE
, data = data
, model = EQN_FILE
, condition = COL_CONDITION
, core = CORE
)
glimpse(bayes_complete)
## End(Not run)
