Given a data file (text file or data.frame) with the correct variable names (see Section 1.3) and a model file (see Sections 1.1 and 2), the steps needed to fit either an exponential RT-MPT model (Hartmann et al., 2020; Hartmann & Klauer, 2020; Klauer & Kellen, 2018) or a diffusion RT-MPT model (Klauer et al., submitted) to the data are described in Section 1:
1.1 Convert the equation/model file to an ertmpt_model or drtmpt_model object via the to_ertmpt_model() or to_drtmpt_model() function
1.2 If necessary, specify some restrictions
1.3 Convert the data file to an ertmpt_data or drtmpt_data object via the to_ertmpt_data() or to_drtmpt_data() function
1.4 Fit the specified model to the data via the fit_ertmpt() or fit_drtmpt() function
1.5 Check convergence of the model parameters and get a summary
Sections 2, 3, and 4 provide more information about the equation/model files, the data, and additional functions.
The equation/model file can be provided as a text file or written directly as a string, as in the following examples. One may use a syntax comparable to the EQN syntax by Heck, Arnold, and Arnold (2018) or Hu (1999), or a syntax comparable to the MDL syntax by Singmann and Kellen (2013). In the EQN syntax you can separate trees, categories, and paths either with semicolons (as in the example below) or with commas, but the two must not be mixed:
eqn = " # CORE MODEL ## EQN SYNTAX ## tree ; cat ; mpt 0 ; 0 ; Do 0 ; 0 ; (1-Do)*g 0 ; 1 ; (1-Do)*(1-g) 1 ; 3 ; Dn 1 ; 2 ; (1-Dn)*g 1 ; 3 ; (1-Dn)*(1-g) "
In this eqn object we specify a Two-High Threshold (2HT) model. Comments are introduced with a # symbol at the beginning of a line.
The same model can be specified with the MDL syntax:
mdl = " # CORE MODEL ## MDL SYNTAX ### targets Do+(1-Do)*g (1-Do)*(1-g) ### lure (1-Dn)*g Dn+(1-Dn)*(1-g) "
All paths/branches that lead to the same response category are added together with a + symbol.
More information about these two syntaxes is provided in Section 2.
The conversion to an ertmpt_model or drtmpt_model list can then be done with the to_ertmpt_model() or to_drtmpt_model() function, respectively:
library(rtmpt)

# MAKING AN ERTMPT MODEL
# using the MDL syntax:
e2HTM <- to_ertmpt_model(mdl_file = mdl)
# using the EQN syntax:
e2HTM <- to_ertmpt_model(eqn_file = eqn)
e2HTM

# MAKING A DRTMPT MODEL
# using the MDL syntax:
d2HTM <- to_drtmpt_model(mdl_file = mdl)
# using the EQN syntax:
d2HTM <- to_drtmpt_model(eqn_file = eqn)
d2HTM
Next we discuss model restrictions for an ertmpt_model (1.2.1) and a drtmpt_model (1.2.2), and the response mapping used if more than one encoding plus motor execution time is wanted (1.2.3).
There are four functions with which the process parameters of an ertmpt_model object can be changed. Two of them can be used to hold a process probability parameter (theta) constant or to set some process probability parameters equal. The other two can be used to suppress process completion times (taus), i.e., to set the process times to zero, or to set some process completion times equal.
To hold a process probability constant you can use theta2const() like this:
theta2const(model = e2HTM, names = "g", constants = 0.5)
Here, "g"
will be set to 0.5 and will, therefore, not be estimated. You might also use this function with a vector of names
(and a vector of constants
). Note that the new model is not saved because we did not assign it to an R object.
An alias for this function is set_theta_const()
.
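The restricted model is kept only if the result is assigned to an object. The following minimal sketch (the object names and the constant 0.6 are purely illustrative) shows an assignment and the vector form of the call:

# keep the restricted model by assigning the result to a new object
e2HTM_g05 <- theta2const(model = e2HTM, names = "g", constants = 0.5)

# restrict several probabilities at once via vectors (values are arbitrary)
e2HTM_fix <- theta2const(model = e2HTM, names = c("Do", "g"),
                         constants = c(0.6, 0.5))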
To set two or more process probabilities equal, you can use theta2theta() like this:
theta2theta(model = e2HTM, names = c("Do", "Dn"), keep_consts = FALSE)
The process probabilities of all processes specified in names are set equal. The first of the process names in alphabetical order (here "Dn") denotes the reference probability that will be estimated; the other(s) will not be estimated but will be set equal to the reference process probability. The keep_consts argument can be used if you want to keep constants that you have already specified and all other process probabilities should take the same constant, but we suggest using theta2const() for that.
An alias for this function is set_thetas_equal().
To suppress a process completion time (i.e., set it to zero) you can use tau2zero() like this:
tau2zero(model = e2HTM, names = "g", outcomes = "minus", values = 0)
Here, the process completion time of "g" with outcome "minus" (meaning the time for 1-g) is set to zero and will therefore not be estimated (which typically does not make sense to do). This function can also be used with vectors.
An alias for this function is set_tau_zero().
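Since vectors are allowed, several completion times can be suppressed in one call. A minimal sketch (whether such a restriction makes theoretical sense depends on the model; the object name is illustrative):

# suppress the minus-outcome completion times of two processes at once
e2HTM_tau0 <- tau2zero(model = e2HTM, names = c("Do", "g"),
                       outcomes = c("minus", "minus"), values = c(0, 0))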
To set two or more process completion times with the same outcome equal you can use tau2tau() like this:
tau2tau(model = e2HTM, names = c("Do", "Dn"), outcome = "minus", keep_zeros = FALSE)
The process completion times of all processes specified in names are set equal (i.e., all process completion times in minus direction will be equal, while all process completion times in plus direction will still be estimated separately, unless you set outcome to "both"). The first of the process names in alphabetical order (here "Dn") denotes the reference process time in minus direction (and/or plus direction) that will be estimated; the other(s) will not be estimated but will be set equal to the reference process time(s). The keep_zeros argument can be used if you want to keep the zeros that you have already specified and all other process times with the same outcome should be zero as well; here, too, we suggest using tau2zero() for that.
An alias for this function is set_taus_equal().
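These restriction functions can be applied in sequence, saving the result each time. The sketch below builds a restricted model object named restr_2HTM (the object name that also appears in the full fit_ertmpt() call shown later); the calls simply mirror the ones above:

# set the probabilities of Do and Dn equal
restr_2HTM <- theta2theta(model = e2HTM, names = c("Do", "Dn"), keep_consts = FALSE)
# additionally set their completion times in minus direction equal
restr_2HTM <- tau2tau(model = restr_2HTM, names = c("Do", "Dn"),
                      outcome = "minus", keep_zeros = FALSE)
restr_2HTM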
There are six functions with which the process parameters of a drtmpt_model object can be changed. Three of them can be used to set process parameters to constants and three can be used to set process parameters equal.
To hold a process threshold constant you can use a2const() like this:
a2const(model = d2HTM, names = "g", constants = 1)
Here, the threshold for "g" will be set to 1.0 and will therefore not be estimated. You can also call this function with a vector of names (and a vector of constants). Note that the new model is not saved here because we did not assign it to an R object.
An alias for this function is set_a_const().
To set two or more process thresholds equal you can use a2a() like this:
a2a(model = d2HTM, names = c("Do", "Dn"), keep_consts = FALSE)
The process thresholds of all processes specified in names are set equal. The first of the process names in alphabetical order (here "Dn") denotes the reference threshold that will be estimated; the other(s) will not be estimated but will be set equal to the reference process threshold. The keep_consts argument can be used if you want to keep constants that you have already specified and all other process thresholds should take the same constant, but we suggest using a2const() for that.
An alias for this function is set_as_equal().
To hold a process drift rate constant you can use nu2const() like this:
nu2const(model = d2HTM, names = "g", constants = 1)
Here, the drift rate for "g" will be set to 1.0 and will therefore not be estimated. You can also call this function with a vector of names (and a vector of constants). Note that the new model is not saved here because we did not assign it to an R object.
An alias for this function is set_nu_const().
To set two or more process drift rates equal you can use nu2nu() like this:
nu2nu(model = d2HTM, names = c("Do", "Dn"), keep_consts = FALSE)
The process drift rates of all processes specified in names are set equal. The first of the process names in alphabetical order (here "Dn") denotes the reference drift rate that will be estimated; the other(s) will not be estimated but will be set equal to the reference process drift rate. The keep_consts argument can be used if you want to keep constants that you have already specified and all other process drift rates should take the same constant, but we suggest using nu2const() for that.
An alias for this function is set_nus_equal().
To hold a process relative starting point constant you can use omega2const() like this:
omega2const(model = d2HTM, names = "g", constants = 0.5)
Here, the relative starting point for "g" will be set to 0.5 and will therefore not be estimated. You can also call this function with a vector of names (and a vector of constants). Note that the new model is not saved here because we did not assign it to an R object.
An alias for this function is set_omega_const().
To set two or more process relative starting points equal you can use omega2omega() like this:
omega2omega(model = d2HTM, names = c("Do", "Dn"), keep_consts = FALSE)
The process relative starting points of all processes specified in names are set equal. The first of the process names in alphabetical order (here "Dn") denotes the reference relative starting point that will be estimated; the other(s) will not be estimated but will be set equal to the reference relative starting point. The keep_consts argument can be used if you want to keep constants that you have already specified and all other process relative starting points should take the same constant, but we suggest using omega2const() for that.
An alias for this function is set_omegas_equal().
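As with the ertmpt functions, these calls can be chained and the result saved. A minimal sketch (the object name restr_d2HTM is illustrative) that sets thresholds, drift rates, and relative starting points of Do and Dn equal:

restr_d2HTM <- a2a(model = d2HTM, names = c("Do", "Dn"), keep_consts = FALSE)
restr_d2HTM <- nu2nu(model = restr_d2HTM, names = c("Do", "Dn"), keep_consts = FALSE)
restr_d2HTM <- omega2omega(model = restr_d2HTM, names = c("Do", "Dn"), keep_consts = FALSE)
restr_d2HTM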
You can choose which categories of the exponential/diffusion RT-MPT model should share the same encoding plus motor execution time. Unlike the processes, the encoding plus motor execution times have no names that could be used to set them equal. Therefore the function delta2delta() uses a mapping: the response categories (categories) are mapped to encoding plus motor execution times via the mapping argument (mappings). Because the response categories are not always unique, the trees (trees) have to be provided as well.
delta2delta(model = e2HTM, trees = c(0,1), categories = c(1,3), mappings = c(1,1))
The numbers in mappings have to start at zero and no whole number may be skipped. For example, in the code above the mappings cannot be c(2,2), since the number 1 would then be missing from the mappings.
An alias for this function is set_deltas_equal().
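As with the other restriction functions, the result has to be assigned to keep it (a minimal sketch; the object name is illustrative):

# keep the model with two distinct encoding plus motor execution times
e2HTM_2deltas <- delta2delta(model = e2HTM, trees = c(0, 1),
                             categories = c(1, 3), mappings = c(1, 1))
e2HTM_2deltas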
The data should be provided with the following variables in this order: subj, group, tree, cat, and rt. All of these variables except rt should have values from zero to the number of unique values minus one. An example makes this clear: if we have 60 subjects, two groups, two trees, and four categories, the variables would have the following values:
subj = {0,1,2,...,58,59}, group = {0,1}, tree = {0,1}, and cat = {0,1,2,3}. rt should be provided in milliseconds as integers; values with decimal places will be rounded to whole numbers.
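For illustration, a correctly coded data set would start like this (a hand-made sketch with arbitrary RT values):

correct_data <- data.frame(subj  = c(0, 0, 0, 1),
                           group = c(0, 0, 0, 1),
                           tree  = c(0, 1, 0, 1),
                           cat   = c(0, 3, 1, 2),
                           rt    = c(631, 742, 588, 903))
correct_data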
Given the right order of the variables and the right values, one can use a data.frame or a path to a text file containing the variable names in its first line. If not, the to_ertmpt_data() or to_drtmpt_data() function can be used to reorder the variables and transform their values. Next to the new data.frame, the transformation that was used is saved in an ertmpt_data or drtmpt_data list. Note that at least the variable names must be provided correctly for to_ertmpt_data() or to_drtmpt_data() to work. These functions are also called automatically within fit_ertmpt() and fit_drtmpt(), but they should be used at least once, when the model is specified for the first time, to check whether the data are transformed according to the equation/model file.
Here is an example of a data set with unordered variables and wrong values (not starting from zero) and a transformation of this data set:
set.seed(2021)
raw_data <- data.frame(tree = rep(1:2, 8),
                       rt = round(1000*(runif(16)+.3)),
                       group = rep(1:2, each = 8),
                       subj = rep(1:4, each = 4),
                       cat = rep(1:4, 4))
raw_data

data <- to_ertmpt_data(raw_data = raw_data, model = e2HTM)
data
Note that it is never a good idea to specify the tree and category labels/numbers differently in the model file and in the data file. For example, do not define a model with tree and category values starting from zero and then use a data set with tree and category values starting from one; make sure the labels or numbers match.
Finally, with the function fit_ertmpt() or fit_drtmpt() one can fit the model to the data. The following code should not be run; there are far too few data points.
## do not run
ertmpt_out <- fit_ertmpt(model = e2HTM, data = data)
## end not run
In this function call we used the default values for all arguments that are not listed; model and data have no default values. The full call of the function would look like this:
## do not run
ertmpt_out <- fit_ertmpt(model = restr_2HTM, data = data,
                         n.chains = 4, n.iter = 5000,
                         n.burnin = 200, n.thin = 1,
                         Rhat_max = 1.05, Irep = 1000,
                         prior_params = NULL,
                         indices = FALSE,
                         save_log_lik = FALSE)
## end not run
n.chains is the number of chains (with a maximum of 16). n.iter is the number of samples to draw from the posterior distribution. n.burnin is analogous to the burn-in phase in the R package rjags and denotes the number of warm-up samples. n.thin is the thinning parameter. Rhat_max is a threshold for the potential scale reduction factor; the main sampling only starts once the maximal potential scale reduction factor across parameters is lower than or equal to this value. Irep is the number of samples until the next summary is printed. prior_params is a list of prior parameter specifications. indices is a logical argument that enables computation of the WAIC and LOO, and save_log_lik is a logical argument that allows one to save the log-likelihood for each posterior sample and data point.
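For example, to also compute WAIC and LOO and to keep the pointwise log-likelihood, only those two arguments need to be changed (again, do not run this with the toy data from above):

## do not run
ertmpt_out <- fit_ertmpt(model = e2HTM, data = data,
                         indices = TRUE,       # compute WAIC and LOO
                         save_log_lik = TRUE)  # save pointwise log-likelihood
## end not run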
The equivalent call for a drtmpt model is
## do not run
drtmpt_out <- fit_drtmpt(model = d2HTM, data = data)
## end not run
Here, too, the default values are used for all arguments that are not listed; again, model and data have no default values. The full call of the function would look like this:
## do not run
drtmpt_out <- fit_drtmpt(model = restr_2HTM, data = data,
                         n.chains = 4, n.iter = 1000,
                         n.phase1 = 1000, n.phase2 = 2000,
                         n.thin = 1, Rhat_max = 1.05,
                         Irep = 1000, prior_params = NULL,
                         flags = NULL, control = NULL)
## end not run
n.chains, n.iter, n.thin, Rhat_max, and Irep are the same as for fit_ertmpt(). prior_params is again a list of prior parameter specifications, but with different prior parameters than in fit_ertmpt() (see Section 4 for more details). n.phase1 and n.phase2 are similar to n.burnin, but are used for the adaptation of the Hamiltonian algorithm and for the estimation of the covariance matrix of all parameters, respectively. flags should be a list of four logical variables, namely old_label (the equivalent of the argument in fit_ertmpt()), indices (also the equivalent of the argument in fit_ertmpt()), log_lik (the equivalent of the save_log_lik argument in fit_ertmpt()), and random_init (which determines whether the initial values of the MCMC algorithm are drawn randomly or estimated by maximum likelihood). Finally, control is a list of numeric variables that gives the user some control over the algorithm and its speed, namely maxthreads (the maximal number of threads for multithreading), maxtreedepth1_3 (the maximal tree depth of the no-U-turn algorithm in phases 1 and 3), and maxtreedepth4 (the maximal tree depth of the no-U-turn algorithm in phase 4).
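For example, flags and control can be passed as named lists with the elements described above (a sketch; the numeric values here are purely illustrative, not recommendations):

## do not run
drtmpt_out <- fit_drtmpt(model = d2HTM, data = data,
                         flags = list(old_label = FALSE,
                                      indices = FALSE,
                                      log_lik = FALSE,
                                      random_init = FALSE),
                         control = list(maxthreads = 4,
                                        maxtreedepth1_3 = 10,
                                        maxtreedepth4 = 12))
## end not run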
Convergence of the model can be checked using the fitted object: inspect the diagnostics, in particular the R-hat values. To obtain traceplots one can use the traceplot() function from the coda package, and the summary() function provides a summary of the estimates.
Here is a code snippet for an ertmpt fit:
ertmpt_out$diags$R_hat
coda::traceplot(ertmpt_out$samples[, 1:9])
summary(ertmpt_out)
and for a drtmpt fit:
drtmpt_out$diags$R_hat
coda::traceplot(drtmpt_out$samples[, 1:9])
summary(drtmpt_out)
In previous versions it was possible to set some restrictions directly in the equation/model file, but because of the number of possible restrictions we now use functions instead (see Subsection 1.2).
As we have seen above, one can specify an equation file (eqn_file) like this:
eqn = " # CORE MODEL ## tree ; cat ; path 0 ; 0 ; Do 0 ; 0 ; (1-Do)*g 0 ; 1 ; (1-Do)*(1-g) 1 ; 3 ; Dn 1 ; 2 ; (1-Dn)*g 1 ; 3 ; (1-Dn)*(1-g) "
The trees, categories, and paths need to be separated by semicolons or commas, but the two should not be mixed. As mentioned above, the eqn_file can be provided as a path to a text file or written directly in R/RStudio as above.
Another way of specifying the same model is by providing a model file (mdl_file). This is also not yet the model object needed for the fit_ertmpt() or fit_drtmpt() call, but it is much closer. It is based on the syntax developed by Singmann and Kellen (2013).
mdl = " # CORE MODEL ## targets: Do+(1-Do)*g (1-Do)*(1-g) ## lures (1-Dn)*g Dn+(1-Dn)*(1-g) "
Between two trees (here targets and lures) there must be an empty line. All paths that lead to the same response category are added together on one line with a + symbol.
Internally, an eqn_file is converted to an mdl_file. From that, a text file is written locally onto your computer (in a temporary file within a temporary folder created by the R session), used by the fit_ertmpt() or fit_drtmpt() call, and removed afterwards.
When labels are used in the data set, one has to be careful when specifying the equation file: the labels of the trees and categories in the equation file must match those in the data. For example, if the tree labels "target" and "lure" are used in the data set, the same labels have to be used in the equation file (eqn_file). If you specify a model file (mdl_file) instead of an equation file, the labels in the data set need to be changed to numbers; otherwise no mapping between the two can be made.
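For illustration, an equation file whose labels match a labeled data set might look like this (the labels "target", "lure", "hit", "miss", "fa", and "cr" are hypothetical and must correspond exactly to the labels used in the data):

eqn_labels = "
# tree ; cat  ; path
target ; hit  ; Do
target ; hit  ; (1-Do)*g
target ; miss ; (1-Do)*(1-g)
lure   ; cr   ; Dn
lure   ; fa   ; (1-Dn)*g
lure   ; cr   ; (1-Dn)*(1-g)
"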
When numbers are used for the trees and categories in the data set, it is best to check that they match the numbers used in the equation/model file, although the program matches them by order. For example, if you use the numbers {1, 2, 3, 4} for your four trees in the data set and {0, 1, 2, 3} in the equation file (which is the standard for model files anyway), they will be mapped by their order: the transformed data will contain a zero instead of a one, a one instead of a two, and so on.
One could use labels in the equation file even though the data file uses numbers, but this is strongly discouraged. It will be assumed that the order in which you specify the equations matches the order of the numbers in the data file, from lowest to highest. So if you are again using {1, 2, 3, 4} for the four trees, the first tree defined in the equation file (whatever its label) must correspond to tree 1 in the data, and so on.
With the function sim_ertmpt_data() (so far no sim_drtmpt_data() is available) it is possible to generate data from an exponential RT-MPT model. Required are a model object generated with to_ertmpt_model(), a random seed, the number of subjects, the number of trials per tree in the model, and a named list of parameters. Here it is also possible to set the mean of mean_of_mu_alpha; a value of zero corresponds to a probability of 0.5. If a specific probability is desired, say 0.4, one can specify params <- list(mean_of_mu_alpha = qnorm(.4)). If a mean process time of 200 ms is required, one can specify params <- list(mean_of_exp_mu_beta = 1000/200). The mean parameter of the encoding plus motor times is easier to transform, since it is already on the seconds scale; just write 500/1000 if your mean encoding plus motor time should be 500 ms. The code below strikes a more or less good balance between too large variances (leading to too many extreme values such as probabilities of zero or one and RTs of zero) and too small variances (unrealistic data).
Important: if no variances are specified in the params argument, the corresponding mean values are fixed; otherwise, they will be randomly drawn.
# randomly drawn group-level mean values
mdl_2HTM <- "
# targets
do+(1-do)*g     ; 0
(1-do)*(1-g)    ; 1

# lures
(1-dn)*g        ; 0
dn+(1-dn)*(1-g) ; 1

# do: detect old; dn: detect new; g: guess
"

model <- to_ertmpt_model(mdl_file = mdl_2HTM)

# random group-level parameters
params <- list(mean_of_mu_alpha = 0,
               var_of_mu_alpha = 1,          # delete this line to fix mean_of_mu_alpha to 0
               mean_of_exp_mu_beta = 10,
               var_of_exp_mu_beta = 10,      # delete this line to fix mean_of_exp_mu_beta to 10
               mean_of_mu_gamma = 0.5,
               var_of_mu_gamma = 0.0025,     # delete this line to fix mean_of_mu_gamma to 0.5
               mean_of_omega_sqr = 0.005,
               var_of_omega_sqr = 0.000025,  # delete this line to fix mean_of_omega_sqr to 0.005
               df_of_sigma_sqr = 10,
               sf_of_scale_matrix_SIGMA = 0.1,
               sf_of_scale_matrix_GAMMA = 0.01,
               prec_epsilon = 10,
               add_df_to_invWish = 5)

sim_dat <- sim_ertmpt_data(model, seed = 123, n.subj = 40,
                           n.trials = 30, params = params)

head(sim_dat$data_frame)
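As a contrast to the code above, here is a sketch with only the means specified, so that the group-level means are fixed rather than randomly drawn (this assumes that prior parameters omitted from params fall back to the package defaults):

# group-level means are fixed because no variances are given
params_fixed <- list(mean_of_mu_alpha = 0,
                     mean_of_exp_mu_beta = 10,
                     mean_of_mu_gamma = 0.5,
                     mean_of_omega_sqr = 0.005)

sim_dat_fixed <- sim_ertmpt_data(model, seed = 123, n.subj = 40,
                                 n.trials = 30, params = params_fixed)
head(sim_dat_fixed$data_frame)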
With the function fit_ertmpt_SBC() it is possible to generate data and fit the model to those data using the same priors / ground truths. The main output of this function is the rank statistic for each parameter in the model, which is needed for the simulation-based calibration (SBC) method by Talts, Betancourt, Simpson, Vehtari, and Gelman (2018). For example, if this function is repeated, say, 2000 times with different seeds, one can collect all rank statistics in a matrix (2000 rows and one column per parameter) and then calculate Pearson's chi-squared statistic for each parameter. This allows one to test for how many parameters the rank statistics are not uniformly distributed. With an alpha level of 0.05, a proportion of about 0.05 significant tests is expected if the rank statistics are in fact uniform.
Here is an example of one replication of the SBC procedure:
mdl_2HTM <- " # targets d+(1-d)*g ; 0 (1-d)*(1-g) ; 1 # lures (1-d)*g ; 0 d+(1-d)*(1-g) ; 1 # d: detect; g: guess " model <- to_ertmpt_model(mdl_file = mdl_2HTM) params <- list(mean_of_exp_mu_beta = 10, var_of_exp_mu_beta = 10, mean_of_mu_gamma = 0.5, var_of_mu_gamma = 0.0025, mean_of_omega_sqr = 0.005, var_of_omega_sqr = 0.000025, df_of_sigma_sqr = 10, sf_of_scale_matrix_SIGMA = 0.1, sf_of_scale_matrix_GAMMA = 0.01, prec_epsilon = 10, add_df_to_invWish = 5) SBC_out <- fit_ertmpt_SBC(model, seed = 123, prior_params = params) SBC_out$ranks
With the following code it would essentially be possible to calculate the rank statistics of each parameter and then test them with Pearson's chi-squared test, but note that much is not considered here (e.g., convergence and the effective sample size).
# For 2000 replications
## This takes too long to run and, in addition, Rhat should always be
## checked as well as the effective sample size.
R <- 2000
rank_mat <- data.frame()
for (r in 1:R) {
  SBC_out <- fit_ertmpt_SBC(model, seed = r*123, prior_params = params,
                            n.eff_samples = 99)
  rank_mat <- rbind(rank_mat, SBC_out$ranks)
}

## Pearson's chi-square for testing uniformity
x <- apply(rank_mat[1:R,], 2, table)
expect <- R/100   # 100 = number of bins/cells (0:99)
pearson <- apply(X = x, MARGIN = 2, FUN = function(x) {sum((x-expect)^2/expect)})
z95 <- qchisq(0.95, 99)   # 99 = degrees of freedom
sum(pearson > z95) / length(pearson)
With the following code a rank matrix is generated from a uniform distribution and then tested for uniformity:
R <- 2000
rand_rankmat <- matrix(data = sample(0:99, R*393, replace = TRUE),
                       nrow = R, ncol = 393)

## Pearson's chi-square for testing uniformity
x <- apply(rand_rankmat[1:R,], 2, table)
expect <- R/100   # 100 = number of bins/cells (0:99)
pearson <- apply(X = x, MARGIN = 2, FUN = function(x) {sum((x-expect)^2/expect)})
z95 <- qchisq(0.95, 99)   # 99 = degrees of freedom
sum(pearson > z95) / length(pearson)