Compute FIA for MPTs
Description
Rport of the function to compute FIA for MPT models by Wu, Myung, and Batchelder (2010a, 2010b). This function is essentially a copy of the original Matlab code to R (with significant parts moved to C++ and allowing for multicore functionality). Also, the order of input arguments is more Rlike.
Usage
1 2 
Arguments
s 
see Details 
parameters 
see Details 
category 
see Details 
N 
see Details 
ineq0 
see Details 
Sample 
see Details 
multicore 
logical. Should fitting be distributed across several cores? Requires snowfall and initialized cluster. See also below. 
split 

mConst 
A constant which is added in the Monte Carlo integration to avoid numerical underflows and is later subtracted (after appropriate transformation). Should be a power of 2 to avoid unnecessary numerical imprecision. 
Details
The following is the original description by Wu, Myung, & Batchelder (2010a) for their Matlab function. All changes to the original document are in squared brackets []:
This function computes the FIA complexity measure, C_FIA, using a Monte Carlo numerical integration algorithm. When inequality is present, sampling from the restricted parameter space is performed by rejection algorithm.
[...] [see References for References]
The following symbols are used in the body of the function:
S denotes number of parameters.
C denotes the number of categories.
M denotes the number of leaves in the tree.
The first input argument s
is related to the string representation of the BMPT model. It can be obtained by replacing all categories in the string by the capital letter C and all branching probabilities by the lower case letter p.
The second input argument parameters
is a row vector that assigns parameters or constants to the p's in the string s
. Its length should be the same as the number of p's in s
, and its elements correspond to the p's according to their order in s
. Positive integer elements in parameters
assign parameters to the corresponding p's, with the same integer denoting the same parameter. Constants are assigned to the p's using the negation of their values.
The [third] input argument category
is a 1 by M vector assigning categories to the C's in the string ‘s’ in the same way parameters
assigns branching probabilities, except that only positive consecutive integers from 1 to J
, the total number of categories, are allowed.
The [fourth] input argument N
specifies the total sample size.
The [fifth] input argument ineq0
assigns inequality constraints imposed on the parameters. It is a matrix with two columns. Each element denotes a parameter coded in the same way as in parameters
. For each row, the parameter on the left column is constrained to be smaller than that on the right column. The number of rows is determined by the total number of simple inequality constraints of the form theta_1 < theta_2
in the model. [Default is NULL
corresponding to no inequality restrictions.]
The last input argument ‘Sample’ specifies the number of random samples to be drawn in the Monte Carlo algorithm. [Default is 200000.]
[For returned values see Value]
It should be noted that ‘lnconst’ can be computed analytically free of Monte Carlo error on a case by case basis described below. For this reason, the users can calculate C_FIA
[see Wu, Myung & Batchelder, 2010a; Equation 7] by adding (S/2)*ln*(N/(2*pi))
, lnInt
and their handcalculated lnconst
to minimize the Monte Carlo errors. [In our experience this error is rather low and negligible.]
A sequence of inequalities theta_1 < theta_2 < ... < theta_k
reduces the parameter space to its 1/k!
, so in this case lnconst
should be ln * (k!)
. In general, any combination of inequality constraints specifies a union of subsets of the parameter space, each satisfying some sequence of inequalities. For example, the subspace defined by theta_1 < theta_2
and theta_3 < theta_2
is a union of two subspaces, one satisfying theta_1 < theta_3 < theta_2
and the other theta_3 < theta_1 < theta_2
, so the proportion is given by 2 * (1/3!) = 1/3
.
A coding example:
Suppose that for model 1HTM5c of source monitoring [see Wu et al., 2010a] , the sample sizes of source A, source B and new items are 300, 300 and 400, respectively and the inequality constraint of d_1 < d_2
is imposed. In this case, the six input arguments should be specified as follows:
s = 'ppppCpCCppCCCppCpCCppCCCppCCC';
parameters = c(.6,.5,1,2,5,4,5,1,3,5,4,5,4,5); [adapted for R]
ineq0 = matrix(c(2,3), 1,2); [adapted for R]
category = c(1,1,2,1,2,3,5,4,5,4,5,6,7,8,9); [adapted for R]
N = 1000;
Another coding example:
For the pairclustering model in Batchelder and Riefer (1999, Figure 1), suppose in a pairclustering experiment there are 300 pairs of words and 100 singletons, the six input arguments should be specified as follows:
s = 'pppCCppCCpCCpCC';
parameters = c(.75,1,2,3,3,3,3); [adapted for R]
ineq0 = NULL; [adapted for R]
category = c(1,4,2,3,3,4,5,6); [adapted for R]
N = 400;
[For more examples, see Examples]
Since MPTinR version 1.1.3 the Monte Carlo integration is performed in C++ using RcppEigen. With the default arguments, one instance of the C++ workhorse is called. To call multiple instances of the C++ workhorse, you can use the split
argument (which can be useful to replicate results obtained with multicore = TRUE
as described below). Note, that each time before calling the C++ code, the seed is set (the set of random seeds are generated before calling the function for the first time).
Multicore functionality is achieved via snowfall which needs to be loaded and a cluster initialized via sfInit
when setting multicore = TRUE
. When split = NULL
(the default), the Samples
will be evenly distributed on the different cores (using sfClusterSplit
), so that only one instance of the underlying C++ workhorse is called on each core. Setting split
to nonNULL
will produce as many instances (distributed across cores). Note that in order to obtain comparable results (as snowfall uses load balancing), the random seed is set (at each core) before calling each instance of the C++ workhorse. This allows to replicate results obtained via multicore in a nonmulticore environment when seting split
appropriately (and set.seed
beforehands).
Value
[A named vector:]
The first output argument CFIA
gives the FIA complexity value of the model.
The second [and third] output argument CI
gives the Monte Carlo confidence interval of CFIA
. [CI.l
, gives the lower, CI.u
, the upper bound of the interval].
The [fourth] output argument lnInt
gives the log integral term in C_FIA
[see Wu, Myung & Batchelder, 2010a; Equation 7] for models without inequality constraints. When inequality constraints are present, lnInt
does not take into account the change in the normalizing constant in the proposal distribution and must be adjusted with the output argument lnconst
.
The [fifth and sixth] output argument [CI.lnint
] gives the Monte Carlo confidence interval of lnInt
. [.l = lower & .u = upper bound of the CI]
When inequality constraints are present, the [seventh] output argument lnconst
serves as an adjustment of ‘lnInt’. It estimates the logarithm of the proportion of parameter space [0,1]^S that satisfies those inequality constraints, and the log integral term is given by lnInt+lnconst.
The next [two] output argument [CI.lnconst
] give the Monte Carlo confidence interval of ‘lnconst’. [.l = lower & .u = upper bound of the CI]
Note
The R version of the code should now (after moving the code to RcppEigen) be considerably faster than the Matlab version of this code.
Author(s)
The original Matlab code was written by Hao Wu, Jay I. Myung, and William H. Batchelder.
This code was ported to R by Henrik Singmann and David Kellen. RcppEigen was added by Henrik Singmann and Christian Mueller. Multicore functionality was added by Henrik Singmann.
References
Wu, H., Myung, J.I., & Batchelder, W.H. (2010a). Minimum description length model selection of multinomial processing tree models. Psychonomic Bulletin & Review, 17, 275286.
Wu, H., Myung, J.I., & Batchelder, W.H. (2010b). On the minimum description length complexity of multinomial processing trees. Journal of Mathematical Psychology, 54, 291303.
See Also
fit.mpt
for the main function of MPTinR.
get.mpt.fia
for a convenient wrapper of this function.
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17  ## Not run:
# The following example is the code for the first example in Wu, Myung & Batchelder (2010a, pp. 280)
# The result should be something like: CFIA = 12.61... or 12.62..., CI = 12.61...  12.62....
# Executing this command can take a while.
bmpt.fia(s = "ppppCpCCppCCCppCpCCppCCCppCCC",
parameters = c(0.5, 0.5, 3, 2, 5, 1, 5, 4, 2, 5, 1, 5, 1, 5),
category = c(1,1,2,1,2,3,5,4,5,4,5,6,7,8,9),
N = 1000, ineq0 = matrix(c(4,3),1,2))
bmpt.fia(s = "ppppCpCCppCCCppCpCCppCCCppCCC",
parameters = c(0.5, 0.5, 3, 2, 5, 1, 5, 4, 2, 5, 1, 5, 1, 5),
category = c(1,1,2,1,2,3,5,4,5,4,5,6,7,8,9),
N = 1000, ineq0 = matrix(c(4,3),1,2), mConst = 2L^8)
## End(Not run)
