# bmpt.fia: Compute FIA for MPTs In MPTinR: Analyze Multinomial Processing Tree Models

## Description

R-port of the function to compute FIA for MPT models by Wu, Myung, and Batchelder (2010a, 2010b). This function is essentially a copy of the original Matlab code to R (with significant parts moved to C++ and allowing for multicore functionality). Also, the order of input arguments is more R-like.

## Usage

 ```1 2``` ```bmpt.fia(s, parameters, category, N, ineq0 = NULL, Sample = 2e+05, multicore = FALSE, split = NULL, mConst = NULL) ```

## Arguments

 `s` see Details `parameters` see Details `category` see Details `N` see Details `ineq0` see Details `Sample` see Details `multicore` logical. Should fitting be distributed across several cores? Requires snowfall and initialized cluster. See also below. `split` `NULL` (the default) or integer specifying in how many separate calls to the C++ workhorse the integrant should be calculated. See below. `mConst` A constant which is added in the Monte Carlo integration to avoid numerical underflows and is later subtracted (after appropriate transformation). Should be a power of 2 to avoid unnecessary numerical imprecision.

## Details

The following is the original description by Wu, Myung, & Batchelder (2010a) for their Matlab function. All changes to the original document are in squared brackets []:

This function computes the FIA complexity measure, C_FIA, using a Monte Carlo numerical integration algorithm. When inequality is present, sampling from the restricted parameter space is performed by rejection algorithm.

[...] [see References for References]

The following symbols are used in the body of the function:
S denotes number of parameters.
C denotes the number of categories.
M denotes the number of leaves in the tree.

The first input argument `s` is related to the string representation of the BMPT model. It can be obtained by replacing all categories in the string by the capital letter C and all branching probabilities by the lower case letter p.

The second input argument `parameters` is a row vector that assigns parameters or constants to the p's in the string `s`. Its length should be the same as the number of p's in `s`, and its elements correspond to the p's according to their order in `s`. Positive integer elements in `parameters` assign parameters to the corresponding p's, with the same integer denoting the same parameter. Constants are assigned to the p's using the negation of their values.

The [third] input argument `category` is a 1 by M vector assigning categories to the C's in the string ‘s’ in the same way `parameters` assigns branching probabilities, except that only positive consecutive integers from 1 to `J`, the total number of categories, are allowed.

The [fourth] input argument `N` specifies the total sample size.

The [fifth] input argument `ineq0` assigns inequality constraints imposed on the parameters. It is a matrix with two columns. Each element denotes a parameter coded in the same way as in `parameters`. For each row, the parameter on the left column is constrained to be smaller than that on the right column. The number of rows is determined by the total number of simple inequality constraints of the form `theta_1 < theta_2` in the model. [Default is `NULL` corresponding to no inequality restrictions.]

The last input argument ‘Sample’ specifies the number of random samples to be drawn in the Monte Carlo algorithm. [Default is 200000.]

[For returned values see Value]

It should be noted that ‘lnconst’ can be computed analytically free of Monte Carlo error on a case by case basis described below. For this reason, the users can calculate `C_FIA` [see Wu, Myung & Batchelder, 2010a; Equation 7] by adding `(S/2)*ln*(N/(2*pi))`, `lnInt` and their hand-calculated `lnconst` to minimize the Monte Carlo errors. [In our experience this error is rather low and negligible.]

A sequence of inequalities `theta_1 < theta_2 < ... < theta_k` reduces the parameter space to its `1/k!`, so in this case `lnconst` should be `-ln * (k!)`. In general, any combination of inequality constraints specifies a union of subsets of the parameter space, each satisfying some sequence of inequalities. For example, the subspace defined by `theta_1 < theta_2` and `theta_3 < theta_2` is a union of two subspaces, one satisfying `theta_1 < theta_3 < theta_2` and the other `theta_3 < theta_1 < theta_2`, so the proportion is given by `2 * (1/3!) = 1/3`.

A coding example:
Suppose that for model 1HTM-5c of source monitoring [see Wu et al., 2010a] , the sample sizes of source A, source B and new items are 300, 300 and 400, respectively and the inequality constraint of `d_1 < d_2` is imposed. In this case, the six input arguments should be specified as follows:
s = 'ppppCpCCppCCCppCpCCppCCCppCCC';
parameters = c(-.6,-.5,1,2,5,4,5,1,3,5,4,5,4,5); [adapted for R]
ineq0 = matrix(c(2,3), 1,2); [adapted for R]
category = c(1,1,2,1,2,3,5,4,5,4,5,6,7,8,9); [adapted for R]
N = 1000;

Another coding example:
For the pair-clustering model in Batchelder and Riefer (1999, Figure 1), suppose in a pair-clustering experiment there are 300 pairs of words and 100 singletons, the six input arguments should be specified as follows:
s = 'pppCCppCCpCCpCC'; parameters = c(-.75,1,2,3,3,3,3); [adapted for R]
ineq0 = NULL; [adapted for R]
category = c(1,4,2,3,3,4,5,6); [adapted for R]
N = 400;

[For more examples, see Examples]

Since MPTinR version 1.1.3 the Monte Carlo integration is performed in C++ using RcppEigen. With the default arguments, one instance of the C++ workhorse is called. To call multiple instances of the C++ workhorse, you can use the `split` argument (which can be useful to replicate results obtained with `multicore = TRUE` as described below). Note, that each time before calling the C++ code, the seed is set (the set of random seeds are generated before calling the function for the first time).

Multicore functionality is achieved via snowfall which needs to be loaded and a cluster initialized via `sfInit` when setting `multicore = TRUE`. When `split = NULL` (the default), the `Samples` will be evenly distributed on the different cores (using `sfClusterSplit`), so that only one instance of the underlying C++ workhorse is called on each core. Setting `split` to non-`NULL` will produce as many instances (distributed across cores). Note that in order to obtain comparable results (as snowfall uses load balancing), the random seed is set (at each core) before calling each instance of the C++ workhorse. This allows to replicate results obtained via multicore in a non-multicore environment when seting `split` appropriately (and `set.seed` beforehands).

## Value

[A named vector:]

The first output argument `CFIA` gives the FIA complexity value of the model.

The second [and third] output argument `CI` gives the Monte Carlo confidence interval of `CFIA`. [`CI.l`, gives the lower, `CI.u`, the upper bound of the interval].

The [fourth] output argument `lnInt` gives the log integral term in `C_FIA` [see Wu, Myung & Batchelder, 2010a; Equation 7] for models without inequality constraints. When inequality constraints are present, `lnInt` does not take into account the change in the normalizing constant in the proposal distribution and must be adjusted with the output argument `lnconst`.

The [fifth and sixth] output argument [`CI.lnint`] gives the Monte Carlo confidence interval of `lnInt`. [.l = lower & .u = upper bound of the CI]

When inequality constraints are present, the [seventh] output argument `lnconst` serves as an adjustment of ‘lnInt’. It estimates the logarithm of the proportion of parameter space [0,1]^S that satisfies those inequality constraints, and the log integral term is given by lnInt+lnconst.

The next [two] output argument [`CI.lnconst`] give the Monte Carlo confidence interval of ‘lnconst’. [.l = lower & .u = upper bound of the CI]

## Note

The R version of the code should now (after moving the code to RcppEigen) be considerably faster than the Matlab version of this code.

## Author(s)

The original Matlab code was written by Hao Wu, Jay I. Myung, and William H. Batchelder.
This code was ported to R by Henrik Singmann and David Kellen. RcppEigen was added by Henrik Singmann and Christian Mueller. Multicore functionality was added by Henrik Singmann.

## References

Wu, H., Myung, J.I., & Batchelder, W.H. (2010a). Minimum description length model selection of multinomial processing tree models. Psychonomic Bulletin & Review, 17, 275-286.

Wu, H., Myung, J.I., & Batchelder, W.H. (2010b). On the minimum description length complexity of multinomial processing trees. Journal of Mathematical Psychology, 54, 291-303.

## See Also

`fit.mpt` for the main function of MPTinR.
`get.mpt.fia` for a convenient wrapper of this function.

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17``` ```## Not run: # The following example is the code for the first example in Wu, Myung & Batchelder (2010a, pp. 280) # The result should be something like: CFIA = 12.61... or 12.62..., CI = 12.61... - 12.62.... # Executing this command can take a while. bmpt.fia(s = "ppppCpCCppCCCppCpCCppCCCppCCC", parameters = c(-0.5, -0.5, 3, 2, 5, 1, 5, 4, 2, 5, 1, 5, 1, 5), category = c(1,1,2,1,2,3,5,4,5,4,5,6,7,8,9), N = 1000, ineq0 = matrix(c(4,3),1,2)) bmpt.fia(s = "ppppCpCCppCCCppCpCCppCCCppCCC", parameters = c(-0.5, -0.5, 3, 2, 5, 1, 5, 4, 2, 5, 1, 5, 1, 5), category = c(1,1,2,1,2,3,5,4,5,4,5,6,7,8,9), N = 1000, ineq0 = matrix(c(4,3),1,2), mConst = 2L^8) ## End(Not run) ```

MPTinR documentation built on May 31, 2017, 3:56 a.m.