# Calculates Model Selection Criteria For Several (Independent) MCMC Runs And Various Numbers H of Clusters

### Description

Calculates and plots a set of model selection criteria (depending on the underlying model: e.g. BIC, adjusted BIC,
DIC (Deviance Information Criterion), AWE (Approximate Weight of Evidence), CLC (Classification Likelihood
Criterion), ICL (Integrated Classification Likelihood), ICL-BIC) for all estimated models produced by one and the
same cluster method (for the sake of comparability), for various numbers *H* of clusters/groups and several
independent MCMC runs saved in output files located in the specified directory. For this purpose several maximisation
methods are available. For more information about the criteria see **Details**, **References** and the references
therein.

### Usage

```
calcMSCritMCC(workDir, myLabel = "model choice for ...", H0 = 3,
              whatToDoList = c("approxMCL", "approxML", "postMode"))
calcMSCritMCCExt(workDir, NN, myLabel = "model choice for ...",
                 ISdraws = 3, H0 = 3,
                 whatToDoList = c("approxMCL", "approxML", "postMode"))
calcMSCritDMC(workDir, myLabel = "model choice for ...",
              myN0 = "N0 = ...",
              whatToDoList = c("approxMCL", "approxML", "postMode"))
calcMSCritDMCExt(workDir, myLabel = "model choice for ...",
                 myN0 = "N0 = ...",
                 whatToDoList = c("approxMCL", "approxML", "postMode"))
```

### Arguments

`workDir`

A character string giving the name (or full path) of the directory containing the output files of the estimated models produced by one and the same cluster method (for the sake of comparability) for which model selection criteria have to be calculated.

`NN`

Number of individuals.

`myLabel`

Specifies (part of) the labelling of the plots.

`myN0`

A character documenting the value of

`H0`

Number of clusters/groups 'expected' by the user. Necessary for the calculation of the model prior

`ISdraws`

Number of draws for the importance sampling step to approximate the logICL.

`whatToDoList`

A character vector containing a subset of

### Details

For each maximisation method in `whatToDoList`, all (available) model selection criteria are calculated (in an
iterative manner). Depending on the entries in this list (`whatToDoList`), the calculation of (all) these
criteria is based on the MCMC draw (iteration) corresponding to the maximum of the log classification likelihood
(`"approxMCL"`), the log-likelihood (`"approxML"`) and/or (for the sake of completeness) the log posterior density
(`"postMode"`).
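The selection of the maximising draw described above can be sketched as follows. This is only an illustration under stated assumptions: the three trace vectors and the `traces` list are hypothetical stand-ins with made-up numbers, not objects produced by the package.

```r
# Hypothetical per-draw traces from one MCMC run (made-up numbers).
logLik      <- c(-512.4, -498.7, -495.1, -496.3, -497.0)
logClassLik <- c(-530.2, -505.9, -507.3, -503.8, -506.1)
logPost     <- c(-540.0, -520.5, -519.9, -521.2, -518.7)

# Map each maximisation method to the quantity it maximises.
traces <- list(approxMCL = logClassLik,
               approxML  = logLik,
               postMode  = logPost)

whatToDoList <- c("approxMCL", "approxML", "postMode")

# For each requested method, find the draw at which its quantity is largest.
mMax <- vapply(whatToDoList,
               function(m) which.max(traces[[m]]),
               integer(1))
print(mMax)
```

The criteria for a given method would then be evaluated at the parameters of the draw with index `mMax[[method]]`.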

Note that the user has to decide which criteria are admissible.

Which criterion needs which maximisation method? The AWE and the logICL are based on the maximum of the (log)
classification likelihood, all the others on the maximum of the (log) likelihood (see **References**).

In addition, the function internally calculates the log-likelihood and related values such as `LK` (observed
log-likelihood), `CLK` (classification or complete log-likelihood), `CK` (classification-type
log-likelihood), `EK` (entropy term) as well as *d_h* (number of parameters), which are essential parts of the
model selection criteria.
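To show how such building blocks typically combine, here is a minimal sketch using the common textbook forms of some of the criteria. It is not taken from the package source: the function name `msc_sketch`, the argument `classLL` (a generic classification log-likelihood; whether the package uses `CLK` or `CK` for AWE is not stated here) and the sample size `N` are assumptions, and the package's exact definitions may differ.

```r
# Textbook forms of several criteria on the -2 * log-likelihood scale.
# LK      : observed log-likelihood at the maximising draw
# classLL : classification log-likelihood at the maximising draw (assumed input)
# EK      : entropy term
# dh      : number of parameters
# N       : number of observations (assumed penalty scale)
msc_sketch <- function(LK, classLL, EK, dh, N) {
  c(BIC    = -2 * LK + dh * log(N),
    AIC    = -2 * LK + 2 * dh,
    CLC    = -2 * LK + 2 * EK,
    IclBic = -2 * LK + 2 * EK + dh * log(N),
    AWE    = -2 * classLL + 2 * dh * (3 / 2 + log(N)))
}

# Made-up inputs, for illustration only.
crit <- msc_sketch(LK = -500, classLL = -520, EK = 15, dh = 10, N = 100)
print(round(crit, 2))
```

In these forms IclBic is simply BIC plus twice the entropy term, which is why poorly separated clusters (large `EK`) are penalised more heavily by IclBic than by BIC.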

We calculate the model prior *adjusted BIC* using
`adjBIC = BIC - 2*H*log(H0) + 2*logΓ(H + 1) + 2*H0`.
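The formula above translates directly into R. The helper name `adj_bic` is hypothetical and the BIC values are made up. The comparison with `dpois` is only an observation about the formula: the correction term is numerically equal to minus twice the log of a Poisson probability mass with mean `H0` evaluated at `H`, although this page does not name the prior.

```r
# adjBIC = BIC - 2*H*log(H0) + 2*logGamma(H + 1) + 2*H0
adj_bic <- function(bic, H, H0 = 3) {
  bic - 2 * H * log(H0) + 2 * lgamma(H + 1) + 2 * H0
}

H   <- 2:6
bic <- rep(1000, length(H))   # made-up BIC values, identical on purpose
adj <- adj_bic(bic, H, H0 = 3)

# The same correction written as -2 * log of a Poisson(H0) pmf at H.
viaPoisson <- bic - 2 * dpois(H, lambda = 3, log = TRUE)

print(cbind(H, adj, viaPoisson))
```

With identical BIC values, the adjustment is smallest for `H` near `H0`, so models whose number of clusters is close to the user's expectation are favoured.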

Depending on the model type, the following criteria are calculated: BIC, adjusted BIC, AIC, AWE, IclBic, CLC,
DIC2, DIC4a and logICL (see **References**). Furthermore, plots and tables of selected criteria are generated (and
the plots are also saved in the directory `workDir`).

To document the iteration progress, some information is recorded for each output file (containing an MCMC run), depending on the maximisation method: a running number, the maximisation method, the number of clusters/groups, BIC, adjusted BIC, AIC, AWE, CLC, IclBic, DIC2, DIC4a, ICL and additionally the adjusted Rand index (which compares the starting with the final allocation).

For each entry in `whatToDo` a matrix `MSCritTable` is produced. Each row represents a processed output
file (containing an MCMC run) and the columns contain:

`H`

number of clusters/groups

`mMax`

number/position of the MCMC draw/iteration leading to the maximum value of the (log-)posterior density or (classification) log-likelihood (depending on `whatToDo`), which is calculated for each MCMC draw

`maxLPD`

the maximum value of the (log-)posterior density itself, only if `whatToDo` includes `"postMode"` (corresponding to the posterior mode)

`maxLL`

the maximum value of the log-likelihood itself, only if `whatToDo` includes `"approxML"` (corresponding to the 'approximate maximum likelihood')

`maxLCL`

the maximum value of the classification log-likelihood itself, only if `whatToDo` includes `"approxMCL"` (corresponding to the 'approximate maximum classification likelihood')

`BIC`

Bayesian Information Criterion (Schwarz Criterion)

`adjBIC`

adjusted BIC. Note: not available/implemented for DMC[Ext]!

`AIC`

Akaike Information Criterion

`AWE`

Approximate Weight of Evidence, see Banfield and Raftery (1993)

`CLC`

Classification Likelihood Criterion

`IclBic`

Integrated Classification Likelihood-BIC

`DIC2`

Deviance Information Criterion (DIC2), see Fruehwirth-Schnatter and Pyne (2010) and Fruehwirth-Schnatter et al. (2011). Note: not available/implemented for DMC!

`DIC4a`

Deviance Information Criterion (DIC4a), see Fruehwirth-Schnatter and Pyne (2010) and Fruehwirth-Schnatter et al. (2011). Note: not available/implemented for DMC!

`logICL`

log Integrated Classification Likelihood. Note: not available/implemented for DMC[Ext]!

`adjRand`

adjusted Rand index for the (estimated) group membership versus the starting values `Initial$S.i.start` (only if not `NULL`)
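A table with the columns listed above can be inspected with base R. The sketch below builds a small hypothetical `MSCritTable` with made-up values (one row per run, only a few of the columns) and reports, for each criterion, the number of clusters at which it is smallest. It assumes that smaller values are preferred, as is conventional for criteria on the -2 log-likelihood scale.

```r
# Hypothetical table: one row per processed output file, made-up values.
MSCritTable <- cbind(H   = 2:5,
                     BIC = c(1250.3, 1198.6, 1203.9, 1215.4),
                     AWE = c(1262.7, 1270.1, 1290.5, 1322.8))

critNames <- c("BIC", "AWE")

# For each criterion, the H of the row with the smallest value.
bestH <- apply(MSCritTable[, critNames, drop = FALSE], 2,
               function(x) MSCritTable[which.min(x), "H"])
print(bestH)
```

Different criteria may well point to different values of `H`, as in this toy table, which is one reason the decision about admissible criteria is left to the user.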

For each entry in `whatToDo` the corresponding `MSCritTable` is printed together with the current working
directory and the content of the current `whatToDo`. Further, plots of the model selection criteria are produced
and saved (with type `eps` and `pdf`).

If *MCCExt* is considered, the number of importance sampling draws `ISdraws` (necessary for logICL) is also
printed.

Additionally, after each iteration the workspace containing the model selection criteria and other objects is saved to
an .RData file via `save.image` within the directory `workDir`.

Finally, a list containing the names of the processed output files (each containing an MCMC run) is printed.

### Value

A list containing:

`postMode`

the corresponding

`approxML`

the corresponding

`approxMCL`

the corresponding

`ISdraws`

the number of importance sampling draws for approximating logICL (only for

`outFileNames`

a list (character vector) containing the names of the processed output files (each containing an MCMC run)

### Note

Note that the user has to decide which criteria are admissible.

Note that, in contrast to the literature (see **References**), the numbering (labelling) of the states of the
categorical outcome variable (time series) in this package is sometimes *0,...,K* (instead of
*1,...,K*); in that case there are *K+1* categories (states)!

### Author(s)

Christoph Pamminger <christoph.pamminger@gmail.com>

### References

Jeffrey D. Banfield and Adrian E. Raftery, (1993),
"Model-Based Gaussian and Non-Gaussian Clustering".
*Biometrics*, Vol. 49, No. 3, pp. 803-821.
http://www.jstor.org/stable/2532201

Sylvia Fruehwirth-Schnatter, Christoph Pamminger, Andrea Weber and Rudolf Winter-Ebmer, (2011),
"Labor market entry and earnings dynamics: Bayesian inference using
mixtures-of-experts Markov chain clustering".
*Journal of Applied Econometrics*. DOI: 10.1002/jae.1249
http://onlinelibrary.wiley.com/doi/10.1002/jae.1249/abstract

Sylvia Fruehwirth-Schnatter and Saumyadipta Pyne, (2010),
"Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions".
*Biostatistics*, Vol. 11, No. 2, pp. 317-336. DOI: 10.1093/biostatistics/kxp062
http://biostatistics.oxfordjournals.org/content/11/2/317.full.pdf+html

Christoph Pamminger and Sylvia Fruehwirth-Schnatter, (2010),
"Model-based Clustering of Categorical Time Series".
*Bayesian Analysis*, Vol. 5, No. 2, pp. 345-368. DOI: 10.1214/10-BA606
http://ba.stat.cmu.edu/journal/2010/vol05/issue02/pamminger.pdf

### See Also

`classAgreement`, `savePlot`, `mcClust`, `dmClust`, `mcClustExtended`, `dmClustExtended`

### Examples

```
# please run the examples in mcClust, dmClust, mcClustExtended,
# dmClustExtended
```