calcMSCrit: Calculates Model Selection Criteria For Several (Independent)...
In bayesMCClust: Mixtures-of-Experts Markov Chain Clustering and Dirichlet Multinomial Clustering


Calculates and plots a set of model selection criteria for all models estimated with one and the same clustering method (for the sake of comparability), for various numbers H of clusters/groups and for several independent MCMC runs whose output files are located in the specified directory. Which criteria are available depends on the underlying model, e.g. BIC, adjusted BIC, DIC (Deviance Information Criterion), AWE (Approximate Weight of Evidence), CLC (Classification Likelihood Criterion), ICL (Integrated Classification Likelihood) and ICL-BIC. Several maximisation methods are available for this purpose. For more information about the criteria, see Details, References and the references therein.

calcMSCritMCC(workDir, myLabel = "model choice for ...", H0 = 3, 
          whatToDoList = c("approxMCL", "approxML", "postMode"))
calcMSCritMCCExt(workDir, NN, myLabel = "model choice for ...", 
          ISdraws = 3, H0 = 3, 
          whatToDoList = c("approxMCL", "approxML", "postMode"))
calcMSCritDMC(workDir, myLabel = "model choice for ...", 
          myN0 = "N0 = ...", 
          whatToDoList = c("approxMCL", "approxML", "postMode"))
calcMSCritDMCExt(workDir, myLabel = "model choice for ...", 
          myN0 = "N0 = ...", 
          whatToDoList = c("approxMCL", "approxML", "postMode"))

`workDir`	A character giving the name (or full path) of the directory containing the output files of the estimated models produced by one and the same cluster method (for the sake of comparability) for which model selection criteria have to be calculated.
`NN`	Number of individuals N (used only for argument checks).
`myLabel`	Specifies (part of) labeling of the plots.
`myN0`	A character string documenting the value of `Prior$N0`, used only for labelling. For the sake of comparability, `Prior$N0` must be identical for all processed models!
`H0`	Number of clusters/groups expected by the user. Required for calculating the model-prior-adjusted BIC; see Details.
`ISdraws`	Number of draws for the importance sampling step to approximate the logICL.
`whatToDoList`	A character vector containing a subset of `c("approxMCL", "approxML", "postMode")`. Depending on the entries in this list (`whatToDoList`) the calculation of (all) the criteria is based on the MCMC draws (iteration) corresponding to the maximum of the log classification likelihood (`"approxMCL"`), log likelihood (`"approxML"`) and/or log posterior density (`"postMode"`).

For each maximisation method in whatToDoList, all available model selection criteria are calculated in turn. The method determines the MCMC draw (iteration) on which the criteria are based: the draw maximising the log classification likelihood ("approxMCL"), the log-likelihood ("approxML") or, for the sake of completeness, the log posterior density ("postMode").
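The selection of the draw can be illustrated with a minimal sketch. It assumes that, for one MCMC run, the log-likelihood, the classification log-likelihood and the log posterior density have already been evaluated at every draw; the vectors `logLik`, `logCL`, `logPost` and the helper `findMMax` are hypothetical and hold made-up values, not package objects.

```r
# Hypothetical per-draw traces of one MCMC run (made-up values, 4 draws)
logLik  <- c(-120.4, -118.9, -119.7, -121.2)  # log-likelihood
logCL   <- c(-131.0, -130.2, -129.8, -132.5)  # classification log-likelihood
logPost <- c(-140.1, -139.5, -140.3, -141.0)  # log posterior density

# Return the draw (mMax) that maximises the quantity belonging to `method`,
# together with the maximum value itself
findMMax <- function(method, logLik, logCL, logPost) {
  trace <- switch(method,
                  approxML  = logLik,
                  approxMCL = logCL,
                  postMode  = logPost,
                  stop("unknown maximisation method: ", method))
  m <- which.max(trace)
  c(mMax = m, maxValue = trace[m])
}

whatToDoList <- c("approxMCL", "approxML", "postMode")
# One column per maximisation method, rows "mMax" and "maxValue"
mMaxTable <- sapply(whatToDoList, findMMax,
                    logLik = logLik, logCL = logCL, logPost = logPost)
print(mMaxTable)
```

Different methods can select different draws (here draw 3 for "approxMCL" and draw 2 for the other two), which is why the criteria are recomputed for each entry of whatToDoList.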

Note that the user has to decide which criteria are admissible.

Which criteria need which maximisation method? The AWE and the logICL are based on the maximum of the (log) classification likelihood; all other criteria are based on the maximum of the (log) likelihood (see References).

Internally, the function computes the log-likelihood and related quantities, namely LK (observed log-likelihood), CLK (classification or complete log-likelihood), CK (classification-type log-likelihood), EK (entropy term) and d_h (number of parameters), which are the building blocks of the model selection criteria.
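How these quantities relate can be sketched for a generic finite mixture. The sketch below assumes mixture weights `eta` and an N x H matrix `f` of component densities (hypothetical names) and uses the "fuzzy" classification log-likelihood built from posterior classification probabilities, for which CLK = LK - EK holds exactly. This is an illustration of the standard decomposition, not the package's internal code; the package may, for instance, evaluate CLK at sampled allocations instead.

```r
# Sketch: LK, EK and a fuzzy CLK for a finite mixture (hypothetical inputs)
#   eta : vector of H mixture weights
#   f   : N x H matrix of component densities f_h(y_i), all > 0
likelihoodTerms <- function(eta, f) {
  joint <- sweep(f, 2, eta, `*`)   # eta_h * f_h(y_i)
  p     <- rowSums(joint)          # mixture density of observation i
  tau   <- joint / p               # posterior classification probabilities
  LK    <- sum(log(p))             # observed log-likelihood
  EK    <- -sum(ifelse(tau > 0, tau * log(tau), 0))  # entropy, 0*log(0) := 0
  CLK   <- sum(tau * log(joint))   # expected complete-data log-likelihood
  list(LK = LK, EK = EK, CLK = CLK, tau = tau)
}

lt <- likelihoodTerms(eta = c(0.5, 0.5), f = matrix(c(1, 2, 2, 1), 2, 2))
str(lt)
```

The entropy term EK is non-negative and vanishes only when every observation is assigned to a single cluster with certainty, which is why classification-based criteria penalise overlapping clusters.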

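The model-prior-adjusted BIC is calculated as

adjBIC = BIC - 2*H*log(H0) + 2*logΓ(H + 1) + 2*H0,

where H is the number of clusters/groups of the model and H0 is the number of clusters/groups expected by the user.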
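The adjustment can be written as a one-line R function (the name `adjustBIC` is hypothetical). One way to read the formula, which is our interpretation rather than a statement from this page, is BIC - 2 log p(H) with a Poisson prior with mean H0 on the number of clusters H; the sketch checks this equivalence with `dpois`. It assumes BIC is on the "smaller is better" scale, so unlikely values of H are penalised.

```r
# Model-prior-adjusted BIC as given in the formula above
# (lgamma(H + 1) = log Gamma(H + 1) = log(H!))
adjustBIC <- function(BIC, H, H0) {
  BIC - 2 * H * log(H0) + 2 * lgamma(H + 1) + 2 * H0
}

# Equivalent form, assuming a Poisson(H0) prior on H:
# log p(H) = H*log(H0) - H0 - log(H!)
adjustBICPois <- function(BIC, H, H0) {
  BIC - 2 * dpois(H, lambda = H0, log = TRUE)
}

# Vectorised over H; the BIC values are made up
adjustBIC(BIC = c(1050, 1010, 1030), H = 2:4, H0 = 3)
```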

Depending on the model type, the following criteria are calculated: BIC, adjusted BIC, AIC, AWE, IclBic, CLC, DIC2, DIC4a and logICL (see References). Furthermore, plots and tables of selected criteria are generated; the plots are also saved in the directory workDir.
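For orientation, the likelihood-based criteria can be sketched from LK, EK, the number of parameters d and the number of individuals N using the forms commonly found in the model-based clustering literature (AWE as in Banfield and Raftery, 1993). This is an assumption-laden sketch on a "smaller is better" scale; the package's exact definitions and sign conventions may differ, and DIC2, DIC4a and logICL are not covered because they require the MCMC draws or importance sampling. The function name `msCriteria` is hypothetical.

```r
# Sketch of commonly used criteria (smaller is better), assuming
#   LK : observed log-likelihood at the selected draw
#   EK : entropy term (>= 0)
#   d  : number of free parameters
#   N  : number of individuals
msCriteria <- function(LK, EK, d, N) {
  CLK <- LK - EK  # classification log-likelihood (fuzzy version)
  c(BIC    = -2 * LK + d * log(N),
    AIC    = -2 * LK + 2 * d,
    CLC    = -2 * LK + 2 * EK,
    IclBic = -2 * LK + 2 * EK + d * log(N),
    AWE    = -2 * CLK + 2 * d * (3 / 2 + log(N)))
}

crit <- msCriteria(LK = -100, EK = 5, d = 4, N = 50)  # made-up inputs
print(crit)
```

Written this way, IclBic is simply BIC plus twice the entropy term, which makes explicit how it favours well-separated clusters.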

To document the progress, the following information is recorded for each output file (containing an MCMC run), depending on the maximisation method: a running number, the maximisation method, the number of clusters/groups, BIC, adjusted BIC, AIC, AWE, CLC, IclBic, DIC2, DIC4a and ICL, and additionally the adjusted Rand index (which compares the starting allocation with the final allocation).
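The adjusted Rand index equals 1 when two allocations define the same partition (up to relabelling of the groups) and is close to 0 for chance agreement. The page lists `classAgreement` (package e1071) under See Also; its `crand` component is a chance-corrected Rand index. The base-R function below is a stand-alone sketch of the Hubert and Arabie adjusted Rand index; the name `adjRandIndex` and the example allocations are hypothetical.

```r
# Adjusted Rand index between two allocations x and y (Hubert and Arabie form)
adjRandIndex <- function(x, y) {
  tab  <- table(x, y)                   # contingency table of the two partitions
  n    <- sum(tab)
  sIJ  <- sum(choose(tab, 2))           # pairs grouped together in both
  sA   <- sum(choose(rowSums(tab), 2))  # pairs grouped together in x
  sB   <- sum(choose(colSums(tab), 2))  # pairs grouped together in y
  expected <- sA * sB / choose(n, 2)    # expected sIJ under random labelling
  maxIndex <- (sA + sB) / 2
  # Note: undefined (NaN) if both allocations put everyone in one group
  (sIJ - expected) / (maxIndex - expected)
}

startAlloc <- c(1, 1, 1, 2, 2, 2)  # hypothetical starting allocation
finalAlloc <- c(1, 1, 2, 2, 3, 3)  # hypothetical final allocation
adjRandIndex(startAlloc, finalAlloc)
```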

For each entry of whatToDoList (referred to as whatToDo below), a matrix MSCritTable is produced. Each row represents a processed output file (containing an MCMC run), and the columns contain:

H: number of clusters/groups
mMax: number/position of the MCMC draw (iteration) at which the log posterior density, the log-likelihood or the classification log-likelihood (depending on whatToDo), evaluated for each MCMC draw, attains its maximum
maxLPD: the maximum value of the log posterior density itself, corresponding to the posterior mode (only if whatToDo is "postMode")
maxLL: the maximum value of the log-likelihood itself, corresponding to the 'approximate maximum likelihood' (only if whatToDo is "approxML")
maxLCL: the maximum value of the classification log-likelihood itself, corresponding to the 'approximate maximum classification likelihood' (only if whatToDo is "approxMCL")
BIC: Bayesian Information Criterion (Schwarz Criterion)
adjBIC: adjusted BIC – Note: not available/implemented for DMC[Ext]!
AIC: Akaike Information Criterion
AWE: Approximate Weight of Evidence, see Banfield and Raftery (1993)
CLC: Classification Likelihood Criterion
IclBic: Integrated Classification Likelihood-BIC
DIC2: Deviance Information Criterion (DIC2), see Fruehwirth-Schnatter and Pyne (2010) and Fruehwirth-Schnatter et al. (2011) – Note: not available/implemented for DMC!
DIC4a: Deviance Information Criterion (DIC4a), see Fruehwirth-Schnatter and Pyne (2010) and Fruehwirth-Schnatter et al. (2011) – Note: not available/implemented for DMC!
logICL: log Integrated Classification Likelihood – Note: not available/implemented for DMC[Ext]!
adjRand: adjusted Rand index comparing the estimated group memberships with the starting values Initial$S.i.start (only if these are not NULL)
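Because workDir may contain several independent MCMC runs for the same H, one natural way to read an MSCritTable, assumed here rather than prescribed by the package, is to keep the best run for each H and then compare across H. The sketch builds a small matrix with some of the columns listed above (all numbers are made up) and assumes that smaller BIC values are better.

```r
# Made-up stand-in for an MSCritTable with whatToDo = "approxML":
# two runs each for H = 2 and H = 3, one run for H = 4
MSCritTable <- cbind(
  H     = c(2, 2, 3, 3, 4),
  mMax  = c(812, 455, 990, 120, 731),
  maxLL = c(-5210.3, -5208.7, -5105.2, -5107.9, -5098.4),
  BIC   = c(10560.1, 10556.9, 10421.5, 10426.9, 10480.3)
)

# Best (smallest) BIC among the runs for each H, named by H
bestPerH <- sapply(split(MSCritTable[, "BIC"], MSCritTable[, "H"]), min)

# Number of clusters with the overall smallest BIC
Hbest <- as.numeric(names(which.min(bestPerH)))
print(bestPerH)
print(Hbest)
```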

For each entry whatToDo, the corresponding MSCritTable is printed together with the current working directory and the current value of whatToDo. Furthermore, plots of the model selection criteria are produced and saved in EPS and PDF format.

For calcMSCritMCCExt, the number of importance sampling draws ISdraws (needed for logICL) is printed as well.

Additionally, after each iteration the workspace, containing the model selection criteria and other objects, is saved to an .RData file in the directory workDir via save.image.
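A saved workspace can be inspected later with base R's `load`. Loading into a separate environment avoids overwriting objects of the same name in the current session. In the sketch, the file is created with `save` as a stand-in, and the file name `msCritWorkspace.RData` is hypothetical; the actual name used by the package is not documented here, so listing the `.RData` files in workDir is the safer first step.

```r
# Stand-in for a workspace file in workDir (hypothetical file name)
workDir <- tempdir()
wsFile  <- file.path(workDir, "msCritWorkspace.RData")
MSCritTable <- cbind(H = 2:3, BIC = c(1050, 1010))  # made-up values
save(MSCritTable, file = wsFile)

# Find candidate workspace files in workDir
list.files(workDir, pattern = "\\.RData$")

# Load into a separate environment; load() returns the names of the objects
wsEnv  <- new.env()
loaded <- load(wsFile, envir = wsEnv)
print(loaded)
print(wsEnv$MSCritTable)
```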

Finally, a list containing the names of the processed output files (each containing an MCMC run) is printed.

A list containing:

`postMode`	the corresponding `MSCritTable` (see Details), only if `whatToDoList` includes `"postMode"`
`approxML`	the corresponding `MSCritTable` (see Details), only if `whatToDoList` includes `"approxML"`
`approxMCL`	the corresponding `MSCritTable` (see Details), only if `whatToDoList` includes `"approxMCL"`
`ISdraws`	the number of importance sampling draws for approximating logICL (only for `calcMSCritMCCExt`)
`outFileNames`	a list (character vector) containing the names of the processed output files (each containing an MCMC run)
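Since AWE and logICL are based on the maximum of the classification likelihood and the other criteria on the maximum of the likelihood (see Details), each criterion should be read from the matching table of the returned list. The sketch uses a made-up stand-in for that list and a hypothetical helper `pickH`; it assumes that the criteria are "smaller is better", and it therefore omits logICL (a log integrated likelihood, presumably maximised) as well as DIC2 and DIC4a for brevity.

```r
# Maximisation method (list component) on which each criterion is based
criterionSource <- c(AWE = "approxMCL",
                     BIC = "approxML", adjBIC = "approxML", AIC = "approxML",
                     CLC = "approxML", IclBic = "approxML")

# Number of clusters H minimising `criterion` in the matching table
pickH <- function(res, criterion) {
  method <- criterionSource[[criterion]]
  tab <- res[[method]]
  if (is.null(tab)) stop("no '", method, "' table: add it to whatToDoList")
  tab[which.min(tab[, criterion]), "H"]
}

# Made-up stand-in for the returned list
res <- list(
  approxML     = cbind(H = 2:4, BIC = c(1050, 1010, 1030)),
  approxMCL    = cbind(H = 2:4, AWE = c(1100, 1120, 1090)),
  outFileNames = c("fileA", "fileB", "fileC")
)
sapply(c("BIC", "AWE"), pickH, res = res)
```

Different criteria may point to different numbers of clusters, as in this made-up example; this is one reason why the user has to decide which criteria are admissible.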


Note that, in contrast to the literature (see References), this package sometimes numbers (labels) the states of the categorical outcome variable (time series) 0, ..., K instead of 1, ..., K; in that case there are K+1 categories (states)!
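When such 0-based states are passed to base R functions that expect positive integers, the 0 state can be dropped silently. For example, `tabulate` ignores values below 1. The sketch uses a hypothetical series with K = 3 (hence K+1 = 4 states) and shifts it to 1, ..., K+1 before counting.

```r
# Hypothetical categorical time series with states labelled 0, ..., K (K = 3)
K  <- 3
y0 <- c(0, 2, 3, 3, 1, 0)

# tabulate() silently ignores the zeros: only 4 of the 6 observations counted
countsWrong <- tabulate(y0, nbins = K + 1)

# Shift to 1, ..., K + 1 so that each of the K + 1 states gets its own bin
y1     <- y0 + 1
counts <- tabulate(y1, nbins = K + 1)
print(counts)
```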

Christoph Pamminger <christoph.pamminger@gmail.com>

Jeffrey D. Banfield and Adrian E. Raftery, (1993), "Model-Based Gaussian and Non-Gaussian Clustering". Biometrics, Vol. 49, No. 3, pp. 803-821. http://www.jstor.org/stable/2532201

Sylvia Fruehwirth-Schnatter, Christoph Pamminger, Andrea Weber and Rudolf Winter-Ebmer, (2011), "Labor market entry and earnings dynamics: Bayesian inference using mixtures-of-experts Markov chain clustering". Journal of Applied Econometrics. DOI: 10.1002/jae.1249 http://onlinelibrary.wiley.com/doi/10.1002/jae.1249/abstract

Sylvia Fruehwirth-Schnatter and Saumyadipta Pyne, (2010), "Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions". Biostatistics, Vol. 11, No. 2, pp. 317-336. DOI: 10.1093/biostatistics/kxp062 http://biostatistics.oxfordjournals.org/content/11/2/317.full.pdf+html

Christoph Pamminger and Sylvia Fruehwirth-Schnatter, (2010), "Model-based Clustering of Categorical Time Series". Bayesian Analysis, Vol. 5, No. 2, pp. 345-368. DOI: 10.1214/10-BA606 http://ba.stat.cmu.edu/journal/2010/vol05/issue02/pamminger.pdf

classAgreement, savePlot, mcClust, dmClust, mcClustExtended, dmClustExtended

# please run the examples in mcClust, dmClust, mcClustExtended and dmClustExtended
