Likelihood-Based Mixture Model Statistics

Description

See Details for a description of the individual functions.

Usage

## S3 method for class 'mixmod'
logLik(object, ...)

## S3 method for class 'mixmod'
qval(x, map=FALSE, ...)
## S3 method for class 'mixmod'
qfun(x, map=FALSE, ...)
## S3 method for class 'mdmixmod'
qfun(x, map=FALSE, ...)

## S3 method for class 'mixmod'
aic(x, ...)
## S3 method for class 'mixmod'
bic(x, ...)
## S3 method for class 'mixmod'
entropy(x, map=FALSE, ...)
## S3 method for class 'mixmod'
iclbic(x, map=FALSE, ...)
## S3 method for class 'mdmixmod'
siclbic(x, map=FALSE, ...)

Arguments

x, object

an object of class mixmod or mdmixmod.

map

logical; if TRUE, the maximum a posteriori (MAP) estimates rather than the posterior probabilities will be used when estimating expectations with respect to hidden data.

...

currently unused.

Details

logLik calculates L(theta|X), the log-likelihood of the estimated parameters theta with respect to the observed data X. qval calculates the “Q-value”, the expectation with respect to the hidden data of the complete-data log-likelihood: Q(theta) = E[L(theta|X,Y)] for mixmod and Q(theta) = E[L(theta|X,Y,Y0)] for mdmixmod. qfun returns the hidden and observed portions of the Q-value separately, as elements of a vector.
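The difference between the log-likelihood, the Q-value, and the effect of the map argument can be illustrated with a small base R sketch. This is not the package's implementation: the data, the parameter values, and all variable names (x, p, mu, s, logjoint, post, q_post, q_map) are hypothetical, and a two-component univariate Gaussian mixture is assumed purely for illustration.

```r
# Hypothetical toy data and fixed parameters for a 2-component Gaussian mixture
x  <- c(-2.1, -1.9, -0.3, 0.2, 1.8, 2.2)
p  <- c(0.5, 0.5)   # mixing proportions (assumed)
mu <- c(-1, 1)      # component means (assumed)
s  <- c(1, 1)       # component standard deviations (assumed)

# Complete-data log-density log f(x_n, y_n = k): rows are observations,
# columns are components
logjoint <- sapply(1:2, function(k) log(p[k]) + dnorm(x, mu[k], s[k], log = TRUE))

# Observed-data log-likelihood L(theta|X): sum over the hidden component
loglik <- sum(log(rowSums(exp(logjoint))))

# Posterior probabilities of component membership for each observation
post <- exp(logjoint) / rowSums(exp(logjoint))

# Q-value weighted by posterior probabilities (analogous to map = FALSE)
q_post <- sum(post * logjoint)

# Q-value with hard MAP assignments (analogous to map = TRUE):
# each observation puts all its weight on its most probable component
map_w <- diag(2)[max.col(post, ties.method = "first"), ]
q_map <- sum(map_w * logjoint)

c(loglik = loglik, q_post = q_post, q_map = q_map)
```

In this sketch q_map is never smaller than q_post and never larger than loglik, which is consistent with the ordering of the qval values shown in Examples.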

aic, bic, entropy, iclbic, and siclbic calculate information criteria for model selection with mixture models of class mixmod and mdmixmod. Respectively, these are Akaike's information criterion (AIC; Akaike, 1974), the Bayes information criterion (BIC; Schwarz, 1978), the classification entropy (Biernacki et al., 2000), the integrated complete likelihood BIC (ICL-BIC; Biernacki et al., 2000), and the simplified ICL-BIC (SICL-BIC), which is available only for objects of class mdmixmod. They are defined as follows:

AIC = 2 L(theta|X) - 2 |Theta|
BIC = 2 L(theta|X) - |Theta| log(N)
entropy = 2 L(theta|X) - 2 Q(theta)
ICL-BIC = 2 Q(theta) - |Theta| log(N)
SICL-BIC = 2 E[L(theta|X,Y0)] - |Theta| log(N) (mdmixmod only)

where |Theta| is the number of free parameters in the model and N is the number of observations. Under the sign convention used here, the model with the highest value of a given information criterion should generally be selected. Current testing indicates that ICL-BIC is preferred for mixmod and BIC for mdmixmod.
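As a minimal sketch of the definitions above, the following base R function evaluates the criteria from their ingredients. The function name criteria and its arguments are hypothetical and are not part of the package, and the input values are made up for illustration. SICL-BIC is omitted because it requires E[L(theta|X,Y0)], which is specific to mdmixmod.

```r
# Hypothetical helper: evaluate the criteria from their definitions.
#   loglik : observed-data log-likelihood L(theta|X)
#   qval   : Q-value Q(theta)
#   n_par  : number of free parameters |Theta|
#   n_obs  : number of observations N
criteria <- function(loglik, qval, n_par, n_obs) {
  c(AIC     = 2 * loglik - 2 * n_par,
    BIC     = 2 * loglik - n_par * log(n_obs),
    entropy = 2 * loglik - 2 * qval,
    ICL_BIC = 2 * qval - n_par * log(n_obs))
}

# Made-up inputs, for illustration only
ic <- criteria(loglik = -100, qval = -110, n_par = 5, n_obs = 50)
ic
```

With these definitions ICL-BIC equals BIC minus the entropy term, so ICL-BIC penalizes models whose component assignments are less certain.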

Value

A numeric vector for qfun, a numeric scalar for the other functions.

Note

Some authors define AIC, BIC, and ICL-BIC as the negative of the quantities given in Details.

Author(s)

Daniel Dvorkin

References

Akaike, H. (1974) A new look at the statistical model identification, IEEE Transactions on Automatic Control 19(6), 716–723.

Biernacki, C. and Celeux, G. and Govaert, G. (2000) Assessing a mixture model for clustering with the integrated completed likelihood, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(7), 719–725.

McLachlan, G.J. and Krishnan, T. (2008) The EM Algorithm and Extensions, John Wiley & Sons.

Schwarz, G. (1978) Estimating the dimension of a model, The Annals of Statistics 6(2), 461–464.

See Also

mixmod and mdmixmod for details of the hidden variable structure.

Examples

## Not run: 
data(CiData)
fit <- mixmod(CiData$expression, 3)
bic(fit)            # -95405.4
qval(fit)           # -50055.35
qval(fit, map=TRUE) # -49738.53

## End(Not run)