Carries out model-based clustering or model-based classification using some or all of the 14 parsimonious mixtures of contaminated Gaussian distributions, fitted with the ECM algorithm. Likelihood-based model-selection criteria are used to select the best model and the number of mixture components.
X |
A matrix or data frame such that rows correspond to observations and columns correspond to variables. Note that this function currently only works with multivariate data (p > 1). |
k |
a vector containing the numbers of groups to be tried. |
model |
vector indicating the models (i.e., the covariance structures: "EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE", "VEE", "EVE", "EEV", "VVE", "VEV", "EVV", "VVV") to be used.
If |
initialization |
initialization strategy for the ECM-algorithm. It can be:
Default value is |
alphacon |
if |
alphamin |
when |
alphafix |
when |
alpha |
vector of proportions of good observations in each group to be considered when |
etacon |
if |
etafix |
if |
eta |
vector of contamination parameters to be considered when |
etamax |
maximum value for the contamination parameters to be considered in the estimation phase when |
start.z |
matrix of soft or hard classification; it is used only if |
start.v |
3D array of soft or hard classification to the good and bad groups in each mixture component.
It is used as initialization when |
start |
when |
ind.label |
vector of positions (rows) of the labeled observations. |
label |
vector, of the same dimension as |
iter.max |
maximum number of iterations in the ECM-algorithm. Default value is 1000. |
threshold |
threshold for Aitken's acceleration procedure. Default value is 1.0e-03. |
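The page does not spell out how the threshold enters the stopping rule. As a rough illustration only, one common form of an Aitken-accelerated stopping rule is sketched below in base R. The function name `aitken_converged` and its arguments are hypothetical and are not part of the package; the package's own rule may differ in detail (for example, in which log-likelihood value the asymptotic estimate is compared with).

```r
# Hypothetical helper (not part of the package): Aitken-accelerated
# stopping rule applied to a sequence of log-likelihood values.
# `loglik` holds the log-likelihoods of successive iterations;
# `threshold` is assumed to play the role of the `threshold` argument.
aitken_converged <- function(loglik, threshold = 1.0e-03) {
  m <- length(loglik)
  if (m < 3) return(FALSE)               # three values are needed
  d_new <- loglik[m] - loglik[m - 1]     # latest increment
  d_old <- loglik[m - 1] - loglik[m - 2] # previous increment
  if (d_new == 0) return(TRUE)           # log-likelihood has stopped changing
  a <- d_new / d_old                     # Aitken acceleration factor
  if (!is.finite(a) || a >= 1) return(FALSE)
  # asymptotic estimate of the log-likelihood at convergence
  l_inf <- loglik[m - 1] + d_new / (1 - a)
  abs(l_inf - loglik[m]) < threshold
}

# A sequence approaching -100 geometrically
ll <- -100 - 10 * 0.5^(1:20)
aitken_converged(ll[1:3])  # early iterations: still far from the limit
aitken_converged(ll)       # later iterations: increments are negligible
```

The rule stops when the estimated limit of the log-likelihood is within `threshold` of the current value, which is usually less prone to stopping too early than a test on the raw increment.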
The multivariate data contained in X are either clustered or classified using parsimonious mixtures of contaminated Gaussian densities with some or all of the 14 parsimonious covariance structures described in Punzo & McNicholas (2013).
The algorithms given by Browne & McNicholas (2013) are considered (see also Celeux & Govaert, 1995, for all the models apart from "EVE" and "VVE").
Starting values are very important to the successful operation of these algorithms and so care must be taken in the interpretation of results.
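For orientation, the mixture of contaminated Gaussian densities can be written as below, following the formulation in Punzo & McNicholas (2013). The symbols \pi_g, \mu_g, \Sigma_g and \phi are notation introduced here for illustration; \alpha_g and \eta_g correspond to the proportions of good observations and the contamination parameters described under alpha and eta.

```latex
p(x) = \sum_{g=1}^{k} \pi_g \, f(x; \mu_g, \Sigma_g, \alpha_g, \eta_g),
\qquad
f(x; \mu_g, \Sigma_g, \alpha_g, \eta_g)
  = \alpha_g \, \phi(x; \mu_g, \Sigma_g)
  + (1 - \alpha_g) \, \phi(x; \mu_g, \eta_g \Sigma_g),
```

Here \phi(x; \mu, \Sigma) denotes the multivariate Gaussian density, \alpha_g \in (0, 1) is the proportion of good observations in group g, and \eta_g > 1 inflates the covariance matrix of the bad observations. The 14 parsimonious structures arise from constraints on the \Sigma_g across groups.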
An object of class pmcgd is a list with components:
call |
an object of class |
best |
a data frame with the best number of mixture components (first column) and the best model (second column) with respect to the three model selection criteria adopted (AIC, BIC, and ICL) |
bestAIC,bestBIC,bestICL |
for the best AIC, BIC, and ICL models, these are three lists (of the same type) with components:
|
Punzo A. and McNicholas P. D.
Punzo, A., and McNicholas, P. D. (2013). Outlier Detection via Parsimonious Mixtures of Contaminated Gaussian Distributions. arXiv.org e-print 1305.4669, available at: http://arxiv.org/abs/1305.4669.
Browne, R. P. and McNicholas, P. D. (2013). mixture: Mixture Models for Clustering and Classification. R package version 1.0.
Celeux, G. and Govaert, G. (1995). Gaussian Parsimonious Clustering Models. Pattern Recognition. 28(5), 781-793.
# Artificial data from an EEI model with k=2 components
library(mnormt)
p <- 2
k <- 2
eta <- c(8,8) # contamination parameters
set.seed(12345)
X1good <- rmnorm(n = 300, mean = rep(3,p), varcov = diag(c(5,0.5)))
X2good <- rmnorm(n = 300, mean = rep(-3,p), varcov = diag(c(5,0.5)))
X1bad <- rmnorm(n = 30, mean = rep(3,p), varcov = eta[1]*diag(c(5,0.5)))
X2bad <- rmnorm(n = 30, mean = rep(-3,p), varcov = eta[2]*diag(c(5,0.5)))
X <- rbind(X1good,X1bad,X2good,X2bad)
plot(X, pch = 16, cex = 0.8)
# model-based clustering with two of the 14 parsimonious models
# ("EEI" and "VVV") and number of groups ranging from 1 to 2
overallfit <- MS(X, k = 1:2, model = c("EEI","VVV"), initialization = "mclust")
# to see the best BIC results
bestBIC <- overallfit$bestBIC
# plot of the best BIC model
# asp is an argument of plot(), not of text(), so it is set here
plot(X, xlab = expression(X[1]), ylab = expression(X[2]), col = "white", asp = 1)
text(X, labels = bestBIC$detection$innergroup, col = bestBIC$group, cex = 0.7)
box(col = "black")
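To make the good/bad labelling in the plot more concrete, the following base R sketch computes, for one component with known parameters, the posterior probability that a point is a good observation. The helpers `dmvnorm_diag` and `prob_good` are hypothetical and not part of the package. The rule "bad if the probability is below 0.5" and the value of alpha (taken from the simulated sample sizes, 300 good out of 330) are assumptions made for illustration.

```r
# Hypothetical helper: p-variate Gaussian density with diagonal covariance.
# `x` is a matrix with one observation per row.
dmvnorm_diag <- function(x, mean, vars) {
  z2 <- sweep(x, 2, mean)^2                  # squared deviations
  q  <- rowSums(sweep(z2, 2, vars, "/"))     # Mahalanobis distances
  exp(-0.5 * (q + sum(log(2 * pi * vars))))
}

# Hypothetical helper: posterior probability of being a good observation
# in a contaminated Gaussian component with parameters alpha and eta.
prob_good <- function(x, mean, vars, alpha, eta) {
  good <- alpha * dmvnorm_diag(x, mean, vars)
  bad  <- (1 - alpha) * dmvnorm_diag(x, mean, eta * vars)
  good / (good + bad)
}

# Parameters of the first simulated component in the example above
mu    <- c(3, 3)
vars  <- c(5, 0.5)
alpha <- 300 / 330   # assumed: share of good points in the simulation
eta   <- 8

pts <- rbind(center = c(3, 3), far = c(3, 8))
v <- prob_good(pts, mu, vars, alpha, eta)
ifelse(v > 0.5, "good", "bad")   # assumed decision rule
```

A point at the component mean is labelled good, while a point far out along the low-variance axis is better explained by the inflated covariance and is labelled bad.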