rimle searches for G approximately Gaussian-shaped clusters
with/without noise/outliers. The method's tuning parameter controlling
the noise level is fixed and must either be provided by the user or be
guessed by the function in a rather quick and dirty way (otrimle
performs a more sophisticated data-driven choice).
data 
A numeric vector, matrix, or data frame of observations. Rows correspond
to observations and columns correspond to variables. Categorical
variables and 
G 
An integer specifying the number of clusters. 
initial 
An integer vector specifying the initial cluster
assignment with 
logicd 
A number 
npr.max 
A number in 
erc 
A number 
iter.max 
An integer value specifying the maximum number of iterations allowed in the ECM-algorithm (see Details). 
tol 
Stopping criterion for the underlying ECM-algorithm. An ECM iteration
stops if two successive improper log-likelihood values are within

The rimle function computes the RIMLE solution using the
ECM-algorithm proposed in Coretto and Hennig (2017).
There may be datasets for which the function does not provide a
solution based on default arguments. This corresponds to code=0 and
flag=1 or flag=2 in the output (see the Value section below). This
usually happens when some (or all) of the following circumstances
occur: (i) log(icd) is too large; (ii) erc is too large; (iii) npr.max
is too large; (iv) an unsuitable choice of the initial partition. In
these cases it is suggested to find a suitable interval of icd values
by using the otrimle function. The Details section of otrimle suggests
several actions to take whenever a code=0 non-solution occurs.
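The check for a code=0 non-solution described above could be scripted along the following lines. This is only a sketch: the helper names `is_nonsolution` and `first_solution`, the mock fitting function, and the grid of `logicd` values are illustrative assumptions and are not part of the package. Only the `code` output component and the `logicd` argument are taken from this page.

```r
# Hypothetical helper: TRUE when a fit reports the code = 0 non-solution
is_nonsolution <- function(fit) {
  isTRUE(fit$code == 0)
}

# Hypothetical helper: try each logicd value in turn and return the first
# fit that is not a non-solution, or NULL if every attempt fails.
# fit_fun is any function mapping a logicd value to a fitted object.
first_solution <- function(fit_fun, logicd_grid) {
  for (l in logicd_grid) {
    fit <- fit_fun(l)
    if (!is_nonsolution(fit)) return(fit)
  }
  NULL
}

# Mock objects standing in for rimle() output (illustration only);
# the nonzero code used here is arbitrary
ok_fit  <- list(code = 2)
bad_fit <- list(code = 0)

# Mock fitting function: pretends that logicd values above -10 fail
mock_fit <- function(l) list(code = if (l > -10) 0 else 2, logicd = l)

res <- first_solution(mock_fit, c(-5, -10, -50))
print(res$logicd)
```

With the package loaded, the mock could be replaced by a real call such as `first_solution(function(l) rimle(data = x, G = 2, logicd = l), c(-5, -10, -50))`. A simple grid like this is a convenience only; the otrimle function remains the suggested way to find a suitable interval of icd values.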
The pi object returned by the rimle function (see Value) corresponds
to the vector of pi parameters in the underlying pseudo-model (1)
defined in Coretto and Hennig (2017). With logicd = -Inf the rimle
function approximates the MLE for the plain Gaussian mixture model
with eigenratio covariance regularization; in this case the first
element of the pi vector is set to zero because the noise component is
not considered. In general, in the context of iid sampling from finite
mixture models, these pi parameters define the expected clusters'
proportions. Because of the noise proportion constraint in the RIMLE,
there are situations where this connection may not hold in practice.
This is likely to happen when both logicd and npr.max are large.
Therefore, estimated expected clusters' proportions are reported in
the exproportion object of the rimle output, and these are computed
based on the improper posterior probabilities given in tau.
See Coretto and Hennig (2017) for more discussion on this.
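To make the relation between tau and the expected proportions concrete, here is a toy sketch in base R. It assumes that the expected proportions are the column averages of tau and that the first column of tau refers to the noise component. Both are assumptions made for illustration, since this page only states that exproportion is "computed based on" tau. The matrix below is made-up data, not rimle output.

```r
# Toy matrix of improper posterior probabilities (made-up values):
# 4 observations in rows; columns are assumed to be noise, cluster 1, cluster 2
tau <- rbind(
  c(0.90, 0.05, 0.05),
  c(0.10, 0.80, 0.10),
  c(0.00, 0.30, 0.70),
  c(0.00, 0.10, 0.90)
)

# Assumed formula: average each column over the observations
exprop <- colMeans(tau)
print(exprop)

# Each row of tau sums to one, so the averaged proportions sum to one too
print(sum(exprop))
```

On a fitted object, the analogous comparison would be between the `exproportion` and `pi` components; the two may differ noticeably when both logicd and npr.max are large.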
An earlier approximate version of the algorithm was originally proposed in Coretto and Hennig (2016). Software for the original version of the algorithm can be found in the supplementary materials of Coretto and Hennig (2016).
An S3 object of class 'rimle'. Output components are as follows:
code 
An integer indicator for the convergence.

flag 
A character string containing one or more flags related to
the EM iteration at the optimal icd.

iter 
Number of iterations performed in the underlying EM-algorithm. 
logicd 
Value of the 
iloglik 
Value of the improper likelihood. 
criterion 
Value of the OTRIMLE criterion. 
pi 
Estimated vector of the 
mean 
A matrix of dimension 
cov 
An array of size 
tau 
A matrix of dimension 
smd 
A matrix of dimension 
cluster 
A vector of integers denoting cluster assignments for each
observation. It's 
size 
A vector of integers with sizes (counts) of each cluster. 
exproportion 
A vector of estimated expected clusters' proportions (see Details). 
Pietro Coretto pcoretto@unisa.it https://pietrocoretto.github.io
Coretto, P. and C. Hennig (2016). Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering. Journal of the American Statistical Association, Vol. 111(516), pp. 1648-1659. doi: 10.1080/01621459.2015.1100996
Coretto, P. and C. Hennig (2017). Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering. Journal of Machine Learning Research, Vol. 18(142), pp. 1-39. https://jmlr.org/papers/v18/16-382.html
plot.rimle, InitClust, otrimle
## Load the package and the Swiss banknotes data
library(otrimle)
data(banknote)
## Drop the first column and keep the remaining variables
x <- banknote[, -1]
## 
## EXAMPLE 1:
## Perform RIMLE with default inputs
## 
set.seed(1)
a <- rimle(data = x, G = 2)
print(a)
## Plot clustering
plot(a, data = x, what = "clustering")
## PP plot of the clusterwise empirical weighted squared Mahalanobis
## distances against the target distribution pchisq(, df=ncol(data))
plot(a, what = "fit")
plot(a, what = "fit", cluster = 1)
## 
## EXAMPLE 2:
## Compare solutions for different choices of logicd
## 
set.seed(1)
## Case 1: noiseless solution, that is fit a pure Gaussian Mixture Model
b1 <- rimle(data = x, G = 2, logicd = -Inf)
plot(b1, data=x, what="clustering")
plot(b1, what="fit")
## Case 2: low noise level
b2 <- rimle(data = x, G = 2, logicd = -100)
plot(b2, data=x, what="clustering")
plot(b2, what="fit")
## Case 3: medium noise level
b3 <- rimle(data = x, G = 2, logicd = -10)
plot(b3, data=x, what="clustering")
plot(b3, what="fit")
## Case 4: large noise level
b4 <- rimle(data = x, G = 2, logicd = -5)
plot(b4, data=x, what="clustering")
plot(b4, what="fit")
