rimle searches for G approximately Gaussian-shaped clusters with/without noise/outliers. The method's tuning parameter controlling the noise level is fixed: it is either provided by the user or guessed by the function in a rather quick and dirty way (otrimle performs a more sophisticated data-driven choice).
data 
A numeric vector, matrix, or data frame of observations. Rows correspond
to observations and columns correspond to variables. Categorical
variables and 
G 
An integer specifying the number of clusters. 
initial 
An integer vector specifying the initial cluster
assignment with 
logicd 
A number 
npr.max 
A number in 
erc 
A number 
iter.max 
An integer value specifying the maximum number of iterations allowed in the ECM-algorithm (see Details). 
tol 
Stopping criterion for the underlying ECM-algorithm. An ECM iteration
stops if two successive improper log-likelihood values are within

The rimle function computes the RIMLE solution using the ECM-algorithm proposed in Coretto and Hennig (2017).
There may be datasets for which the function does not provide a solution based on default arguments. This corresponds to code=0 and flag=1 or flag=2 in the output (see the Value section below). This usually happens when some (or all) of the following circumstances occur: (i) log(icd) is too large; (ii) erc is too large; (iii) npr.max is too large; (iv) the choice of the initial partition. In these cases it is suggested to find a suitable interval of icd values by using the otrimle function. The Details section of otrimle suggests several actions to take whenever a code=0 non-solution occurs.
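The check described above can be sketched as follows. This is only an illustration: `is_nonsolution()` and `describe_fit()` are hypothetical helpers (not part of the package) that read the `code` and `flag` components of the returned object, and the fallback of reusing the `logicd` value selected by `otrimle()` is one possible reading of the advice above, not a prescribed recipe. The package call is guarded so the sketch also runs when otrimle is not installed.

```r
## Hypothetical helpers (not part of otrimle): detect and describe a
## code = 0 non-solution using only the `code` and `flag` components.
is_nonsolution <- function(fit) {
  isTRUE(fit$code == 0)
}

describe_fit <- function(fit) {
  if (is_nonsolution(fit)) {
    sprintf("no solution (code = 0, flag = %s)",
            paste(fit$flag, collapse = ", "))
  } else {
    sprintf("solution found (code = %s)", fit$code)
  }
}

## Usage with the package, run only if otrimle is available
if (requireNamespace("otrimle", quietly = TRUE)) {
  data("banknote", package = "otrimle")
  x <- banknote[, -1]            # drop the class label column
  set.seed(1)
  fit <- otrimle::rimle(data = x, G = 2)
  message(describe_fit(fit))

  if (is_nonsolution(fit)) {
    ## Assumed fallback: let otrimle() search over logicd values,
    ## then refit rimle() at the value it selected
    ofit <- otrimle::otrimle(data = x, G = 2)
    fit  <- otrimle::rimle(data = x, G = 2, logicd = ofit$logicd)
    message(describe_fit(fit))
  }
}
```

Checking `code` right after fitting keeps a non-solution from silently propagating into plots or downstream summaries.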
An earlier approximate version of the algorithm was originally proposed in Coretto and Hennig (2016). Software for the original version of the algorithm can be found in the supplementary materials of Coretto and Hennig (2016).
An S3 object of class 'rimle'. Output components are as follows:
code 
An integer indicator for the convergence.

flag 
A character string containing one or more flags related to
the EM iteration at the optimal icd.

iter 
Number of iterations performed in the underlying EM-algorithm. 
logicd 
Value of the 
iloglik 
Value of the improper likelihood. 
criterion 
Value of the OTRIMLE criterion. 
npr 
Estimated expected noise proportion. 
cpr 
Vector of estimated expected cluster proportions (notice that 
mean 
A matrix of dimension 
cov 
An array of size 
tau 
A matrix of dimension 
smd 
A matrix of dimension 
cluster 
A vector of integers denoting cluster assignments for each
observation. It's 
size 
A vector of integers with sizes (counts) of each cluster. 
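As a rough illustration of how these components might be inspected, the sketch below tabulates the hard assignments in `cluster` and reports them next to `npr`. `summarise_fit()` is a hypothetical helper, and the mock object stands in for a real fit; in particular, the use of label 0 for noise observations in the mock is an assumption made for the example.

```r
## Hypothetical helper (not part of otrimle): tabulate hard cluster
## assignments and report the estimated noise proportion.
summarise_fit <- function(fit) {
  counts <- table(fit$cluster)
  list(
    counts = counts,
    shares = round(as.numeric(counts) / length(fit$cluster), 3),
    npr    = fit$npr
  )
}

## Mock object with rimle-like components; the label 0 for noise is
## an assumption of this example
mock <- list(cluster = c(1, 1, 2, 2, 2, 0), npr = 0.1)
s <- summarise_fit(mock)
print(s)
```

The same helper can be applied to an object returned by rimle, since it only relies on the `cluster` and `npr` components listed above.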
Coretto, P. and C. Hennig (2016). Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering. Journal of the American Statistical Association, Vol. 111(516), pp. 1648-1659. doi: 10.1080/01621459.2015.1100996
Coretto, P. and C. Hennig (2017). Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering. arXiv preprint available at arXiv:1309.6895.
plot.rimle, InitClust, otrimle
## Load Swiss banknotes data
data(banknote)
x <- banknote[,-1]
## 
## EXAMPLE 1:
## Perform RIMLE with default inputs
## 
set.seed(1)
a <- rimle(data = x, G = 2)
print(a)
## Plot clustering
plot(a, data = x, what = "clustering")
## P-P plot of the cluster-wise empirical weighted squared Mahalanobis
## distances against the target distribution pchisq(, df=ncol(data))
plot(a, what = "fit")
plot(a, what = "fit", cluster = 1)
## 
## EXAMPLE 2:
## Compare solutions for different choices of logicd
## 
set.seed(1)
## Case 1: noiseless solution, that is, fit a pure Gaussian mixture model
b1 <- rimle(data = x, G = 2, logicd = -Inf)
plot(b1, data = x, what = "clustering")
plot(b1, what = "fit")
## Case 2: low noise level
b2 <- rimle(data = x, G = 2, logicd = -100)
plot(b2, data = x, what = "clustering")
plot(b2, what = "fit")
## Case 3: medium noise level
b3 <- rimle(data = x, G = 2, logicd = -10)
plot(b3, data = x, what = "clustering")
plot(b3, what = "fit")
## Case 4: large noise level
b4 <- rimle(data = x, G = 2, logicd = -5)
plot(b4, data = x, what = "clustering")
plot(b4, what = "fit")
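The comparison across noise levels can also be collected in a small table instead of a series of plots. The sketch below is a hedged illustration: `compare_logicd()` is a hypothetical helper, the fitting function is passed in as an argument so the helper can be tried with a stub, and the grid of `logicd` values assumes that `logicd` is the log of the improper constant density, so that `-Inf` means no noise component and values closer to zero allow more noise.

```r
## Hypothetical helper (not part of otrimle): fit one model per logicd
## value and collect the convergence code and noise proportion.
compare_logicd <- function(data, G, values, fit_fun) {
  rows <- lapply(values, function(v) {
    fit <- fit_fun(data = data, G = G, logicd = v)
    data.frame(
      logicd = v,
      code   = fit$code,
      ## a non-solution may not carry a usable npr; record NA then
      npr    = if (is.null(fit$npr)) NA_real_ else fit$npr
    )
  })
  do.call(rbind, rows)
}

## Usage with the package, run only if otrimle is available
if (requireNamespace("otrimle", quietly = TRUE)) {
  data("banknote", package = "otrimle")
  x <- banknote[, -1]            # drop the class label column
  set.seed(1)
  print(compare_logicd(x, G = 2,
                       values  = c(-Inf, -100, -10, -5),
                       fit_fun = otrimle::rimle))
}
```

Rows with `code` equal to 0 flag `logicd` values for which no solution was found, which is a quick way to see where the usable range of the tuning ends.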