rimle: Robust Improper Maximum Likelihood Clustering
In otrimle: Robust Model-Based Clustering


rimle searches for G approximately Gaussian-shaped clusters, with or without noise/outliers. The tuning constant that controls the noise level is fixed: it is either provided by the user or guessed by the function in a rather quick and dirty way (otrimle performs a more sophisticated data-driven choice).

rimle(data, G, initial=NULL, logicd=NULL, npr.max=0.5, erc=20, iter.max=500, tol=1e-6)

`data`	A numeric vector, matrix, or data frame of observations. Rows correspond to observations and columns correspond to variables. Categorical variables and `NA` values are not allowed.
`G`	An integer specifying the number of clusters.
`initial`	An integer vector specifying the initial cluster assignment, with `0` denoting noise/outliers. If `NULL` (default), initialization is performed using `InitClust`.
`logicd`	A number `log(icd)`, where `0 <= icd < Inf` is the value of the improper constant density (`icd`). This is the RIMLE tuning parameter controlling the size of the noise component. If `logicd=NULL` (default), an `icd` value is guessed from the data. A pure Gaussian mixture model fit is obtained with `logicd = -Inf`.
`npr.max`	A number in `[0,1)` specifying the maximum proportion of noise/outliers. This defines the noise proportion constraint. If `npr.max=0`, a solution without a noise component is computed (corresponding to `logicd = -Inf`).
`erc`	A number `>=1` specifying the maximum allowed ratio between within-cluster covariance matrix eigenvalues. This defines the eigenratio constraint. `erc=1` enforces spherical clusters with equal covariance matrices. A large `erc` allows for large between-cluster covariance discrepancies. To make `erc` easier to set, it is suggested to scale the columns of `data` (see `scale`) whenever the measurement units of the different variables are grossly incompatible.
`iter.max`	An integer value specifying the maximum number of iterations allowed in the ECM-algorithm (see Details).
`tol`	Stopping criterion for the underlying ECM-algorithm. The ECM iterations stop when two successive improper log-likelihood values are within `tol` of each other.
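
The eigenratio constraint set by `erc` bounds the ratio between the largest and the smallest eigenvalue taken over all G within-cluster covariance matrices, which is why `erc=1` forces spherical clusters with a common covariance. The base-R sketch below computes that ratio for an array shaped like the `cov` output (`ncol(data) x ncol(data) x G`). `eigenratio()` is a hypothetical helper, not part of otrimle, and the pooled max/min definition is an assumption based on Coretto and Hennig (2017).

```r
# Hypothetical helper (not part of otrimle): ratio between the largest and the
# smallest eigenvalue over all covariance matrices in a p x p x G array.
eigenratio <- function(cov) {
  p <- dim(cov)[1]
  G <- dim(cov)[3]
  ev <- unlist(lapply(seq_len(G), function(j)
    eigen(matrix(cov[, , j], p, p), symmetric = TRUE, only.values = TRUE)$values))
  max(ev) / min(ev)
}

# Two spherical clusters with equal covariance: ratio 1, allowed even by erc = 1
S_equal <- array(c(diag(2), diag(2)), dim = c(2, 2, 2))
eigenratio(S_equal)

# Second cluster 30 times more spread along one axis: ratio 30 exceeds erc = 20
S_unequal <- array(c(diag(2), diag(c(30, 1))), dim = c(2, 2, 2))
eigenratio(S_unequal)
```

Because the ratio mixes eigenvalues from all variables, standardizing the columns first (for example `x <- scale(x)`) keeps variables with large measurement units from dominating it.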

The rimle function computes the RIMLE solution using the ECM-algorithm proposed in Coretto and Hennig (2017).
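
The interplay of `iter.max`, `tol`, and the `code` output can be pictured with a generic iterate-until-stable loop. The sketch assumes an absolute-difference stopping rule on the objective; `run_until_converged()` and its `step` and `objective` arguments are hypothetical stand-ins for the ECM update and the improper log-likelihood, not the package internals, and the `code = 0` (no solution) case is not modelled.

```r
# Hypothetical sketch: iterate an update until two successive objective values
# differ by at most tol, or until iter.max iterations have been performed.
run_until_converged <- function(init, step, objective, iter.max = 500, tol = 1e-6) {
  theta <- init
  old <- objective(theta)
  for (iter in seq_len(iter.max)) {
    theta <- step(theta)
    new <- objective(theta)
    if (abs(new - old) <= tol) {
      return(list(theta = theta, iter = iter, code = 2L))  # converged
    }
    old <- new
  }
  list(theta = theta, iter = iter.max, code = 1L)          # iter.max reached
}

# Toy update: theta halves its distance to 3 at every iteration
step <- function(t) (t + 3) / 2
objective <- function(t) -(t - 3)^2

res <- run_until_converged(init = 0, step = step, objective = objective)
res$code  # 2
res$iter  # 13

res_short <- run_until_converged(init = 0, step = step, objective = objective,
                                 iter.max = 3)
res_short$code  # 1
```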

There may be datasets for which the function does not provide a solution with the default arguments. This corresponds to code=0 and flag=1 or flag=2 in the output (see the Value section below). This usually happens when some (or all) of the following circumstances occur: (i) log(icd) is too large; (ii) erc is too large; (iii) npr.max is too large; (iv) the initial partition is a poor choice. In these cases, it is suggested to find a suitable interval of icd values with the otrimle function. The Details section of otrimle suggests several actions to take whenever a code=0 non-solution occurs.
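
A defensive calling pattern for this situation is to check `code` before using the fit and, when no solution is found, to hand the problem to `otrimle`, which performs a data-driven search over icd values. `rimle_or_otrimle()` is a hypothetical wrapper, not part of the package; it assumes otrimle is installed and only reads the `code` and `flag` components documented in the Value section.

```r
library(otrimle)

# Hypothetical wrapper: run rimle() and fall back to otrimle() when code = 0.
rimle_or_otrimle <- function(data, G, ...) {
  fit <- rimle(data = data, G = G, ...)
  if (fit$code == 0) {
    warning("rimle() found no solution (flag: ", paste(fit$flag, collapse = " "),
            "); trying otrimle() instead")
    fit <- otrimle(data = data, G = G)
  }
  fit
}

data(banknote)
set.seed(1)
fit <- rimle_or_otrimle(banknote[, -1], G = 2)
fit$code
```

Dropping the extra `...` arguments in the fallback is a deliberate choice here: a single `logicd` suited to `rimle` is not what `otrimle` expects.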

The pi object returned by the rimle function (see Value) is the vector of pi parameters of the underlying pseudo-model (1) defined in Coretto and Hennig (2017). With logicd = -Inf, the rimle function approximates the MLE for the plain Gaussian mixture model with eigenratio covariance regularization; in this case the first element of the pi vector is set to zero because the noise component is not considered. In general, under iid sampling from a finite mixture model, these pi parameters define the expected cluster proportions. Because of the noise proportion constraint in the RIMLE, this correspondence may fail to hold in practice, most likely when both logicd and npr.max are large. Therefore, estimated expected cluster proportions are reported in the exproportion component of the rimle output; these are computed from the improper posterior probabilities given in tau. See Coretto and Hennig (2017) for further discussion.
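
To make the roles of `pi`, `logicd`, and `tau` concrete, the base-R sketch below evaluates improper posterior probabilities under a pseudo-model in which the noise component contributes the constant density `icd = exp(logicd)` weighted by `pi[1]`, while cluster j contributes `pi[1+j]` times a Gaussian density. This is a sketch under that assumption, not the package's code; `improper_tau()` and its arguments are hypothetical, and treating `colMeans(tau)` as an analogue of `exproportion` is also an assumption.

```r
# Hypothetical sketch of improper posterior probabilities (not the otrimle code).
# pr: c(pi_noise, pi_1, ..., pi_G); mean: p x G; cov: p x p x G; logicd = log(icd)
improper_tau <- function(x, pr, mean, cov, logicd) {
  x <- as.matrix(x)
  p <- ncol(x)
  G <- ncol(mean)
  logw <- matrix(NA_real_, nrow(x), G + 1)
  logw[, 1] <- log(pr[1]) + logicd                 # noise: pi_noise * icd
  for (j in seq_len(G)) {
    S <- matrix(cov[, , j], p, p)
    smd <- mahalanobis(x, center = mean[, j], cov = S)
    logdet <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
    logw[, 1 + j] <- log(pr[1 + j]) - 0.5 * (p * log(2 * pi) + logdet + smd)
  }
  m <- apply(logw, 1, max)                         # log-sum-exp for stability
  w <- exp(logw - m)
  w / rowSums(w)
}

mu <- cbind(c(0, 0), c(5, 5))
S  <- array(c(diag(2), diag(2)), dim = c(2, 2, 2))
x  <- rbind(c(0, 0), c(5, 5), c(50, -50))          # last point is far from both

tau <- improper_tau(x, pr = c(0.1, 0.45, 0.45), mean = mu, cov = S,
                    logicd = log(1e-4))
round(tau, 3)
colMeans(tau)   # assumed analogue of exproportion

# logicd = -Inf with pi_noise = 0: the noise column is identically zero
tau0 <- improper_tau(x, pr = c(0, 0.5, 0.5), mean = mu, cov = S, logicd = -Inf)
tau0[, 1]
```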

An earlier approximate version of the algorithm was originally proposed in Coretto and Hennig (2016). Software for the original version of the algorithm can be found in the supplementary materials of Coretto and Hennig (2016).

An S3 object of class 'rimle'. Output components are as follows:

`code`	An integer indicator for convergence. `code=0` if no solution is found (see Details); `code=1` if the ECM-algorithm did not converge within `iter.max` iterations; `code=2` if convergence is fully achieved.
`flag`	A character string containing one or more flags related to the ECM iterations at the given icd value. `flag=1` if it was not possible to prevent the numerical degeneracy of improper posterior probabilities (`tau` value below). `flag=2` if enforcement of the noise proportion constraint failed for numerical reasons. `flag=3` if enforcement of the eigenratio constraint failed for numerical reasons. `flag=4` if the noise proportion constraint has been successfully applied at least once. `flag=5` if the eigenratio constraint has been successfully applied at least once.
`iter`	Number of iterations performed in the underlying ECM-algorithm.
`logicd`	Value of the `log(icd)`.
`iloglik`	Value of the improper log-likelihood.
`criterion`	Value of the OTRIMLE criterion.
`pi`	Estimated vector of the `pi` parameters of the underlying pseudo-model (see Details).
`mean`	A matrix of dimension `ncol(data) x G` containing the mean parameters of each cluster (column-wise).
`cov`	An array of size `ncol(data) x ncol(data) x G` containing the covariance matrices of each cluster.
`tau`	A matrix of dimension `nrow(data) x (1+G)` where `tau[i, 1+j]` is the estimated (improper) posterior probability that the ith observation belongs to the jth cluster, and `tau[i, 1]` is the estimated (improper) posterior probability that the ith observation belongs to the noise component.
`smd`	A matrix of dimension `nrow(data) x G` where `smd[i,j]` is the squared Mahalanobis distance of `data[i,]` from `mean[,j]` according to `cov[,,j]`.
`cluster`	A vector of integers denoting the cluster assignment of each observation. It is `0` for observations assigned to noise/outliers.
`size`	A vector of integers with sizes (counts) of each cluster.
`exproportion`	A vector of estimated expected clusters' proportions (see Details).
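
The `cluster` labels are tied to `tau`. The sketch below assumes each observation is assigned to the column of `tau` with the largest improper posterior probability, with column 1 mapping to label `0` (noise); this rule is an assumption, `map_cluster()` is a hypothetical helper, and the toy `tau` matrix is made up for illustration.

```r
# Hypothetical helper: MAP labels from a tau matrix whose first column is noise.
map_cluster <- function(tau) {
  max.col(tau, ties.method = "first") - 1L   # 0 = noise, j = j-th cluster
}

# Made-up improper posterior probabilities for 4 observations, G = 2
tau <- rbind(c(0.90, 0.05, 0.05),
             c(0.10, 0.80, 0.10),
             c(0.00, 0.30, 0.70),
             c(0.20, 0.70, 0.10))
cl <- map_cluster(tau)
cl                                 # 0 1 2 1
table(factor(cl, levels = 0:2))    # counts per label, noise first

# For a fitted object `a` (as in Example 1), column j of a$smd should agree with
# mahalanobis(x, center = a$mean[, j], cov = a$cov[, , j]) up to rounding.
```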

Pietro Coretto pcoretto@unisa.it https://pietro-coretto.github.io

Coretto, P. and C. Hennig (2016). Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering. Journal of the American Statistical Association, Vol. 111(516), pp. 1648-1659. doi: 10.1080/01621459.2015.1100996

Coretto, P. and C. Hennig (2017). Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering. Journal of Machine Learning Research, Vol. 18(142), pp. 1-39. https://jmlr.org/papers/v18/16-382.html

plot.rimle, InitClust, otrimle

## Load the otrimle package and the Swiss banknotes data
library(otrimle)
data(banknote)
x <- banknote[,-1]

## -----------------------------------------------------------------------------
## EXAMPLE 1:
## Perform RIMLE with default inputs
## -----------------------------------------------------------------------------
set.seed(1)
a <- rimle(data = x, G = 2)
print(a)

## Plot clustering
plot(a, data = x, what = "clustering")

## P-P plot of the clusterwise empirical weighted squared Mahalanobis
## distances against the target distribution pchisq(, df=ncol(data))
plot(a, what = "fit")
plot(a, what = "fit", cluster = 1)



## -----------------------------------------------------------------------------
## EXAMPLE 2:
## Compare solutions for different choices of logicd
## -----------------------------------------------------------------------------
set.seed(1)

## Case 1: noiseless solution, that is, fit a pure Gaussian mixture model
b1 <- rimle(data = x, G = 2, logicd = -Inf)
plot(b1, data=x, what="clustering")
plot(b1, what="fit")

## Case 2: low noise level
b2 <- rimle(data = x, G = 2, logicd = -100)
plot(b2, data=x, what="clustering")
plot(b2, what="fit")

## Case 3: medium noise level
b3 <- rimle(data = x, G = 2, logicd = -10)
plot(b3, data=x, what="clustering")
plot(b3, what="fit")

## Case 4: large noise level
b4 <- rimle(data = x, G = 2, logicd = 5)
plot(b4, data=x, what="clustering")
plot(b4, what="fit")
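
The `what = "fit"` plots in Example 1 compare clusterwise weighted squared Mahalanobis distances with the chi-squared distribution on `ncol(data)` degrees of freedom. The base-R sketch below reproduces the idea on simulated Gaussian data with known mean and covariance; `weighted_pp()` is a hypothetical helper, and the assumption that the weights would be the cluster's `tau` column is ours, not taken from the `plot.rimle` source.

```r
# Hypothetical sketch of a weighted P-P comparison (not the plot.rimle code).
weighted_pp <- function(smd, w, df) {
  o <- order(smd)
  list(theoretical = pchisq(smd[o], df = df),   # target cdf
       empirical   = cumsum(w[o]) / sum(w))     # weighted empirical cdf
}

set.seed(1)
p <- 6                                            # same dimension as banknote[, -1]
z <- matrix(rnorm(2000 * p), ncol = p)            # one Gaussian cluster
d <- mahalanobis(z, center = rep(0, p), cov = diag(p))
pp <- weighted_pp(d, w = rep(1, nrow(z)), df = p) # equal weights here

plot(pp$theoretical, pp$empirical, type = "l",
     xlab = "pchisq(smd, df = p)", ylab = "weighted empirical cdf")
abline(0, 1, lty = 2)                             # perfect-fit reference line
max(abs(pp$empirical - pp$theoretical))           # small for a good fit
```

Points close to the dashed diagonal indicate that the cluster looks Gaussian; systematic departures suggest a misfit.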