otrimle: Optimally Tuned Robust Improper Maximum Likelihood Clustering
In otrimle: Robust Model-Based Clustering

Description Usage Arguments Details Value References Author(s) See Also Examples

otrimle searches for G approximately Gaussian-shaped clusters with/without noise/outliers. The method's tuning controlling the noise level is adaptively chosen based on the data.

1 2	otrimle(data, G, initial = NULL, logicd = NULL, npr.max = 0.5, erc = 20, beta = 0, iter.max = 500, tol = 1e-06, ncores = NULL, monitor = TRUE)

`data`	A numeric vector, matrix, or data frame of observations. Rows correspond to observations and columns correspond to variables. Categorical variables and `NA` values are not allowed.
`G`	An integer specifying the number of clusters.
`initial`	An integer vector specifying the initial cluster assignment with `0` denoting noise/outliers. If `NULL` (default) initialization is performed using `InitClust`.
`logicd`	A vector defining a grid of log(icd) values, where icd denotes the improper constant density. If `logicd=NULL` a default grid is considered. A pure Gaussian Mixture Model fit (obtained when `log(icd)=-Inf`) is included in the default search path.
`npr.max`	A number in `[0,1)` specifying the maximum proportion of noise/outliers. This defines the noise proportion constraint. If `npr.max=0` a single solution without noise component is computed (corresponding to `logicd = -Inf`.
`erc`	A number `>=1` specifying the maximum allowed ratio between within-cluster covariance matrix eigenvalues. This defines the eigenratio constraint. `erc=1` enforces spherical clusters with equal covariance matrices. A large `erc` allows for large between-cluster covariance discrepancies. In order to facilitate the setting of `erc`, it is suggested to scale the columns of `data` (see `scale`) whenever measurement units of the different variables are grossly incompatible.
`beta`	A non-negative constant. This is the beta penalty coefficient introduced in Coretto and Hennig (2016).
`iter.max`	An integer value specifying the maximum number of iterations allowed in the underlying ECM-algorithm.
`tol`	Stopping criterion for the underlying ECM-algorithm. An ECM iteration stops if two successive improper log-likelihood values are within `tol`.
`ncores`	an integer value defining the number of cores used for parallel computing. When `ncores=NULL` (default), the number `r` of available cores is detected, and `(r-1)` of them are used (See Details). If `ncores=1` no parallel backend is started.
`monitor`	logical. If `TRUE` progress messages are printed on screen.

The otrimle function computes the OTRIMLE solution based on the ECM-algorithm (expectation conditional maximization algorithm) proposed in Coretto and Hennig (2017).

The otrimle criterion is minimized over the logicd grid of log(icd) values using parallel computing based on the foreach. Note that, depending on the BLAS/LAPACK setting, increasing ncores may not produce the desired reduction in computing time. The latter happens when optimized linear algebra routines are in use (e.g. OpenBLAS, Intel Math Kernel Library (MKL), etc.). These optimized shared libraries already implement multithreading. Therefore, in this case increasing ncores may only reduce the computing time marginally.

Occasionally, there may be datasets for which the function does not provide a solution based on default arguments. This corresponds to code=0 and flag=1 or flag=2 in the output (see Value-section below). This usually happens when some (or all) of the following circumstances occur: (i) erc is too large; (ii) npr.max is too large; (iii) choice of the initial partition. Regarding (i) and (ii) it is not possible to give numeric references because whether these numbers are too large/small strongly depends on the sample size and the dimensionality of the data. References given below explain the relationship between these quantities.

It is suggested to try the following whenever a code=0 non-solution occurs. Set the logicd range wide enough (e.g. logicd=seq(-500,-5, length=50)), choose erc=1, and a low choice of npr.max (e.g. npr.max=0.02). Monitor the solution with the criterion profiling plot (plot.otrimle). According to the criterion profiling plot change logicd, and increase erc and npr.max up to the point when a "clear" minimum in the criterion profiling plot is obtained. If this strategy does not work it is suggested to experiment with a different initial partitions (see initial above).

TBA: Christian may add something about the beta here.

The pi object returned by the rimle function (see Value) corresponds to the vector of pi parameters in the underlying pseudo-model (1) defined in Coretto and Hennig (2017). With logicd = -Inf the rimle function approximates the MLE for the plain Gaussian mixture model with eigenratio covariance regularization, in this case the the first element of the pi vector is set to zero because the noise component is not considered. In general, for iid sampling from finite mixture models context, these pi parameters define expected clusters' proportions. Because of the noise proportion constraint in the RIMLE, there are situations where this connection may not happen in practice. The latter is likely to happen when both logicd and npr.max are large. Therefore, estimated expected clusters' proportions are reported in the exproportion object of the rimle output, and these are computed based on the improper posterior probabilities given in tau. See Coretto and Hennig (2017) for more discussion on this.

An earlier approximate version of the algorithm was originally proposed in Coretto and Hennig (2016). Software for the original version of the algorithm can be found in the supplementary materials of Coretto and Hennig (2016).

An S3 object of class 'otrimle' providing the optimal (according to the OTRIMLE criterion) clustering. Output components are as follows:

`code`	An integer indicator for the convergence. `code=0` if no solution is found (see Details); `code=1` if at the optimal icd value the corresponding EM-algorithm did not converge within `em.iter.max`; `code=2` convergence is fully achieved.
`flag`	A character string containing one or more flags related to the EM iteration at the optimal icd. `flag=1` if it was not possible to prevent the numerical degeneracy of improper posterior probabilities (`tau` value below). `flag=2` if enforcement of the noise proportion constraint failed for numerical reasons. `flag=3` if the noise proportion constraint has been successfully applied at least once. `flag=4` if the eigenratio constraint has been successfully applied at least once.
`iter`	Number of iterations performed in the underlying EM-algorithm at the optimal `icd`.
`logicd`	Resulting value of the optimal `log(icd)`.
`iloglik`	Resulting value of the improper likelihood.
`criterion`	Resulting value of the OTRIMLE criterion.
`pi`	Estimated vector of the `pi` parameters of the underlying pseudo-model (see Details).
`mean`	A matrix of dimension `ncol(data) x G` containing the mean parameters of each cluster (column-wise).
`cov`	An array of size `ncol(data) x ncol(data) x G` containing the covariance matrices of each cluster.
`tau`	A matrix of dimension `nrow(data) x {1+G}` where `tau[i, 1+j]` is the estimated (improper) posterior probability that the ith observation belongs to the jth cluster. `tau[i,1]` is the estimated (improper) posterior probability that ith observation belongs to the noise component.
`smd`	A matrix of dimension `nrow(data) x G` where `smd[i,j]` is the squared Mahalanobis distance of `data[i,]` from `mean[,j]` according to `cov[,,j]`.
`cluster`	A vector of integers denoting cluster assignments for each observation. It's `0` for observations assigned to noise/outliers.
`size`	A vector of integers with sizes (counts) of each cluster.
`exproportion`	A vector of estimated expected clusters' proportions (see Details).
`optimization`	A data.frame with the OTRIMLE optimization profiling. For each value of `log(icd)` explored by the algorithm the data.frame stores `logicd, criterion, iloglik, code, flag` (defined above), and `enpr` that denotes the expected noise proportion.

Coretto, P. and C. Hennig (2016). Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering. Journal of the American Statistical Association, Vol. 111(516), pp. 1648-1659. doi: 10.1080/01621459.2015.1100996

P. Coretto and C. Hennig (2017). Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering. Journal of Machine Learning Research, Vol. 18(142), pp. 1-39. https://jmlr.org/papers/v18/16-382.html

Pietro Coretto pcoretto@unisa.it https://pietro-coretto.github.io

plot.otrimle, InitClust, rimle,

## Load  Swiss banknotes data
data(banknote)
x <- banknote[,-1]

## Perform otrimle clustering with default arguments
set.seed(1)
a <- otrimle(data=x, G=2, logicd=c(-Inf, -50, -10), ncores=1)

## Plot clustering
plot(a, data=x, what="clustering")

## Plot OTRIMLE criterion profiling
plot(a, what="criterion")

## Plot Improper log-likelihood profiling
plot(a, what="iloglik")

## P-P plot of the clusterwise empirical weighted squared Mahalanobis
## distances against the target distribution pchisq(, df=ncol(data))
plot(a, what="fit")
plot(a, what="fit", cluster=1)



## Perform the same otrimle as before with non-zero penalty
set.seed(1)
b <- otrimle(data=x, G=2, beta = 0.5, logicd=c(-Inf, -50, -10), ncores=1)

## Plot clustering
plot(b, data=x, what="clustering")

## Plot OTRIMLE criterion profiling
plot(b, what="criterion")

## Plot Improper log-likelihood profiling
plot(b, what="iloglik")

## P-P plot of the clusterwise empirical weighted squared Mahalanobis
## distances against the target distribution pchisq(, df=ncol(data))
plot(b, what="fit")
plot(b, what="fit", cluster=1)





   
## Not run: 
## Perform the same example using the finer default grid of logicd
## values using multiple cores
##
a <- otrimle(data = x, G = 2)

## Inspect the otrimle criterion-vs-logicd
plot(a, what = 'criterion')

## The minimum occurs at  a$logicd=-9, and exploring a$optimization it
## cane be seen that the interval [-12.5, -4] brackets the optimal
## solution. We search with a finer grid located around the minimum
##
b <- otrimle(data = x, G = 2, logicd = seq(-12.5, -4, length.out = 25))

## Inspect the otrimle criterion-vs-logicd
plot(b, what = 'criterion')

## Check the difference between the two clusterings
table(A = a$cluster, B = b$cluster)

## Check differences in estimated parameters
##
colSums(abs(a$mean - b$mean))               ## L1 distance for mean vectors
apply({a$cov-b$cov}, 3, norm, type = "F")   ## Frobenius distance for covariances
c(Noise=abs(a$npr-b$npr), abs(a$cpr-b$cpr)) ## Absolute difference for proportions

## End(Not run)