sclust: Snipping for robust model based clustering analysis with...
In snipEM: Snipping Methods for Robust Estimation and Clustering


Estimates a finite Gaussian mixture model optimized over a snipping set.

sclust(X, k, V, R, restr.fact = 12, tol = 1e-04, maxiters = 100, maxiters.S = 1000, print.it = FALSE)

`X`	Data matrix, with observations in rows and variables in columns.
`k`	Number of clusters.
`V`	Binary matrix of the same size as `X`. Zeros mark the initially snipped entries.
`R`	Initial cluster labels: one integer from `1` to `k` for each row of `X`.
`restr.fact`	Restriction factor, i.e., an upper bound on the condition number (ratio of the largest to the smallest eigenvalue) of the cluster covariance matrices. Default is `12`.
`tol`	Tolerance for convergence. Default is `1e-4`.
`maxiters`	Maximum number of outer iterations of the CESM algorithm. Default is `100`.
`maxiters.S`	Maximum number of iterations of the inner greedy snipping algorithm. Default is `1000`.
`print.it`	Logical; if `TRUE`, partial results are printed. Default is `FALSE`.
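The effect of `restr.fact` can be illustrated with a simplified eigenvalue-clipping step. This is a sketch under assumptions, not snipEM code: it treats the restriction as a bound on the ratio between the largest and the smallest eigenvalue over all `k` covariance matrices (the form used in constrained, TCLUST-type clustering), and it simply raises every eigenvalue below `lambda_max / restr.fact` instead of computing the likelihood-optimal truncation. The helper name `constrain_scatter` is hypothetical.

```r
# Hypothetical helper (not part of snipEM): enforce
#   max eigenvalue / min eigenvalue <= restr.fact
# over all k covariance matrices stored in a p x p x k array S,
# by raising every eigenvalue below lambda_max / restr.fact.
constrain_scatter <- function(S, restr.fact = 12) {
  k <- dim(S)[3]
  eig <- lapply(seq_len(k), function(j) eigen(S[, , j], symmetric = TRUE))
  lambda_max <- max(vapply(eig, function(e) max(e$values), numeric(1)))
  lower <- lambda_max / restr.fact
  for (j in seq_len(k)) {
    vals <- pmax(eig[[j]]$values, lower)  # clip eigenvalues that are too small
    vecs <- eig[[j]]$vectors
    S[, , j] <- vecs %*% diag(vals, nrow = length(vals)) %*% t(vecs)
  }
  S
}

# Two 2 x 2 covariance matrices whose overall eigenvalue ratio is 100 / 0.5 = 200
S <- array(0, dim = c(2, 2, 2))
S[, , 1] <- diag(c(100, 1))
S[, , 2] <- diag(c(2, 0.5))
S2 <- constrain_scatter(S, restr.fact = 12)
ev <- unlist(lapply(1:2, function(j) eigen(S2[, , j], symmetric = TRUE)$values))
max(ev) / min(ev)  # 12, up to rounding
```

Clipping from below only keeps the largest eigenvalue unchanged, so the bound holds by construction; the optimal truncation level would generally lie elsewhere.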

This function computes the sclust estimator of Farcomeni (2014), which provides robust mixture modeling in the presence of entry-wise outliers. It is based on a classification-expectation-snip-maximize (CESM) algorithm. At the S step, the likelihood is optimized over the set of snipped entries; at the M step, the location and scatter estimates are updated. Unlike the S step proposed in Farcomeni (2014, 2014a), the S step here is based on a greedy algorithm. The number of snipped entries, `sum(1-V)`, is kept fixed throughout. Note that initializing with labels from classical (non-robust) clustering methods may be detrimental to the final performance of sclust, and may even yield an error due to empty clusters.
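The S step described above can be sketched for fixed labels, centers and scatter matrices. This is an illustrative greedy swap search, not the algorithm implemented in snipEM: each row contributes the Gaussian log-density of its non-snipped entries only (the marginal of a multivariate normal on a subset of coordinates), and each accepted swap un-snips one cell and snips another, so `sum(1-V)` never changes. The names `row_loglik` and `greedy_snip`, and the layout of `mu` as a k x p matrix with one center per row, are assumptions.

```r
# Gaussian log-density of row x restricted to its non-snipped entries (v == 1).
# The marginal of N(mu, S) on a coordinate subset uses the matching
# sub-vector of mu and sub-matrix of S. A fully snipped row contributes 0.
row_loglik <- function(x, v, mu, S) {
  obs <- which(v == 1)
  if (length(obs) == 0) return(0)
  d <- x[obs] - mu[obs]
  U <- chol(S[obs, obs, drop = FALSE])    # S_obs = t(U) %*% U
  z <- backsolve(U, d, transpose = TRUE)  # solves t(U) z = d
  -0.5 * (length(obs) * log(2 * pi) + 2 * sum(log(diag(U))) + sum(z^2))
}

# Hypothetical greedy S step for fixed labels R, centers mu (k x p) and
# scatter array S (p x p x k). Each accepted swap un-snips one cell and
# snips another, so sum(1 - V) stays constant.
greedy_snip <- function(X, V, R, mu, S, maxiters.S = 1000, tol = 1e-8) {
  n <- nrow(X)
  ll_row <- function(i, v) row_loglik(X[i, ], v, mu[R[i], ], S[, , R[i]])
  ll <- vapply(seq_len(n), function(i) ll_row(i, V[i, ]), numeric(1))
  # change in a row's log-likelihood if one cell (column-major index) is flipped
  cell_gain <- function(cell) {
    i <- (cell - 1) %% n + 1
    j <- (cell - 1) %/% n + 1
    v <- V[i, ]
    v[j] <- 1 - v[j]
    ll_row(i, v) - ll[i]
  }
  for (it in seq_len(maxiters.S)) {
    snipped <- which(V == 0)
    kept <- which(V == 1)
    if (length(snipped) == 0 || length(kept) == 0) break
    a <- snipped[which.max(vapply(snipped, cell_gain, numeric(1)))]  # best cell to un-snip
    b <- kept[which.max(vapply(kept, cell_gain, numeric(1)))]        # best cell to snip
    Vnew <- V
    Vnew[a] <- 1
    Vnew[b] <- 0
    rows <- unique((c(a, b) - 1) %% n + 1)  # rows touched by the swap
    new_ll <- vapply(rows, function(i) ll_row(i, Vnew[i, ]), numeric(1))
    if (sum(new_ll) - sum(ll[rows]) <= tol) break  # no improving swap: stop
    V <- Vnew
    ll[rows] <- new_ll
  }
  list(V = V, lik = sum(ll))
}

set.seed(1)
X <- matrix(rnorm(20 * 3), 20, 3)
X[5, 2] <- 50                          # one gross cellwise outlier
V0 <- matrix(1, 20, 3); V0[1, 1] <- 0  # start by snipping a clean cell
fit <- greedy_snip(X, V0, R = rep(1, 20), mu = matrix(0, 1, 3),
                   S = array(diag(3), dim = c(3, 3, 1)))
fit$V[5, 2]     # 0: the outlier is now snipped
sum(1 - fit$V)  # 1: number of snipped cells unchanged
```

Each pass re-evaluates every cell, so this sketch costs on the order of n times p density evaluations per swap; it is meant to show the fixed-budget swap idea, not to be efficient.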

A list with the following elements:

`R`	Final cluster labels.
`mu`	Estimated location matrix.
`S`	Array of estimated scatter matrices.
`V`	Final (optimal) V matrix.
`lik`	Gaussian log-likelihood at convergence.
`iter`	Number of outer iterations before convergence.
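The returned `V` and `R` can be combined to see where snipping occurred. The helper below is hypothetical (not part of snipEM) and relies only on the documented elements `R` and `V`; applied to the `sc` object in the Examples, it would count the snipped cells overall, per cluster and per variable.

```r
# Hypothetical helper: summarize the snipped cells (V == 0) of an
# sclust-like fit by cluster and by variable.
snip_summary <- function(fit) {
  V <- fit$V
  R <- fit$R
  k <- max(R)
  per_cluster <- vapply(seq_len(k),
                        function(j) sum(V[R == j, , drop = FALSE] == 0),
                        integer(1))
  list(
    n_snipped    = sum(V == 0),
    per_cluster  = setNames(per_cluster, paste0("cluster", seq_len(k))),
    per_variable = colSums(V == 0),
    cells        = which(V == 0, arr.ind = TRUE)  # row and column of each snipped cell
  )
}

# Toy object with the same element names as the sclust output
toy <- list(R = c(1, 1, 2, 2),
            V = matrix(c(1, 1, 1, 0,
                         0, 1, 1, 1), nrow = 4))
snip_summary(toy)$per_cluster  # cluster1 = 1, cluster2 = 1
```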

Alessio Farcomeni alessio.farcomeni@uniroma1.it, Andy Leung andy.leung@stat.ubc.ca

Farcomeni, A. (2014) Snipping for robust k-means clustering under component-wise contamination, Statistics and Computing, 24, 909-917

Farcomeni, A. (2014) Robust constrained clustering in presence of entry-wise outliers, Technometrics, 56, 102-111

snipEM, stEM, sumlog, ldmvnorm

set.seed(1234)
X <- matrix(NA,200,5)
# two clusters
k <- 2
X[1:100,] <- rnorm(100*5)
X[101:200,] <- rnorm(100*5,15)
R <- rep(c(1,2), each=100)

# 5% cellwise outliers
s <- sample(200*5,200*5*0.05)
X[s] <- runif(200*5*0.05,-100,100)
# True pattern: 0 marks a contaminated cell, 1 a clean one
V <- X
V[s] <- 0
V[-s] <- 1

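# Initial V: snip the 2.5% most extreme values in each tail of X
Vinit <- matrix(1, nrow(X), ncol(X))
Vinit[which(X > quantile(X, 0.975) | X < quantile(X, 0.025))] <- 0
# Initial R: k-means labels (see Details on non-robust initialization)
Rinit <- kmeans(X, k)$cluster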

# Snipped robust clustering
sc <- sclust(X, k, Vinit, Rinit)
# Compare the true labels with the initial and the final labels
table(R, Rinit)
table(R, sc$R)
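The example builds `Vinit` by snipping the 2.5% most extreme values in each tail of the pooled data, which gives roughly 5% snipped cells. Since `sum(1-V)` stays fixed throughout, one may prefer to set that number directly. The sketch below is one hypothetical way to do so (the name `init_snip` and the median/MAD scoring are assumptions, not snipEM functionality): it snips the `n_snip` cells with the largest column-wise robust z-scores.

```r
# Hypothetical initializer: snip exactly n_snip cells, choosing those with the
# largest robust z-score |x - median| / mad computed column by column.
init_snip <- function(X, n_snip) {
  center <- apply(X, 2, median)
  spread <- apply(X, 2, mad)  # assumes no column has zero MAD
  Z <- abs(sweep(sweep(X, 2, center), 2, spread, "/"))
  V <- matrix(1, nrow(X), ncol(X))
  V[order(Z, decreasing = TRUE)[seq_len(n_snip)]] <- 0
  V
}

set.seed(2)
X <- matrix(rnorm(50 * 3), 50, 3)
X[c(3, 60, 140)] <- c(40, -40, 40)  # three planted cellwise outliers
Vinit <- init_snip(X, n_snip = 3)
sum(1 - Vinit)    # 3
which(Vinit == 0) # 3 60 140
```

Column-wise scores ignore the cluster structure: with well-separated clusters, as in the example, moderate outliers lying between the clusters may not be flagged, much as with the quantile rule.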