skmeans: Snipped k-means clustering with cellwise outliers
In snipEM: Snipping Methods for Robust Estimation and Clustering


Perform k-means clustering on a data matrix with cellwise outliers using a snipping algorithm.

skmeans(X, k, V, clust, s, itersmax = 10^5, D = 1e-1)

`X`	Data matrix with `n` rows (observations).
`k`	Integer; number of clusters, `k>1`.
`V`	Binary matrix of the same size as `X`. Zeros mark the initially snipped entries.
`clust`	Vector of size `n` containing values from `1` to `k`. Starting solution for class labels.
`itersmax`	Maximum number of iterations of the algorithm. Default is `10^5`.
`s`	Binary vector of size `n` for trimming, starting solution. Number of zeros will be preserved and correspond to trimmed rows. If the vector is `rep(1,n)`, it performs no trimming. Default is `rep(1,n)`.
`D`	Tuning parameter for the fitting algorithm. Corresponds approximately to the maximal change in loss obtained by switching two non-outlying entries. Comparing different choices is recommended. Default is `1e-1`.
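As a rough illustration of the shapes these arguments expect, the following base R sketch builds starting values for a small simulated data set. The helper `init_V` and the names `V0`, `s0` and `clust0` are hypothetical and not part of snipEM; flagging cells by robust z-scores and starting from random balanced labels are just one simple option, not a recommendation from the package.

```r
# Simulated data: two clusters, n = 60 rows, p = 4 columns.
set.seed(1)
n <- 60; p <- 4; k <- 2
X <- rbind(matrix(rnorm(30 * p), 30, p),
           matrix(rnorm(30 * p, mean = 10), 30, p))
X[sample(n * p, 12)] <- 100          # contaminate 5% of the cells

# Hypothetical helper (not in snipEM): set to zero the fraction `eps` of
# cells with the largest column-wise robust z-scores (median / MAD).
init_V <- function(X, eps = 0.05) {
  z <- abs(sweep(X, 2, apply(X, 2, median)))   # absolute deviation from column median
  z <- sweep(z, 2, apply(X, 2, mad), "/")      # scale by column MAD
  V <- matrix(1, nrow(X), ncol(X))
  V[order(z, decreasing = TRUE)[seq_len(round(eps * length(X)))]] <- 0
  V
}

V0 <- init_V(X, eps = 0.05)          # binary matrix, same size as X
s0 <- rep(1, n)                      # no trimmed rows
clust0 <- sample(rep_len(1:k, n))    # random balanced labels, no empty cluster

# With snipEM installed, these would be passed positionally as in Usage:
# skm <- snipEM::skmeans(X, k, V0, clust0, s0)
```

Because the number of zeros in `V0` and `s0` is preserved by the algorithm, `eps` here effectively fixes the snipping level of the final solution.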

This function computes the skmeans estimator of Farcomeni (2014), which gives a k-means clustering that is robust to both cellwise (entry-wise) and casewise (row-wise) outliers. The number of snipped entries, `sum(1-V)`, and the number of trimmed rows, `sum(1-s)`, are kept fixed throughout. Initial estimates for `V`, `s` and `clust` should be provided. Note that initializing with labels from classical (non-robust) clustering methods may harm the final performance of skmeans and may even yield an error due to empty clusters.
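To make the role of `V` and `s` concrete, here is a minimal base R sketch of a snipped k-means objective. It assumes the loss is the sum of squared deviations from the cluster centroids, taken only over retained cells (`V == 1`) of untrimmed rows (`s == 1`). The helpers `snipped_means` and `snipped_loss` are hypothetical, and the package's internal computation may differ in detail.

```r
# Hypothetical helper: cluster centroids computed from retained cells only.
snipped_means <- function(X, V, s, clust, k) {
  W <- V * s                               # s recycles down columns, zeroing trimmed rows
  mu <- matrix(NA_real_, k, ncol(X))
  for (g in seq_len(k)) {
    idx <- clust == g
    mu[g, ] <- colSums(W[idx, , drop = FALSE] * X[idx, , drop = FALSE]) /
      colSums(W[idx, , drop = FALSE])
  }
  mu
}

# Hypothetical helper: squared-error loss over retained cells only.
snipped_loss <- function(X, V, s, clust, mu) {
  W <- V * s
  sum(W * (X - mu[clust, , drop = FALSE])^2)
}

# Toy data: the value 50 in row 2 is a single outlying cell.
X <- rbind(c(0, 0), c(2, 50), c(10, 10), c(12, 12))
clust <- c(1, 1, 2, 2)
s <- rep(1, 4)

V_none <- matrix(1, 4, 2)                  # nothing snipped
V_snip <- V_none; V_snip[2, 2] <- 0        # snip the outlying cell

loss_none <- snipped_loss(X, V_none, s, clust, snipped_means(X, V_none, s, clust, 2))
loss_snip <- snipped_loss(X, V_snip, s, clust, snipped_means(X, V_snip, s, clust, 2))
loss_none   # 1256
loss_snip   # 6
```

Snipping the single contaminated cell keeps the rest of row 2 in play, which is the difference from trimming the whole row through `s`.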

A list with the following elements:

`loss`	Loss function (the within-cluster sum of squares over retained entries) at convergence.
`mu`	Estimated locations.
`s`	Final (optimal) trimming indicator as a binary vector of size `n`; zeros mark trimmed rows.
`V`	Final (optimal) V matrix.
`clust`	Final (optimal) class labels as vector of size `n`.

Alessio Farcomeni alessio.farcomeni@uniroma1.it, Andy Leung andy.leung@stat.ubc.ca

Farcomeni, A. (2014) Snipping for robust k-means clustering under component-wise contamination, Statistics and Computing, 24, 909-917

sclust, stEM, snipEM

set.seed(1234)
X <- matrix(NA,200,5)
# two clusters
k <- 2
X[1:100,] <- rnorm(100*5)
X[101:200,] <- rnorm(100*5,15)
clust <- rep(c(1,2), each=100)

# 5% cellwise outliers
s <- sample(200*5,200*5*0.05)
X[s] <- runif(200*5*0.05,-100,100)
V <- X
V[s] <- 0
V[-s] <- 1

# Initial V and class labels
Vinit <- matrix(1, nrow(X), ncol(X))
Vinit[which(X > quantile(X,0.975) | X < quantile(X,0.025))] <- 0
km <- kmeans(X,k)
clustinit <- km$cluster

# Snipped robust clustering
skm <- skmeans(X, k, Vinit, clustinit)

table(clust,km$cluster)
table(clust,skm$clust)