Perform k-means clustering on a data matrix with cellwise outliers using a snipping algorithm.
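Snipping means that a flagged cell is left out of the distance computation instead of the whole row being discarded. The following is a minimal base-R sketch of that idea, assuming a snipped squared Euclidean distance in which a cell with V[i, j] == 0 contributes nothing. The functions `snipped_dist2` and `assign_snipped` are hypothetical illustrations, not part of the package.

```r
# Hypothetical helper: squared Euclidean distance between a row x and a
# centre mu in which snipped cells (v == 0) contribute nothing.
snipped_dist2 <- function(x, v, mu) {
  sum(v * (x - mu)^2)
}

# Hypothetical helper: assign each row of X to the centre (row of mu) with
# the smallest snipped distance; V is a 0/1 matrix with the same size as X.
assign_snipped <- function(X, V, mu) {
  vapply(seq_len(nrow(X)), function(i) {
    d <- vapply(seq_len(nrow(mu)),
                function(g) snipped_dist2(X[i, ], V[i, ], mu[g, ]),
                numeric(1))
    which.min(d)
  }, integer(1))
}

X  <- rbind(c(0, 0), c(10, -500), c(10, 10))
V  <- rbind(c(1, 1), c(1, 0), c(1, 1))   # second cell of row 2 is snipped
mu <- rbind(c(0, 0), c(10, 10))
assign_snipped(X, V, mu)                  # 1 2 2
assign_snipped(X, matrix(1, 3, 2), mu)    # 1 1 2: unsnipped, row 2 is pulled to cluster 1
```

A single outlying cell is enough to move row 2 to the wrong centre when nothing is snipped; once that cell is snipped, the remaining coordinate decides the assignment.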
X | Data matrix with n rows. |
k | Integer; number of clusters. |
V | Binary matrix of the same size as X. Zeros correspond to initial snipped entries. |
clust | Vector of size n with the initial class labels. |
itersmax | Maximum number of iterations of the algorithm. Default is |
s | Binary vector of size n. Zeros correspond to initial trimmed rows. |
D | Tuning parameter for the fitting algorithm. Corresponds approximately to the maximal change in loss from switching two non-outlying entries. Comparing different choices is recommended. Default is |
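Because V, s and clust must all be supplied with matching dimensions, it can help to check them before fitting. This is a minimal sketch of such a check. `check_skmeans_inputs` is a hypothetical helper, and its default of "no trimmed rows" for s is an assumption of the sketch, not a documented package default.

```r
# Hypothetical helper: checks the inputs described in the argument list
# before calling skmeans(). Not part of the package.
# Assumption of this sketch: s defaults to "no trimmed rows".
check_skmeans_inputs <- function(X, k, V, clust, s = rep(1, nrow(X))) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(
    length(k) == 1, k >= 1, k == round(k),           # k is a single integer
    is.matrix(V), all(dim(V) == dim(X)),             # V matches X
    all(V %in% c(0, 1)),                             # V is binary
    length(clust) == n, all(clust %in% seq_len(k)),  # one label per row
    length(s) == n, all(s %in% c(0, 1))              # s is binary, one per row
  )
  # Every cluster should keep at least one untrimmed row: empty clusters
  # are the failure mode mentioned in Details.
  kept <- clust[s == 1]
  if (!all(seq_len(k) %in% kept)) stop("some cluster has no untrimmed rows")
  # Return the fixed counts of snipped cells and trimmed rows.
  invisible(list(n_snipped = sum(1 - V), n_trimmed = sum(1 - s)))
}

set.seed(1)
X <- matrix(rnorm(20), 10, 2)
V <- matrix(1, 10, 2)
V[3, 2] <- 0                      # snip one cell
clust <- rep(1:2, each = 5)
res <- check_skmeans_inputs(X, 2, V, clust)
res$n_snipped                     # 1
```

The returned counts correspond to the quantities sum(1-V) and sum(1-s) that the Details section says stay fixed during fitting.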
This function computes the skmeans estimator of Farcomeni (2014). It yields robust k-means clustering in the presence of both cellwise outliers (handled by snipping entries through V) and row-wise outliers (handled by trimming rows through s). The number of snipped entries, sum(1-V), and the number of trimmed rows, sum(1-s), are kept fixed throughout. Initial estimates for V, s and clust should be provided. Note that initializing with labels from classical (non-robust) clustering methods may be detrimental to the final performance of skmeans and may even yield an error due to empty clusters.
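Read together with the Value section, where loss is described as the total sum of squares, the Details suggest an objective of the following form. This is a sketch under our reading, with p the number of columns of X, c_i the label of row i and \mu_g the centre of cluster g; consult Farcomeni (2014) for the exact formulation. The second line is the standard least-squares update for the centres once labels, snipped cells and trimmed rows are held fixed.

```latex
\begin{aligned}
&\min_{\mu,\,c,\,V,\,s}\;
  \sum_{i=1}^{n} s_i \sum_{j=1}^{p} V_{ij}\,\bigl(X_{ij} - \mu_{c_i j}\bigr)^2
  \quad\text{s.t.}\quad
  \sum_{i=1}^{n}\sum_{j=1}^{p} (1 - V_{ij}) = \text{const},\qquad
  \sum_{i=1}^{n} (1 - s_i) = \text{const},\\[4pt]
&\mu_{gj} = \frac{\sum_{i:\,c_i = g} s_i V_{ij} X_{ij}}{\sum_{i:\,c_i = g} s_i V_{ij}}
  \quad\text{for fixed } c,\ V,\ s.
\end{aligned}
```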
A list with the following elements:
loss | Loss function (the total sum of squares) at convergence. |
mu | Estimated locations. |
s | Final (optimal) trimmed rows, as a binary vector of size n. |
V | Final (optimal) V matrix. |
clust | Final (optimal) class labels, as a vector of size n. |
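To see what the returned loss measures, one can recompute centres and a snipped, trimmed sum of squares from X, V, s and clust. This is a minimal sketch that assumes the loss sums (X[i, j] - mu[clust[i], j])^2 over untrimmed rows and unsnipped cells only; `snipped_centres` and `snipped_loss` are hypothetical helpers, and the k-by-p orientation of mu used here may differ from the package's mu.

```r
# Hypothetical helper: centres as means over untrimmed rows and unsnipped
# cells of each cluster. W[i, j] = V[i, j] * s[i] (s is recycled down columns).
snipped_centres <- function(X, V, s, clust, k) {
  W <- V * s
  mu <- matrix(NA_real_, k, ncol(X))
  for (g in seq_len(k)) {
    rows <- clust == g
    num <- colSums(W[rows, , drop = FALSE] * X[rows, , drop = FALSE])
    den <- colSums(W[rows, , drop = FALSE])
    mu[g, ] <- num / den   # NaN if a column has no usable cell in cluster g
  }
  mu
}

# Hypothetical helper: assumed form of the loss, the sum of squares over
# untrimmed rows and unsnipped cells only.
snipped_loss <- function(X, V, s, clust, mu) {
  W <- V * s
  sum(W * (X - mu[clust, , drop = FALSE])^2)
}

X <- rbind(c(0, 0), c(2, 100), c(10, 10), c(12, 12))
V <- rbind(c(1, 1), c(1, 0), c(1, 1), c(1, 1))   # the cell equal to 100 is snipped
s <- c(1, 1, 1, 1)                                # no trimmed rows
clust <- c(1, 1, 2, 2)
mu <- snipped_centres(X, V, s, clust, k = 2)      # rows (1, 0) and (11, 11)
snipped_loss(X, V, s, clust, mu)                  # 6
```

The snipped value 100 affects neither the first centre nor the loss, which is what makes the estimator robust to that cell.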
Alessio Farcomeni alessio.farcomeni@uniroma1.it, Andy Leung andy.leung@stat.ubc.ca
Farcomeni, A. (2014). Snipping for robust k-means clustering under component-wise contamination. Statistics and Computing, 24, 909-917.
```r
library(snipEM)  # assumed source of skmeans(); other packages also export a skmeans()

set.seed(1234)
X <- matrix(NA, 200, 5)
# two clusters
k <- 2
X[1:100, ] <- rnorm(100 * 5)
X[101:200, ] <- rnorm(100 * 5, 15)
clust <- rep(c(1, 2), each = 100)
# 5% cellwise outliers (indices kept in `out`, not to be confused with the s argument)
out <- sample(200 * 5, 200 * 5 * 0.05)
X[out] <- runif(200 * 5 * 0.05, -100, 100)
# true snipping pattern, for reference only (not used below)
V <- matrix(1, nrow(X), ncol(X))
V[out] <- 0
# Initial V and class labels
Vinit <- matrix(1, nrow(X), ncol(X))
Vinit[which(X > quantile(X, 0.975) | X < quantile(X, 0.025))] <- 0
km <- kmeans(X, k)
clustinit <- km$cluster  # non-robust start; see the caveat in Details
# Snipped robust clustering
skm <- skmeans(X, k, Vinit, clustinit)
table(clust, km$cluster)
table(clust, skm$clust)
```
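The Examples compare true and estimated labels with table(). Since cluster labels are only defined up to permutation, a single error rate can be easier to read. This is a minimal sketch with hypothetical helpers `perms()` and `misclass_rate()`, assuming labels are integers 1..k; if trimmed rows are coded differently in the returned clust, drop them first.

```r
# Hypothetical helper: all permutations of 1:k, one per row (fine for small k).
perms <- function(k) {
  if (k == 1) return(matrix(1L, 1, 1))
  p <- perms(k - 1)
  do.call(rbind, lapply(seq_len(k), function(first) {
    rest <- setdiff(seq_len(k), first)
    cbind(rep(first, nrow(p)), matrix(rest[as.vector(p)], nrow(p)))
  }))
}

# Hypothetical helper: misclassification rate minimised over relabellings
# of the estimated labels.
misclass_rate <- function(truth, est, k = max(truth, est)) {
  stopifnot(length(truth) == length(est), all(est %in% seq_len(k)))
  P <- perms(k)
  # relabel est through each permutation and keep the best agreement
  min(apply(P, 1, function(perm) mean(perm[est] != truth)))
}

truth <- rep(1:2, each = 3)
est   <- c(2, 2, 2, 1, 1, 2)
misclass_rate(truth, est)   # 1/6: after swapping labels only the last row disagrees
```

With the objects from the Examples, misclass_rate(clust, km$cluster) and misclass_rate(clust, skm$clust) would summarise the two tables in one number each. Enumerating permutations grows as k!, so this brute-force approach only suits small k.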