wrk: wrk
In brodsa/wrsk: Robust (weighted) and sparse k-means clustering



This function performs robust (weighted) k-means clustering, which is useful when the data are contaminated with outliers. The method aims to detect both clusters and outliers.

wrk(data, D, k, max.iter = 30, cutoff = 0.5)

`data`	A data matrix with n observations and p variables.
`D`	A distance matrix for the observations in `data`, e.g. as returned by `dist(data)`.
`k`	The number of clusters.
`max.iter`	The maximum number of iterations allowed to reach a local optimum. The default is 30.
`cutoff`	A cutoff value used to determine outliers. An observation is declared an outlier if its weight is smaller than or equal to this cutoff. The default is 0.5.

wrk initializes the clustering procedure with the ROBIN approach and applies a weighting function to each detected cluster during k-means clustering. The weighting function uses the local outlier factor (LOF) to assign a weight to each observation. The resulting observation weights reflect the degree of outlyingness and range between 0 and 1. Observations with a weight <= cutoff are identified as outliers.
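The thresholding rule described above can be sketched in base R. This is only an illustration of the rule as documented, not the package's internal code: `flag_outliers` is a hypothetical helper name, and the cluster labels and weights are made-up values.

```r
# Hypothetical helper (not part of wrsk): observations whose weight is
# smaller than or equal to the cutoff are relabelled as 0 (outlier).
flag_outliers <- function(clusters, obsweights, cutoff = 0.5) {
  stopifnot(length(clusters) == length(obsweights))
  ifelse(obsweights <= cutoff, 0L, as.integer(clusters))
}

# Made-up cluster labels and observation weights for six observations
clusters   <- c(1L, 1L, 2L, 2L, 3L, 3L)
obsweights <- c(0.95, 0.50, 0.80, 0.10, 1.00, 0.51)

outclusters <- flag_outliers(clusters, obsweights)
print(outclusters)
```

Note that a weight exactly equal to the cutoff (the second observation here) is flagged, since the rule uses <= rather than <.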

`clusters`	An integer vector with values from 1 to k, giving the resulting cluster membership of each observation.
`obsweights`	A numeric vector of observation weights ranging between 0 and 1.
`outclusters`	An integer vector with values from 0 to k, containing both the cluster membership and the identified outliers. A value of 0 marks an outlier.
`WCSS`	The within-cluster sum of squares of the local optimum.
`centers`	The set of cluster centers.
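For orientation, the textbook within-cluster sum of squares can be computed from a data matrix, a membership vector and a matrix of centers as sketched below. This is the standard unweighted definition; whether wrk applies the observation weights when computing `WCSS` is not stated here, so treat this as an assumption. `wcss` is a hypothetical helper, and the toy data are made up.

```r
# Hypothetical helper (not part of wrsk): standard, unweighted
# within-cluster sum of squares.
wcss <- function(data, clusters, centers) {
  # Subtract each observation's assigned center, then sum the squares
  sum((data - centers[clusters, , drop = FALSE])^2)
}

# Toy data: two well-separated groups in two dimensions
x  <- rbind(c(0, 0), c(0, 2), c(10, 10), c(10, 14))
cl <- c(1L, 1L, 2L, 2L)

# Cluster centers as column means within each cluster
centers <- rbind(colMeans(x[cl == 1, , drop = FALSE]),
                 colMeans(x[cl == 2, , drop = FALSE]))

total <- wcss(x, cl, centers)
print(total)
```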

Sarka Brodinova <sarka.brodinova@tuwien.ac.at>

S. Brodinova, P. Filzmoser, T. Ortner, C. Breiteneder, M. Zaharieva. Robust and sparse k-means clustering for high-dimensional data. Submitted for publication, 2017. Available at http://arxiv.org/abs/1709.10012

ROBIN

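library(wrsk)

# generate data
d <- SimData(size_grp = c(40, 40, 40), p_inf = 50,
             p_noise = 750, p_out_noise = 75)
dat <- scale(d$x)

# robust k-means on the scaled data, with distances from dist()
res <- wrk(data = dat, D = dist(dat), k = 3)

# cross-tabulate the simulated labels against the clustering results
table(d$y, res$clusters)
table(d$lb, res$outclusters)

# plot the observation weights, colored by the labels in d$lb
plot(res$obsweights, col = d$lb + 1)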