wrsk: wrsk
In brodsa/wrsk: Robust (weighted) and sparse k-means clustering



This function performs robust (weighted) and sparse k-means clustering for high-dimensional data (Brodinova et al., 2017). For a given number of clusters k and sparsity parameter s, the algorithm simultaneously detects clusters, outliers, and informative variables.

wrsk(data, k, s, iteration = 15, cutoff = 0.5)

`data`	A data matrix with n observations and p variables.
`k`	The number of clusters.
`s`	The sparsity parameter, which penalizes the L1 norm of the variable weights, i.e. a lasso-type penalty. The value should be larger than 1 and smaller than `sqrt(p)`.
`iteration`	The maximum number of iterations allowed.
`cutoff`	A cutoff value used to determine outliers. An observation is declared an outlier if its weight is smaller than or equal to this cutoff. The default is 0.5.
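The admissible range for `s` depends on the number of variables p. The following is a minimal sketch of that check in base R; `check_sparsity` is a hypothetical helper written for illustration and is not part of the wrsk package, and the value of `p` is arbitrary.

```r
# Hypothetical helper (not part of wrsk): TRUE if s lies strictly
# between 1 and sqrt(p), as the description of `s` requires.
check_sparsity <- function(s, p) {
  s > 1 && s < sqrt(p)
}

p <- 800                  # illustrative number of variables
sqrt(p)                   # upper bound for s

check_sparsity(6, p)      # inside the admissible range
check_sparsity(40, p)     # too large for this p
check_sparsity(1, p)      # the lower bound itself is excluded
```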

The method is a three-step iterative procedure. First, a weighting function is employed during sparse k-means clustering with ROBIN initialization. Second, the variable weights from sparse k-means are updated for the given sparsity parameter. These two steps are repeated until the variable weights stabilize. Finally, both clusters and outliers are detected. The approach is a robust version of sparse k-means (Witten and Tibshirani, 2010) and an alternative to robust (trimmed) and sparse k-means (Kondo et al., 2016).
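To make the variable-weight update more concrete, here is a sketch of the soft-thresholding update used in sparse k-means by Witten and Tibshirani (2010). It is an illustration under stated assumptions, not the code in R/wrsk.R: the function names `soft_threshold` and `update_var_weights` are hypothetical, `a` stands for a vector of per-variable between-cluster sums of squares with at least one positive entry, and the threshold is found by a simple bisection.

```r
# Soft-thresholding operator: shrink by delta and truncate at zero.
soft_threshold <- function(a, delta) pmax(a - delta, 0)

# Hypothetical sketch: maximise sum(w * a) subject to
# sum(w^2) <= 1, sum(w) <= s and w >= 0.
update_var_weights <- function(a, s, tol = 1e-8, max_iter = 100) {
  a <- pmax(a, 0)
  l2_normalize <- function(v) v / sqrt(sum(v^2))

  w <- l2_normalize(a)
  if (sum(w) <= s) return(w)          # L1 constraint is not active

  # Bisection on delta so that the L1 norm of the weights is about s.
  lo <- 0
  hi <- max(a)
  for (i in seq_len(max_iter)) {
    delta <- (lo + hi) / 2
    w <- l2_normalize(soft_threshold(a, delta))
    if (sum(w) > s) lo <- delta else hi <- delta
    if (hi - lo < tol) break
  }
  w
}

a <- c(5, 4, 0.5, 0.2, 0.1)           # toy per-variable criterion values
w <- update_var_weights(a, s = 1.2)
w                                     # the three smallest entries are set to zero
sum(w)                                # close to s
sum(w^2)                              # equal to 1 up to rounding
```

A smaller `s` pushes more weights to exactly zero, which is how the sparsity parameter selects informative variables.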

`clusters`	An integer vector with values from 1 to k, indicating the resulting cluster membership.
`obsweights`	A numeric vector of observation weights ranging between 0 and 1.
`outclusters`	An integer vector with values from 0 to k, containing both cluster membership and identified outliers. The value 0 denotes an outlier.
`varweights`	A numeric vector of variable weights reflecting the contribution of each variable to cluster separation. A high weight suggests that a variable is informative.
`WBCSS`	The weighted between-cluster sum of squares at the local optimum. The value is calculated with respect to the final variable weights and adjusted by the final observation weights.
`centers`	The set of final cluster centers.
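The relation between `clusters`, `obsweights`, and `outclusters` can be illustrated with toy vectors. This sketch assumes, based on the description of `cutoff`, that an observation is labelled 0 exactly when its weight is at most the cutoff; the values below are made up and do not come from a fitted model.

```r
# Toy values standing in for the components of a fitted object.
clusters   <- c(1L, 1L, 2L, 3L, 2L, 3L)
obsweights <- c(0.95, 0.40, 1.00, 0.50, 0.80, 0.51)
cutoff     <- 0.5

# Assumed rule: weight <= cutoff marks an outlier (label 0),
# otherwise the cluster label is kept.
outclusters <- ifelse(obsweights <= cutoff, 0L, clusters)
outclusters

# Cross-tabulate cluster labels against labels with outliers flagged.
table(clusters, outclusters)
```

Note that a weight equal to the cutoff (0.50 above) is treated as an outlier, while 0.51 is not.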

Sarka Brodinova <sarka.brodinova@tuwien.ac.at>

S. Brodinova, P. Filzmoser, T. Ortner, C. Breiteneder, M. Zaharieva. Robust and sparse k-means clustering for high-dimensional data. Submitted for publication, 2017. Available at http://arxiv.org/abs/1709.10012

D. M. Witten and R. Tibshirani. A framework for feature selection in clustering. Journal of the American Statistical Association, 105(490), 713-726, 2010.

Y. Kondo, M. Salibian-Barrera, R. H. Zamar. RSKC: An R Package for a Robust and Sparse K-Means Clustering Algorithm. Journal of Statistical Software, 72(5), 1-26, 2016.

Gapwrsk, KMeansSparseCluster, RSKC

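library(wrsk)

# generate data and standardize the variables
d <- SimData(size_grp = c(40, 40, 40), p_inf = 50,
             p_noise = 750, p_out_noise = 75)
dat <- scale(d$x)

# fit the model with k = 3 clusters and sparsity parameter s = 6
res <- wrsk(data = dat, k = 3, s = 6)

# compare d$lb with the detected clusters (0 = outlier)
table(d$lb, res$outclusters)

# inspect the variable weights
plot(res$varweights)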