Description

This function performs robust (weighted) and sparse k-means clustering for high-dimensional data (Brodinova et al., 2017). For a given number of clusters k and sparsity parameter s, the algorithm detects clusters, outliers, and informative variables simultaneously.
Arguments

data: A data matrix with n observations and p variables.

k: The number of clusters.

s: The sparsity parameter, which penalizes the L1 norm of the variable weights (a lasso-type penalty). The value should be larger than 1 and smaller than sqrt(p).

iteration: The maximum number of iterations allowed.

cutoff: A cutoff value used to detect outliers. An observation is declared an outlier if its observation weight is smaller than or equal to this cutoff; the default is 0.5.
Details

The method is a three-step iterative procedure. First, a weighting function is employed during sparse k-means clustering with ROBIN initialization. Then, the variable weights from sparse k-means are updated for the given sparsity parameter. These two steps are repeated until the variable weights stabilize. Finally, both clusters and outliers are detected. The approach is a robust version of sparse k-means (Witten and Tibshirani, 2010) and an alternative to robust (trimmed) and sparse k-means (Kondo et al., 2016).
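The variable-weight update step follows the sparse k-means framework of Witten and Tibshirani (2010): maximize the weighted between-cluster sum of squares subject to an L2 constraint of 1, an L1 bound s, and nonnegativity of the weights, which leads to a soft-thresholding solution. A minimal sketch of that update (function names and simplifications are ours, not the package's):

```r
# Soft-thresholding operator used in the sparse k-means weight update.
soft <- function(x, c) sign(x) * pmax(abs(x) - c, 0)

# Given per-variable between-cluster sums of squares `a` (nonnegative) and
# sparsity bound `s`, find variable weights w with ||w||_2 = 1, ||w||_1 <= s,
# and w_j >= 0, as in Witten and Tibshirani (2010).
update_weights <- function(a, s) {
  # If the unpenalized solution already satisfies the L1 bound, use it.
  w <- a / sqrt(sum(a^2))
  if (sum(abs(w)) <= s) return(w)
  # Otherwise bisect the threshold delta so that ||w||_1 = s; a larger
  # delta makes w sparser and thus lowers the L1 norm of the unit vector.
  lo <- 0; hi <- max(abs(a))
  for (i in 1:50) {
    delta <- (lo + hi) / 2
    w <- soft(a, delta)
    w <- w / sqrt(sum(w^2))
    if (sum(abs(w)) > s) lo <- delta else hi <- delta
  }
  w
}

a <- c(10, 8, 1, 0.5)              # variable-wise between-cluster scatter
round(update_weights(a, s = 1.2), 3)  # approx. 0.974 0.226 0.000 0.000
```

With a tight bound s, the weights of uninformative variables are driven exactly to zero, which is what makes the selected variables interpretable.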
Value

clusters: An integer vector with values from 1 to k, indicating the resulting cluster membership.

obsweights: A numeric vector of observation weights ranging between 0 and 1.

outclusters: An integer vector with values from 0 to k, containing both cluster membership and identified outliers; 0 corresponds to an outlier.

varweights: A numeric vector of variable weights reflecting the contribution of each variable to the cluster separation. A high weight suggests that a variable is informative.

WBCSS: The weighted between-cluster sum of squares at the local optimum. The value is calculated with respect to the final variable weights and adjusted by the final observation weights.

centers: The set of final cluster centers.
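The relation between clusters, obsweights, and outclusters follows directly from the cutoff rule described above. A small illustration with made-up vectors (the values below are ours, not package output):

```r
clusters   <- c(1, 1, 2, 2, 3)          # assignments from the clustering step
obsweights <- c(0.9, 0.2, 0.8, 0.95, 0.4)
cutoff <- 0.5                            # the documented default

# Observations with weight <= cutoff are flagged as outliers (label 0);
# all others keep their cluster label.
outclusters <- ifelse(obsweights <= cutoff, 0, clusters)
outclusters  # 1 0 2 2 0
```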
Author(s)

Sarka Brodinova <sarka.brodinova@tuwien.ac.at>
References

S. Brodinova, P. Filzmoser, T. Ortner, C. Breiteneder, M. Zaharieva. Robust and sparse k-means clustering for high-dimensional data. Submitted for publication, 2017. Available at http://arxiv.org/abs/1709.10012

D. M. Witten and R. Tibshirani. A framework for feature selection in clustering. Journal of the American Statistical Association, 105(490), 713-726, 2010.

Y. Kondo, M. Salibian-Barrera, R. H. Zamar. RSKC: An R package for a robust and sparse k-means clustering algorithm. Journal of Statistical Software, 72(5), 1-26, 2016.
See Also

Gapwrsk, KMeansSparseCluster, RSKC
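Examples

A hypothetical usage sketch. The page does not state the exported function or package name, so `wrsk` below is an assumption; the simulated data are ours:

```r
# Simulate 60 observations in 2 clusters; only 5 of 50 variables carry
# cluster information, mimicking the high-dimensional setting targeted here.
set.seed(1)
x <- matrix(rnorm(60 * 50), ncol = 50)
x[1:30, 1:5] <- x[1:30, 1:5] + 3

# Package and function name `wrsk` are assumed; adjust to the actual export.
if (requireNamespace("wrsk", quietly = TRUE)) {
  res <- wrsk::wrsk(data = x, k = 2, s = 4, iteration = 20, cutoff = 0.5)
  table(res$outclusters)  # cluster sizes; label 0 counts flagged outliers
  head(res$varweights)    # informative variables should get high weights
}
```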