Description
This function performs sparse k-means clustering. You must specify a number of clusters K and an L1 bound on w, the feature weights.
Usage

KMeansSparseCluster(x, K, wbounds, nstart, silent, maxiter, centers, ...)
Arguments

x
    An n x p data matrix. There are n observations and p features.

K
    The number of clusters desired ("K" in k-means clustering). Either K or centers must be provided.

wbounds
    A single L1 bound on w (the feature weights), or a vector of L1 bounds on w. If wbounds is small, then few features will have non-zero weights; if wbounds is large, then all features will have non-zero weights. Should be greater than 1.

nstart
    The number of random starts for the k-means algorithm.

silent
    Should printing of progress be suppressed?

maxiter
    The maximum number of iterations.

centers
    Optional argument. To run the k-means algorithm starting from a particular set of clusters, supply the K x p matrix of cluster centers here. In the default use case, centers=NULL and K is specified instead.

...
    Not used.
Details

We seek a p-vector of weights w (one per feature) and a set of clusters C_1, ..., C_K that solve

$\max_{C_1, \ldots, C_K, w} \sum_j w_j \mathrm{BCSS}_j$ subject to $\|w\|_2 \le 1$, $\|w\|_1 \le \mathrm{wbound}$, $w_j \ge 0$,

where $\mathrm{BCSS}_j$ is the between-cluster sum of squares for feature j. An iterative approach is taken: with w fixed, optimize with respect to C_1, ..., C_K; with C_1, ..., C_K fixed, optimize with respect to w. Here, wbound is a tuning parameter that determines the L1 bound on w.
The non-zero elements of w indicate features that are used in the sparse clustering.
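The alternating procedure above can be sketched in code. The following is an illustrative Python translation, not the sparcl implementation; the function names (`sparse_kmeans`, `update_weights`, `bcss`) are mine, and a plain Lloyd's-algorithm k-means stands in for the package's k-means step. With the clusters held fixed, the optimal w soft-thresholds the per-feature BCSS and rescales to unit L2 norm, with the threshold found by binary search so that the L1 bound is satisfied.

```python
import numpy as np

def kmeans(x, K, iters=25, seed=0):
    # plain Lloyd's algorithm, a stand-in for the k-means step
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), K, replace=False)].copy()
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for k in range(K):
            if (labels == k).any():
                centers[k] = x[labels == k].mean(0)
    return labels

def bcss(x, labels):
    # per-feature between-cluster sum of squares: TSS_j - WCSS_j
    tss = ((x - x.mean(0)) ** 2).sum(0)
    wcss = np.zeros(x.shape[1])
    for k in np.unique(labels):
        xk = x[labels == k]
        wcss += ((xk - xk.mean(0)) ** 2).sum(0)
    return tss - wcss

def update_weights(a, wbound):
    # maximize w'a subject to ||w||_2 <= 1, ||w||_1 <= wbound, w_j >= 0:
    # soft-threshold a and rescale to unit L2 norm; binary-search the
    # threshold so that the L1 constraint holds
    def w_of(delta):
        s = np.maximum(a - delta, 0)
        nrm = np.linalg.norm(s)
        return s / nrm if nrm > 0 else s
    if w_of(0.0).sum() <= wbound:
        return w_of(0.0)
    lo, hi = 0.0, a.max()
    for _ in range(60):
        mid = (lo + hi) / 2
        if w_of(mid).sum() > wbound:
            lo = mid
        else:
            hi = mid
    return w_of(hi)

def sparse_kmeans(x, K, wbound, maxiter=6):
    p = x.shape[1]
    w = np.full(p, 1 / np.sqrt(p))          # start with equal weights
    labels = None
    for _ in range(maxiter):
        labels = kmeans(x * np.sqrt(w), K)  # cluster on weighted features
        w = update_weights(bcss(x, labels), wbound)
    return labels, w
```

On two-cluster data where only the first few features carry signal, the weights returned by this sketch concentrate on the signal features and the rest are driven to zero, mirroring the behavior described above.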
Value

If wbounds is a single value, a list with the elements below. If wbounds is a vector, a list of such lists, one per element of wbounds.

ws
    The p-vector of feature weights.

Cs
    The clustering obtained.
Author(s)

Daniela M. Witten and Robert Tibshirani
References

Witten and Tibshirani (2009) A framework for feature selection in clustering.
See Also

KMeansSparseCluster.permute, HierarchicalSparseCluster
Examples

# generate data
set.seed(11)
x <- matrix(rnorm(50*70),ncol=70)
x[1:25,1:20] <- x[1:25,1:20]+1
x <- scale(x, TRUE, TRUE)
# choose tuning parameter
km.perm <- KMeansSparseCluster.permute(x,K=2,wbounds=seq(3,7,len=15),nperms=5)
print(km.perm)
plot(km.perm)
# run sparse k-means
km.out <- KMeansSparseCluster(x,K=2,wbounds=km.perm$bestw)
print(km.out)
plot(km.out)
# run sparse k-means for a range of tuning parameter values
km.out <- KMeansSparseCluster(x,K=2,wbounds=seq(1.3,4,len=8))
print(km.out)
plot(km.out)
# run sparse k-means starting from a particular set of cluster centers
# in the k-means algorithm.
km.out <- KMeansSparseCluster(x,wbounds=2:7,centers=x[c(1,3,5),])
[iteration progress counters omitted]
Permutation 1 of 5
Permutation 2 of 5
Permutation 3 of 5
Permutation 4 of 5
Permutation 5 of 5
Tuning parameter selection results for Sparse K-means Clustering:
Wbound # Non-Zero W's Gap Statistic Standard Deviation
1 3.0000 13 0.3090 0.0624
2 3.2857 18 0.3326 0.0596
3 3.5714 20 0.3519 0.0584
4 3.8571 24 0.3653 0.0575
5 4.1429 33 0.3730 0.0570
6 4.4286 44 0.3768 0.0568
7 4.7143 70 0.3777 0.0568
8 5.0000 70 0.3777 0.0568
9 5.2857 70 0.3777 0.0568
10 5.5714 70 0.3777 0.0568
11 5.8571 70 0.3777 0.0568
12 6.1429 70 0.3777 0.0568
13 6.4286 70 0.3777 0.0568
14 6.7143 70 0.3777 0.0568
15 7.0000 70 0.3777 0.0568
Tuning parameter that leads to largest Gap statistic: 4.714286
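The table above is produced by KMeansSparseCluster.permute, which selects wbound via a gap statistic: the log of the objective (the weighted between-cluster sum of squares) on the observed data, minus the mean log objective over datasets in which each column has been independently permuted. The following is a simplified, self-contained sketch, not the package's code: plain k-means with uniform weights stands in for the full sparse procedure, and all function names are illustrative.

```python
import numpy as np

def kmeans(x, K, iters=25, seed=0):
    # plain Lloyd's algorithm
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), K, replace=False)].copy()
    for _ in range(iters):
        labels = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for k in range(K):
            if (labels == k).any():
                centers[k] = x[labels == k].mean(0)
    return labels

def weighted_bcss(x, labels, w):
    # objective: sum_j w_j * BCSS_j
    tss = ((x - x.mean(0)) ** 2).sum(0)
    wcss = np.zeros(x.shape[1])
    for k in np.unique(labels):
        xk = x[labels == k]
        wcss += ((xk - xk.mean(0)) ** 2).sum(0)
    return float(w @ (tss - wcss))

def gap_statistic(x, K, nperms=5, seed=0):
    rng = np.random.default_rng(seed)
    p = x.shape[1]
    w = np.full(p, 1 / np.sqrt(p))  # uniform weights in this sketch
    obs = np.log(weighted_bcss(x, kmeans(x, K), w))
    perm = []
    for _ in range(nperms):
        # permute each column independently to destroy cluster structure
        xp = np.column_stack([rng.permutation(c) for c in x.T])
        perm.append(np.log(weighted_bcss(xp, kmeans(xp, K), w)))
    return obs - np.mean(perm)
```

Data with genuine cluster structure yields a clearly positive gap, while pure noise yields a gap near zero; the permute function picks the wbound whose gap is largest, as in the output above.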
Wbound is 4.714286 :
Number of non-zero weights: 70
Sum of weights: 4.658699
Clustering: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Wbound is 1.3 :
Number of non-zero weights: 3
Sum of weights: 1.299995
Clustering: 1 2 1 1 1 2 2 1 1 1 2 1 2 1 2 1 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 1 2
2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2
Wbound is 1.685714 :
Number of non-zero weights: 3
Sum of weights: 1.685715
Clustering: 1 2 1 1 1 2 2 1 1 1 2 1 2 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Wbound is 2.071429 :
Number of non-zero weights: 7
Sum of weights: 2.071435
Clustering: 1 2 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Wbound is 2.457143 :
Number of non-zero weights: 15
Sum of weights: 2.457192
Clustering: 1 2 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Wbound is 2.842857 :
Number of non-zero weights: 21
Sum of weights: 2.842794
Clustering: 1 2 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Wbound is 3.228571 :
Number of non-zero weights: 33
Sum of weights: 3.228517
Clustering: 1 2 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Wbound is 3.614286 :
Number of non-zero weights: 45
Sum of weights: 3.614518
Clustering: 1 2 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Wbound is 4 :
Number of non-zero weights: 70
Sum of weights: 3.915577
Clustering: 1 2 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2