Description
The tuning parameter controls the L1 bound on w, the feature weights. A permutation approach is used to select the tuning parameter.
Usage

KMeansSparseCluster.permute(x, K = NULL, nperms = 25, wbounds = NULL,
  silent = FALSE, nvals = 10, centers = NULL, ...)
Arguments

x: The n x p data matrix, where n is the number of observations and p the number of features.

K: The number of clusters desired, i.e. the "K" in K-means clustering. Either K or centers must be specified.

nperms: The number of permutations.

wbounds: The range of tuning parameters to consider; each is an L1 bound on w, the feature weights. If NULL, a range of values is chosen automatically. Non-null values should be greater than 1.

silent: Print out progress?

nvals: If wbounds is NULL, the number of candidate tuning parameter values to consider.

centers: Optional argument. To run the k-means algorithm starting from a particular set of clusters, supply the K x p matrix of cluster centers here. By default centers=NULL and K is specified instead.

...: Not used.
Details

Sparse k-means clustering seeks a p-vector of weights w (one per feature) and a set of clusters C_1, ..., C_K that solve

maximize over C_1, ..., C_K, w of $sum_j w_j BCSS_j$ subject to $||w||_2 <= 1$, $||w||_1 <= s$, $w_j >= 0$,

where $BCSS_j$ is the between-cluster sum of squares for feature j and s is the L1 bound on w. Let O(s) denote the objective value at tuning parameter s, i.e. $O(s) = sum_j w_j BCSS_j$.
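As a concrete illustration, $BCSS_j$ equals the total sum of squares for feature j minus the within-cluster sum of squares. A minimal sketch (bcss is a hypothetical helper for illustration, not part of the package):

```r
# Between-cluster sum of squares for a single feature, given cluster labels.
# BCSS_j = total SS - within-cluster SS. (Hypothetical helper, illustration only.)
bcss <- function(xj, labels) {
  tss  <- sum((xj - mean(xj))^2)
  wcss <- sum(tapply(xj, labels, function(v) sum((v - mean(v))^2)))
  tss - wcss
}
bcss(c(0, 0, 2, 2), c(1, 1, 2, 2))  # perfectly separated feature: BCSS = 4
```

A feature that separates the clusters well has large BCSS and so receives a large weight in the objective.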
We permute the data as follows: within each feature, we permute the observations. Running sparse K-means on the permuted data with tuning parameter s yields the objective value O*(s); repeating this over many permutations gives a collection of O*(s) values.
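The within-feature permutation can be sketched in one line (permute.within.features is a hypothetical helper, for illustration only):

```r
# Permute observations independently within each feature (column) of an
# n x p matrix. This destroys any cluster structure while preserving each
# feature's marginal distribution. (Hypothetical helper, illustration only.)
permute.within.features <- function(x) apply(x, 2, sample)
```

Each column of the result contains the same values as the corresponding column of x, but in a random order.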
The Gap statistic is then $Gap(s) = log(O(s)) - mean(log(O*(s)))$. The optimal s is the one yielding the highest Gap statistic. Alternatively, one can choose the smallest s whose Gap statistic is within $sd(log(O*(s)))$ of the largest Gap statistic.
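In code, the computation for a single tuning parameter is direct (hypothetical helpers with made-up objective values, for illustration only):

```r
# Gap statistic for one tuning parameter s, given the unpermuted objective
# o = O(s) and a vector of permuted objectives o.perm = O*(s).
# (Hypothetical helpers with invented values, illustration only.)
gap   <- function(o, o.perm) log(o) - mean(log(o.perm))
sdgap <- function(o.perm) sd(log(o.perm))
gap(8, c(4, 4, 4))  # log(8) - log(4) = log(2)
```

Evaluating gap over a grid of s values and taking the argmax reproduces the largest-Gap selection rule described above.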
Value

gaps: The gap statistics obtained, one for each tuning parameter tried. If O(s) is the objective function evaluated at tuning parameter s, and O*(s) is the same quantity for the permuted data, then Gap(s)=log(O(s))-mean(log(O*(s))).

sdgaps: The standard deviation of log(O*(s)), for each value of the tuning parameter s.

nnonzerows: The number of features with non-zero weights, for each value of the tuning parameter.

wbounds: The tuning parameters considered.

bestw: The value of the tuning parameter corresponding to the highest gap statistic.
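The bestw component is determined by the gaps and wbounds components. A sketch with invented values (not real output) showing the relationship:

```r
# Mock of the returned components (values invented, illustration only).
km.perm <- list(gaps = c(0.30, 0.43, 0.47), sdgaps = c(0.08, 0.06, 0.05),
                nnonzerows = c(5, 12, 40), wbounds = c(2, 3, 4))
# bestw is the wbound whose gap statistic is largest.
best <- km.perm$wbounds[which.max(km.perm$gaps)]  # 4
```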
Author(s)

Daniela M. Witten and Robert Tibshirani
References

Witten and Tibshirani (2009) A framework for feature selection in clustering.
See Also

KMeansSparseCluster, HierarchicalSparseCluster, HierarchicalSparseCluster.permute
Examples

# generate data
set.seed(11)
x <- matrix(rnorm(50*70),ncol=70)
x[1:25,1:10] <- x[1:25,1:10]+1.5
x <- scale(x, TRUE, TRUE)
# choose tuning parameter
km.perm <-
KMeansSparseCluster.permute(x,K=2,wbounds=seq(2,5,len=8),nperms=3)
print(km.perm)
plot(km.perm)
# run sparse k-means
km.out <- KMeansSparseCluster(x,K=2,wbounds=km.perm$bestw)
print(km.out)
plot(km.out)
# run sparse k-means for a range of tuning parameter values
km.out <- KMeansSparseCluster(x,K=2,wbounds=2:7)
print(km.out)
plot(km.out)
# Repeat, but this time start with a particular choice of cluster
# centers.
# This will do 4-means clustering starting with this particular choice
# of cluster centers.
km.perm.out <- KMeansSparseCluster.permute(x, wbounds=2:6, centers=x[1:4,], nperms=3)
print(km.perm.out)
plot(km.perm.out)
Permutation 1 of 3
Permutation 2 of 3
Permutation 3 of 3
Tuning parameter selection results for Sparse K-means Clustering:
Wbound # Non-Zero W's Gap Statistic Standard Deviation
1 2.0000 5 0.3050 0.0811
2 2.4286 9 0.3846 0.0709
3 2.8571 11 0.4332 0.0633
4 3.2857 21 0.4637 0.0579
5 3.7143 40 0.4688 0.0551
6 4.1429 70 0.4694 0.0548
7 4.5714 70 0.4694 0.0548
8 5.0000 70 0.4694 0.0548
Tuning parameter that leads to largest Gap statistic: 4.142857
Wbound is 4.142857 :
Number of non-zero weights: 70
Sum of weights: 3.988581
Clustering: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 1 2 1 2 2 2 2 2
Wbound is 2 :
Number of non-zero weights: 5
Sum of weights: 2.000011
Clustering: 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
2 2 2 2 1 2 2 2 2 1 2 1 2 2 2 2 2
Wbound is 3 :
Number of non-zero weights: 12
Sum of weights: 3.000019
Clustering: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 1 2 1 2 2 2 2 2
Wbound is 4 :
Number of non-zero weights: 70
Sum of weights: 3.988581
Clustering: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 1 2 1 2 2 2 2 2
Wbound is 5 :
Number of non-zero weights: 70
Sum of weights: 3.988581
Clustering: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 1 2 1 2 2 2 2 2
Wbound is 6 :
Number of non-zero weights: 70
Sum of weights: 3.988581
Clustering: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 1 2 1 2 2 2 2 2
Wbound is 7 :
Number of non-zero weights: 70
Sum of weights: 3.988581
Clustering: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 1 2 1 2 2 2 2 2
Permutation 1 of 3
Permutation 2 of 3
Permutation 3 of 3