KMeansSparseCluster: Performs sparse k-means clustering


Description

This function performs sparse k-means clustering. You must specify a number of clusters K and an L1 bound on w, the feature weights.

Usage

KMeansSparseCluster(x, K = NULL, wbounds = NULL, nstart = 20, silent = FALSE,
                    maxiter = 6, centers = NULL)
## S3 method for class 'KMeansSparseCluster'
plot(x, ...)
## S3 method for class 'KMeansSparseCluster'
print(x, ...)

Arguments

x

An n x p data matrix, with n observations (rows) and p features (columns).

K

The number of clusters desired ("K" in K-means clustering). You must provide either K or centers.

wbounds

A single L1 bound on w (the feature weights), or a vector of L1 bounds on w. If a bound is small, few features will have non-zero weights. If a bound is large, all features will have non-zero weights. Each bound should be greater than 1.
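Because w is also constrained to have L2 norm at most 1 (see Details), its L1 norm can never exceed sqrt(p), so bounds above sqrt(p) cannot change the solution. The sketch below builds a grid of candidate bounds on that basis. The lower endpoint 1.1 and the grid size are arbitrary illustrative choices, not package defaults.

```r
# Number of features; 70 matches the example data used on this page.
p <- 70

# The equal-weight vector has L2 norm 1 and the largest possible L1 norm.
w_equal <- rep(1 / sqrt(p), p)
max_useful_bound <- sum(w_equal)   # equals sqrt(p)

# A hypothetical grid of L1 bounds between just above 1 and sqrt(p).
candidate_bounds <- seq(1.1, max_useful_bound, length.out = 10)
```

A grid like this could be supplied as wbounds to KMeansSparseCluster.permute or KMeansSparseCluster.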

nstart

The number of random starts for the k-means algorithm.

silent

Should progress output be suppressed? If FALSE (the default), progress is printed.

maxiter

The maximum number of iterations.

centers

Optional argument. To start the k-means algorithm from a particular set of cluster centers, supply the K x p matrix of centers here. In the default use case, leave centers=NULL and specify K instead.
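One possible way to obtain a K x p matrix of starting centers is to take them from an ordinary k-means fit with base R's kmeans. This is only a sketch: the variable names are hypothetical, and the final call is left as a comment because it needs the sparcl package.

```r
# Toy data with the same shape as the example on this page.
set.seed(11)
x_demo <- matrix(rnorm(50 * 70), ncol = 70)
x_demo[1:25, 1:20] <- x_demo[1:25, 1:20] + 1

# Ordinary k-means with K = 3; $centers is a K x p matrix.
init <- kmeans(x_demo, centers = 3, nstart = 5)
start_centers <- init$centers
dim(start_centers)   # K rows, p columns

# The matrix could then be passed on (not run here):
# km.out <- KMeansSparseCluster(x_demo, wbounds = 2:7, centers = start_centers)
```

Selecting rows of x directly, as in the Examples section, is another way to build such a matrix.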

...

Not used.

Details

We seek a p-vector of weights w (one per feature) and a set of clusters C1,...,CK that optimize

$\max_{C_1,\ldots,C_K,\,w} \sum_j w_j \, BCSS_j$ subject to $\|w\|_2 \le 1$, $\|w\|_1 \le wbound$, $w_j \ge 0$

where $BCSS_j$ is the between-cluster sum of squares for feature j. The problem is solved iteratively: with w fixed, optimize with respect to C1,...,CK; then, with C1,...,CK fixed, optimize with respect to w. Here, wbound is a tuning parameter that sets the L1 bound on w.
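As a rough illustration of the quantity being weighted, the sketch below computes a per-feature between-cluster sum of squares as the total sum of squares minus the within-cluster sum of squares. The helper name bcss_per_feature and the toy data are hypothetical, and the package's internal computation may differ, for instance by a constant factor.

```r
# Per-feature between-cluster sum of squares: BCSS_j = TSS_j - WCSS_j.
bcss_per_feature <- function(x, clusters) {
  x <- as.matrix(x)
  # Total sum of squares of each column around its overall mean.
  tss <- colSums(scale(x, center = TRUE, scale = FALSE)^2)
  # Within-cluster sum of squares, accumulated cluster by cluster.
  wcss <- numeric(ncol(x))
  for (k in unique(clusters)) {
    xk <- x[clusters == k, , drop = FALSE]
    wcss <- wcss + colSums(scale(xk, center = TRUE, scale = FALSE)^2)
  }
  tss - wcss
}

# Feature f1 separates the two clusters; feature f2 does not.
x_demo <- cbind(f1 = c(0, 1, 2, 10, 11, 12),
                f2 = c(1, 2, 3, 1, 2, 3))
cl_demo <- c(1, 1, 1, 2, 2, 2)
bcss_demo <- bcss_per_feature(x_demo, cl_demo)
bcss_demo   # f1 = 150, f2 = 0
```

A feature with a large value here contributes more to the objective, so it tends to receive a larger weight.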

The non-zero elements of w indicate features that are used in the sparse clustering.
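The step that optimizes w for fixed clusters can be sketched with soft-thresholding followed by L2 normalization, with the threshold found by bisection so that the L1 bound holds. This is a minimal sketch under stated assumptions, not the package's own code: the function names are hypothetical, wbound is assumed to be at least 1, and the largest BCSS value is assumed to be unique.

```r
# Shrink each entry toward zero by delta, clipping at zero.
soft_threshold <- function(a, delta) sign(a) * pmax(abs(a) - delta, 0)

# Rescale a vector to unit L2 norm.
l2_normalize <- function(v) v / sqrt(sum(v^2))

# Weights maximizing sum(w * bcss) subject to
# ||w||_2 <= 1, ||w||_1 <= wbound, w >= 0 (hypothetical helper).
update_weights <- function(bcss, wbound, tol = 1e-10) {
  a <- pmax(bcss, 0)
  stopifnot(wbound >= 1, any(a > 0))
  w <- l2_normalize(a)
  if (sum(w) <= wbound) return(w)   # L1 bound inactive: no thresholding
  lo <- 0
  hi <- max(a)
  # Bisection on the threshold: a larger delta gives a smaller L1 norm.
  while (hi - lo > tol * max(a)) {
    mid <- (lo + hi) / 2
    w <- l2_normalize(soft_threshold(a, mid))
    if (sum(w) > wbound) lo <- mid else hi <- mid
  }
  l2_normalize(soft_threshold(a, hi))
}

bcss_toy <- c(10, 8, 6, 1, 0)
w_sparse <- update_weights(bcss_toy, wbound = 1.5)  # bound active
w_dense  <- update_weights(bcss_toy, wbound = 3)    # bound inactive
```

With the smaller bound, only the three features with the largest BCSS values keep non-zero weights; with the larger bound, every feature with positive BCSS does.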

Value

If wbounds is a vector, a list with one entry per element of wbounds, each containing the elements below. If wbounds is a single value, the elements below:

ws

The p-vector of feature weights.

Cs

The clustering obtained.

Author(s)

Daniela M. Witten and Robert Tibshirani

References

Witten and Tibshirani (2009) A framework for feature selection in clustering.

See Also

KMeansSparseCluster.permute, HierarchicalSparseCluster

Examples

# generate data
set.seed(11)
x <- matrix(rnorm(50 * 70), ncol = 70)
x[1:25, 1:20] <- x[1:25, 1:20] + 1
x <- scale(x, TRUE, TRUE)
# choose tuning parameter
km.perm <- KMeansSparseCluster.permute(x, K = 2, wbounds = seq(3, 7, length.out = 15), nperms = 5)
print(km.perm)
plot(km.perm)
# run sparse k-means
km.out <- KMeansSparseCluster(x, K = 2, wbounds = km.perm$bestw)
print(km.out)
plot(km.out)
# run sparse k-means for a range of tuning parameter values
km.out <- KMeansSparseCluster(x, K = 2, wbounds = seq(1.3, 4, length.out = 8))
print(km.out)
plot(km.out)
# run sparse k-means starting from a particular set of cluster centers
# in the k-means algorithm
km.out <- KMeansSparseCluster(x, wbounds = 2:7, centers = x[c(1, 3, 5), ])

Example output

1012201301401501601701801901100111011201130114011501
Permutation  1 of  5
101232012301401501601701801901100111011201130114011501
Permutation  2 of  5
10123201301401501601701801901100111011201130114011501
Permutation  3 of  5
1012345201301401501601701801901100111011201130114011501
Permutation  4 of  5
10123201301401501601701801901100111011201130114011501
Permutation  5 of  5
10123201301401501601701801901100111011201130114011501

Tuning parameter selection results for Sparse K-means Clustering:
   Wbound # Non-Zero W's Gap Statistic Standard Deviation
1  3.0000             13        0.3090             0.0624
2  3.2857             18        0.3326             0.0596
3  3.5714             20        0.3519             0.0584
4  3.8571             24        0.3653             0.0575
5  4.1429             33        0.3730             0.0570
6  4.4286             44        0.3768             0.0568
7  4.7143             70        0.3777             0.0568
8  5.0000             70        0.3777             0.0568
9  5.2857             70        0.3777             0.0568
10 5.5714             70        0.3777             0.0568
11 5.8571             70        0.3777             0.0568
12 6.1429             70        0.3777             0.0568
13 6.4286             70        0.3777             0.0568
14 6.7143             70        0.3777             0.0568
15 7.0000             70        0.3777             0.0568
Tuning parameter that leads to largest Gap statistic:  4.714286
0123
Wbound is  4.714286 :
Number of non-zero weights:  70
Sum of weights:  4.658699
Clustering:  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

1012201233012401501601701801
Wbound is  1.3 :
Number of non-zero weights:  3
Sum of weights:  1.299995
Clustering:  1 2 1 1 1 2 2 1 1 1 2 1 2 1 2 1 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 1 2 
2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2

Wbound is  1.685714 :
Number of non-zero weights:  3
Sum of weights:  1.685715
Clustering:  1 2 1 1 1 2 2 1 1 1 2 1 2 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 2 2 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

Wbound is  2.071429 :
Number of non-zero weights:  7
Sum of weights:  2.071435
Clustering:  1 2 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 2 2 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

Wbound is  2.457143 :
Number of non-zero weights:  15
Sum of weights:  2.457192
Clustering:  1 2 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 2 2 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

Wbound is  2.842857 :
Number of non-zero weights:  21
Sum of weights:  2.842794
Clustering:  1 2 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 2 2 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

Wbound is  3.228571 :
Number of non-zero weights:  33
Sum of weights:  3.228517
Clustering:  1 2 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 2 2 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

Wbound is  3.614286 :
Number of non-zero weights:  45
Sum of weights:  3.614518
Clustering:  1 2 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 2 2 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

Wbound is  4 :
Number of non-zero weights:  70
Sum of weights:  3.915577
Clustering:  1 2 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 1 2 1 2 2 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

101234520130124012501601

sparcl documentation built on May 1, 2019, 9:20 p.m.