KMeansSparseCluster.permute.weight: Choose Sparsity Parameter for Sparse Weighted K-Means...

View source: R/KMeansSparseCluster_permute_weight.R

Description

The sparsity parameter controls the L1 bound on w, the feature weights. A permutation approach is used to select the sparsity parameter.
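The permutation step can be sketched as follows. This assumes, as in Witten and Tibshirani (2010), that the observations are permuted independently within each feature; `permute_features` is a hypothetical helper written for illustration and is not part of the package.

```r
# Hypothetical helper (not part of SWKM): permute the rows of x
# independently within each column. Every feature keeps its marginal
# distribution, but the association between features is broken.
permute_features <- function(x) {
  x <- as.matrix(x)
  xperm <- x
  for (j in seq_len(ncol(x))) {
    xperm[, j] <- x[sample.int(nrow(x)), j]
  }
  xperm
}

set.seed(1)
x <- matrix(rnorm(20), nrow = 5, ncol = 4)
xperm <- permute_features(x)

# Each column holds the same values as before, in shuffled order.
same_values <- all(apply(x, 2, sort) == apply(xperm, 2, sort))
```

The objective evaluated on such permuted copies plays the role of O*(s) in the gap statistic described under Value.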

Usage

KMeansSparseCluster.permute.weight(x, K = NULL, weight = NULL,
  nperms = 20, nstart = 20, wbounds = NULL, silent = TRUE,
  nvals = 10, centers = NULL)

Arguments

x

An n by p numeric data matrix, where n is the number of observations and p is the number of features.

K

The number of clusters. Omitted if centers is provided.

weight

A vector of n positive elements representing weights on observations.

nperms

Number of permutations. Default is 20.

nstart

The number of initial random sets chosen from (distinct) rows in x. Omitted if centers is provided. Default is 20.

wbounds

A single L1 bound on w (the feature weights), or a vector of L1 bounds on w. If a bound is small, few features will have non-zero weights. If a bound is large, all features will have non-zero weights. Each bound should be greater than 1.

silent

Logical; whether to suppress progress output. Default is TRUE.

nvals

The number of candidate tuning parameter values. Omitted if wbounds is given.

centers

A K by p matrix indicating initial (distinct) cluster centers.
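When wbounds is NULL, nvals candidate bounds are tried. This page does not state how the grid is built, so the sketch below is only an assumption: it uses a log-spaced grid between a value slightly above 1 and a value below sqrt(p), because in the framework of Witten and Tibshirani (2010), where w also has L2 norm at most 1, an L1 bound of sqrt(p) or more is never active. The helper name `candidate_wbounds` and its endpoints are hypothetical.

```r
# Hypothetical helper (not part of SWKM): log-spaced candidate L1 bounds.
# The endpoints 1.2 and 0.9 * sqrt(p) are illustrative assumptions.
candidate_wbounds <- function(p, nvals = 10,
                              lower = 1.2, upper = 0.9 * sqrt(p)) {
  stopifnot(p >= 2, nvals >= 1, lower > 1, upper >= lower)
  exp(seq(log(lower), log(upper), length.out = nvals))
}

# Ten candidate bounds for a data matrix with p = 100 features.
wb <- candidate_wbounds(p = 100, nvals = 10)
```

A vector built this way could be passed as wbounds to restrict the search to a chosen range.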

Value

gaps

The gap statistics obtained (one for each of the tuning parameters tried). If O(s) is the objective function evaluated at the tuning parameter s, and O*(s) is the same quantity but for the permuted data, then Gap(s)=log(O(s))-mean(log(O*(s))).

sdgaps

The standard deviation of log(O*(s)), for each value of the tuning parameter s.

nnonzerows

The number of features with non-zero weights, for each value of the tuning parameter.

wbounds

The tuning parameters considered.

bestw

The value of the tuning parameter corresponding to the highest gap statistic.
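The relationship between these returned values can be sketched with made-up numbers. The helper `gap_summary` and the objective values below are hypothetical and serve only to illustrate the formulas given above; they are not produced by the package.

```r
# Hypothetical helper (not part of SWKM).
# O:     objective on the original data, one value per candidate bound.
# Operm: nperms x length(wbounds) matrix of objectives on permuted data.
gap_summary <- function(O, Operm, wbounds) {
  stopifnot(length(O) == ncol(Operm), length(O) == length(wbounds))
  logOperm <- log(Operm)
  gaps <- log(O) - colMeans(logOperm)   # Gap(s) = log(O(s)) - mean(log(O*(s)))
  sdgaps <- apply(logOperm, 2, sd)      # sd of log(O*(s)) over permutations
  list(gaps = gaps, sdgaps = sdgaps, wbounds = wbounds,
       bestw = wbounds[which.max(gaps)])
}

# Made-up objective values for three candidate bounds and three permutations.
wbounds <- c(1.5, 3, 6)
O <- c(10, 40, 50)
Operm <- rbind(c(8, 20, 45),
               c(9, 22, 48),
               c(10, 18, 47))

gs <- gap_summary(O, Operm, wbounds)
gs$bestw  # the bound with the largest gap
```

With these numbers the second bound has the largest gap, so it is the one reported as bestw.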

Author(s)

Wenyu Zhang

References

Daniela M Witten and Robert Tibshirani (2010). A framework for feature selection in clustering. Journal of the American Statistical Association, 105(490), 713-726.

See Also

Other sparse weighted K-Means functions: ChooseK, KMeansSparseCluster.weight, kmeans.weight.tune, kmeans.weight

Examples

## Not run: 
set.seed(1)
data("NormalDisData")
# Choose the number of clusters using the non-noisy observations
cK <- ChooseK(NormalDisData$data[-NormalDisData$noisy.label, ],
              nClusters = 1:6)
plot(cK)
K <- cK$OptimalK

# Tune the observation weights
res.tuneU <- kmeans.weight.tune(x = NormalDisData$data, K = K,
                                noisy.lab = NormalDisData$noisy.label,
                                weight.seq = NULL)
plot(res.tuneU)

# Choose the sparsity parameter by permutation, then fit the final model
res.tunes <- KMeansSparseCluster.permute.weight(x = NormalDisData$data, K = K,
                                                weight = res.tuneU$bestweight)
res <- KMeansSparseCluster.weight(x = NormalDisData$data, K = K,
                                  wbounds = res.tunes$bestw,
                                  weight = res.tuneU$bestweight)

# Check the clustering result, the number of features selected,
# and the 50 most important features
table(res[[1]]$Cs, NormalDisData$true.label)
sum(res[[1]]$ws != 0)
order(res[[1]]$ws, decreasing = TRUE)[1:50]

## End(Not run)

Van1yu3/SWKM documentation built on Sept. 3, 2019, 7:50 a.m.