KMeansSparseCluster.permute.weight: Choose Sparsity Parameter for Sparse Weighted K-Means...

View source: R/KMeansSparseCluster_permute_weight.R

KMeansSparseCluster.permute.weight    R Documentation

Choose Sparsity Parameter for Sparse Weighted K-Means Clustering

Description

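The sparsity parameter is the L1 bound on w, the vector of feature weights. It is selected by a permutation approach. For each candidate value, the objective on the original data is compared with the objective on permuted data sets through a gap statistic (Witten and Tibshirani, 2010).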
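The permutation step can be sketched in base R. Following Witten and Tibshirani (2010), each feature (column of x) is permuted independently across observations. This breaks any cluster structure shared across features while preserving each feature's marginal distribution. This is a minimal illustration, not the package's internal code. `permute_features` is a hypothetical helper, and the sketch does not cover how the observation weights are treated under permutation.

```r
# Hypothetical helper (not part of this package): build nperms permuted copies
# of x, permuting the entries of each column independently.
permute_features <- function(x, nperms = 20) {
  x <- as.matrix(x)
  lapply(seq_len(nperms), function(i) {
    # col[sample.int(length(col))] avoids sample()'s special case for a scalar
    apply(x, 2, function(col) col[sample.int(length(col))])
  })
}

set.seed(1)
x <- matrix(rnorm(30), nrow = 10, ncol = 3)
perms <- permute_features(x, nperms = 5)
```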

Usage

KMeansSparseCluster.permute.weight(x, K = NULL, weight = NULL,
  nperms = 20, nstart = 20, wbounds = NULL, silent = TRUE,
  nvals = 10, centers = NULL)

Arguments

x

An n by p numeric data matrix, where n is the number of observations and p is the number of features.

K

The number of clusters. Can be omitted if centers is provided.

weight

A vector of n positive elements representing weights on observations.

nperms

Number of permutations. Default is 20.

nstart

The number of random sets of initial cluster centers, each chosen from (distinct) rows of x. Can be omitted if centers is provided. Default is 20.

wbounds

A single L1 bound on w (the feature weights), or a vector of L1 bounds on w. If wbounds is small, few features will have non-zero weights; if it is large, all features will have non-zero weights. Each bound should be greater than 1.

silent

Logical. If TRUE (the default), progress messages are not printed.

nvals

The number of candidate tuning parameter values. Not used if wbounds is given.

centers

A K by p matrix indicating initial (distinct) cluster centers.
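Why a small wbounds yields few non-zero feature weights becomes clearer from the weight update of Witten and Tibshirani (2010) for (unweighted) sparse K-means. There, w is a normalized, soft-thresholded version of the per-feature between-cluster sums of squares, with the threshold chosen so that the L1 bound holds and the L2 norm of w equals 1. The sketch below assumes that this package's weighted variant uses the same form of update. `soft` and `update_weights` are hypothetical helpers, not package functions.

```r
# Soft-thresholding operator: S(a, delta) = sign(a) * max(|a| - delta, 0).
soft <- function(a, delta) sign(a) * pmax(abs(a) - delta, 0)

# Hypothetical helper (not part of this package): given per-feature scores a
# (e.g. between-cluster sums of squares) and an L1 bound s > 1, return
# w = S(a+, delta) / ||S(a+, delta)||_2 with (approximately) the smallest
# delta >= 0 such that sum(w) <= s. Assumes a has a unique largest positive
# entry, so the bound is attainable.
update_weights <- function(a, s, tol = 1e-10) {
  a <- pmax(a, 0)
  w <- a / sqrt(sum(a^2))
  if (sum(w) <= s) return(w)  # delta = 0 already satisfies the L1 bound
  lo <- 0
  hi <- max(a)
  # Bisection on delta: the L1 norm of the normalized weights shrinks as delta grows.
  while (hi - lo > tol * max(a)) {
    delta <- (lo + hi) / 2
    sa <- soft(a, delta)
    if (sum(sa) / sqrt(sum(sa^2)) > s) lo <- delta else hi <- delta
  }
  sa <- soft(a, hi)
  sa / sqrt(sum(sa^2))
}

a <- c(5, 3, 1, 0.5, 0.1)
w_small <- update_weights(a, s = 1.1)  # tight bound: few non-zero weights
w_large <- update_weights(a, s = 10)   # loose bound: all weights non-zero
```

With these scores, the tight bound leaves only the two largest features with non-zero weight, while the loose bound keeps all five.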

Value

gaps

The gap statistic for each tuning parameter value tried. If O(s) is the objective function evaluated at tuning parameter s on the original data, and O*(s) is the same quantity computed on a permuted data set, then Gap(s) = log(O(s)) - mean(log(O*(s))), where the mean is taken over the nperms permuted data sets.

sdgaps

The standard deviation of log(O*(s)) across the permuted data sets, for each value of the tuning parameter s.

nnonzerows

The number of features with non-zero weights, for each value of the tuning parameter.

wbounds

The tuning parameters considered.

bestw

The value of the tuning parameter corresponding to the highest gap statistic.
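The gaps, sdgaps and bestw values can be reproduced from raw objective values. The sketch below assumes O(s) and O*(s) have already been computed for every candidate bound. `gap_summary` is a hypothetical helper, and the toy objective values are made up purely to exercise the code.

```r
# Hypothetical helper (not part of this package).
#   O:       objective values on the original data, one per tuning parameter
#   Operm:   nperms x length(wbounds) matrix of objective values on permuted data
#   wbounds: the tuning parameters (L1 bounds) that were tried
gap_summary <- function(O, Operm, wbounds) {
  logperm <- log(Operm)
  gaps <- log(O) - colMeans(logperm)  # Gap(s) = log(O(s)) - mean(log(O*(s)))
  sdgaps <- apply(logperm, 2, sd)     # sd of log(O*(s)) across permutations
  list(gaps = gaps, sdgaps = sdgaps, wbounds = wbounds,
       bestw = wbounds[which.max(gaps)])  # bound with the largest gap
}

# Toy numbers for illustration only (3 permutations, 3 candidate bounds).
O <- c(10, 12, 11)
Operm <- matrix(c(8, 9, 10, 9, 10, 11, 10, 11, 12), nrow = 3)
res <- gap_summary(O, Operm, wbounds = c(1.5, 3, 6))
res$bestw  # 3
```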

Author(s)

Wenyu Zhang

References

Daniela M Witten and Robert Tibshirani (2010). A framework for feature selection in clustering. Journal of the American Statistical Association, 105(490), 713-726.

See Also

Other sparse weighted K-Means functions: ChooseK, KMeansSparseCluster.weight, kmeans.weight.tune, kmeans.weight

Examples

## Not run: 
set.seed(1)
data("NormalDisData")
# choose the number of clusters K, using the observations not labelled as noisy
cK <- ChooseK(NormalDisData$data[-NormalDisData$noisy.label, ], nClusters = 1:6)
plot(cK)
K <- cK$OptimalK
# tune the observation weights
res.tuneU <- kmeans.weight.tune(x = NormalDisData$data, K = K,
  noisy.lab = NormalDisData$noisy.label, weight.seq = NULL)
plot(res.tuneU)
# choose the sparsity parameter (L1 bound on the feature weights) by permutation
res.tunes <- KMeansSparseCluster.permute.weight(x = NormalDisData$data, K = K,
  weight = res.tuneU$bestweight)
# fit sparse weighted K-means with the selected bound and observation weights
res <- KMeansSparseCluster.weight(x = NormalDisData$data, K = K,
  wbounds = res.tunes$bestw, weight = res.tuneU$bestweight)
# check the clustering result against the true labels
table(res[[1]]$Cs, NormalDisData$true.label)
# number of features selected (non-zero weights)
sum(res[[1]]$ws != 0)
# indices of the 50 most important features
order(res[[1]]$ws, decreasing = TRUE)[1:50]

## End(Not run)

cuhklinlab/SWKM documentation built on Aug. 5, 2022, 2:27 a.m.