In cuhklinlab/SWKM: Sparse Weighted K-Means

View source: R/KMeansSparseCluster_permute_weight.R

KMeansSparseCluster.permute.weight

R Documentation

Choose Sparsity Parameter for Sparse Weighted K-Means Clustering

Description

The sparsity parameter controls the L1 bound on w, the feature weights. A permutation approach is used to select the sparsity parameter.
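The cited Witten and Tibshirani paper builds each permuted data set by shuffling the observations within every feature independently, which keeps each feature's marginal distribution but breaks the association between features. The following base-R sketch illustrates that idea on toy data; the helper name `permute_columns` is hypothetical, and it is an assumption that the package's internal permutation step works the same way.

```r
# Base-R sketch: permute observations within each feature (column) independently.
# `permute_columns` is a hypothetical helper, not part of SWKM.
set.seed(1)
x <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)  # toy data: 20 observations, 5 features

permute_columns <- function(x) {
  # sample() applied to each column returns a random reordering of that column;
  # apply() over columns reassembles the result into an n by p matrix
  apply(x, 2, sample)
}

x_perm <- permute_columns(x)

# Each column of x_perm holds the same values as the matching column of x,
# but rows no longer correspond to the same observation across features.
same_values <- all(apply(x, 2, sort) == apply(x_perm, 2, sort))
```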

Usage

KMeansSparseCluster.permute.weight(x, K = NULL, weight = NULL,
  nperms = 20, nstart = 20, wbounds = NULL, silent = TRUE,
  nvals = 10, centers = NULL)

Arguments

`x`	An n by p numeric data matrix, where n is the number of observations and p is the number of features.
`K`	The number of clusters. Not needed if `centers` is provided.
`weight`	A vector of n positive elements representing weights on observations.
`nperms`	Number of permutations. Default is `20`.
`nstart`	The number of initial random sets chosen from (distinct) rows in `x`. Not needed if `centers` is provided. Default is `20`.
`wbounds`	A single L1 bound on w (the feature weights), or a vector of L1 bounds on w. If a bound is small, few features will have non-zero weights; if it is large, all features will have non-zero weights. Each bound should be greater than 1.
`silent`	Logical; whether to suppress progress output. Default is `TRUE`.
`nvals`	The number of candidate tuning parameter values. Not needed if `wbounds` is given. Default is `10`.
`centers`	A K by p matrix indicating initial (distinct) cluster centers.
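When `wbounds` is supplied by hand, one option is a log-spaced grid between a value just above 1 and a value below sqrt(p), since in the cited Witten and Tibshirani framework the L1 bound lies between 1 and sqrt(p). The sketch below builds such a grid in base R; the endpoints `1.2` and `0.9 * sqrt(p)` are illustrative assumptions, not a statement of the grid this function uses when `wbounds = NULL`.

```r
# Base-R sketch: a log-spaced grid of candidate L1 bounds.
# The endpoints are illustrative assumptions, not the package's documented defaults.
p <- 100      # number of features (toy value)
nvals <- 10   # number of candidate bounds

wbounds <- exp(seq(log(1.2), log(0.9 * sqrt(p)), length.out = nvals))

# Every candidate is greater than 1 and no larger than sqrt(p)
range(wbounds)
```

A vector like this could then be passed as the `wbounds` argument, in which case `nvals` is not needed.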

Value

`gaps`	The gap statistics obtained (one for each of the tuning parameters tried). If O(s) is the objective function evaluated at the tuning parameter s, and O*(s) is the same quantity but for the permuted data, then Gap(s) = log(O(s)) - mean(log(O*(s))), where the mean is taken over the permutations.
`sdgaps`	The standard deviation of log(O*(s)), for each value of the tuning parameter s.
`nnonzerows`	The number of features with non-zero weights, for each value of the tuning parameter.
`wbounds`	The tuning parameters considered.
`bestw`	The value of the tuning parameter corresponding to the highest gap statistic.
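To make the relationship between `gaps`, `sdgaps`, and `bestw` concrete, the following base-R sketch applies the formulas above to made-up objective values. The numbers and the variable names `obj` and `obj_perm` are hypothetical; this is not the package's internal code.

```r
# Base-R sketch: gap statistic from hypothetical objective values.
wbounds <- c(1.5, 2, 3, 4)             # candidate L1 bounds
obj <- c(10, 40, 90, 100)              # O(s) on the observed data (made up)
obj_perm <- rbind(c(8, 20, 60, 95),    # O*(s): one row per permutation,
                  c(9, 22, 55, 90),    # one column per candidate bound (made up)
                  c(7, 18, 65, 99))

# Gap(s) = log(O(s)) - mean over permutations of log(O*(s))
gaps <- log(obj) - colMeans(log(obj_perm))

# Standard deviation of log(O*(s)) across permutations, per candidate bound
sdgaps <- apply(log(obj_perm), 2, sd)

# The selected bound is the one with the highest gap statistic
bestw <- wbounds[which.max(gaps)]
```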

Author(s)

Wenyu Zhang

References

Daniela M Witten and Robert Tibshirani (2010). A framework for feature selection in clustering. Journal of the American Statistical Association, 105(490), 713-726.

Examples

## Not run: 
set.seed(1)
data("NormalDisData")

# Choose the number of clusters after dropping the rows indexed by noisy.label
cK <- ChooseK(NormalDisData$data[-NormalDisData$noisy.label, ], nClusters = 1:6)
plot(cK)
K <- cK$OptimalK

# Tune the observation weights
res.tuneU <- kmeans.weight.tune(x = NormalDisData$data, K = K,
                                noisy.lab = NormalDisData$noisy.label,
                                weight.seq = NULL)
plot(res.tuneU)

# Select the sparsity parameter by permutation
res.tunes <- KMeansSparseCluster.permute.weight(x = NormalDisData$data, K = K,
                                                weight = res.tuneU$bestweight)

# Fit sparse weighted K-means with the selected bound
res <- KMeansSparseCluster.weight(x = NormalDisData$data, K = K,
                                  wbounds = res.tunes$bestw,
                                  weight = res.tuneU$bestweight)

# Check the clustering result, the number of features selected,
# and the 50 most important features
table(res[[1]]$Cs, NormalDisData$true.label)
sum(res[[1]]$ws != 0)
order(res[[1]]$ws, decreasing = TRUE)[1:50]

## End(Not run)

cuhklinlab/SWKM documentation built on Aug. 5, 2022, 2:27 a.m.