View source: R/KMeansSparseCluster_permute_weight.R
KMeansSparseCluster.permute.weight | R Documentation |
The sparsity parameter controls the L1 bound on w, the feature weights. A permutation approach is used to select the sparsity parameter.
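The permutation step can be sketched in base R. This is a minimal illustration, assuming that (as in Witten and Tibshirani, 2010) each feature column is permuted independently so that any cluster structure shared across features is destroyed. The helper `permute_features` is hypothetical and is not part of this package.

```r
# Hypothetical helper (not a package function): shuffle the observations
# within each feature column independently of the other columns.
permute_features <- function(x) {
  apply(x, 2, sample)
}

set.seed(1)
x <- matrix(rnorm(20), nrow = 5, ncol = 4)  # toy 5 x 4 data matrix
x_perm <- permute_features(x)

# Each column keeps the same values, only their order changes.
dim(x_perm)
```

Repeating this `nperms` times would give the permuted data sets on which the objective is re-evaluated.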
KMeansSparseCluster.permute.weight(x, K = NULL, weight = NULL, nperms = 20, nstart = 20, wbounds = NULL, silent = TRUE, nvals = 10, centers = NULL)
x |
An n by p numeric data matrix, where n is the number of observations and p is the number of features. |
K |
The number of clusters. Omitted if |
weight |
A vector of n positive elements representing weights on observations. |
nperms |
Number of permutations. Default is |
nstart |
The number of initial random sets chosen from (distinct) rows in |
wbounds |
A single L1 bound on w (the feature weights), or a vector of L1 bounds on w. If a bound is small, few features will have non-zero weights; if it is large, all features will have non-zero weights. Each bound should be greater than 1. |
silent |
Logical. Controls whether progress is printed; with the default TRUE, progress output is suppressed. |
nvals |
The number of candidate tuning parameter values. Omitted if |
centers |
A K by p matrix indicating initial (distinct) cluster centers. |
gaps |
The gap statistics obtained (one for each of the tuning parameters tried). If O(s) is the objective function evaluated at the tuning parameter s, and O*(s) is the same quantity but for the permuted data, then Gap(s)=log(O(s))-mean(log(O*(s))). |
sdgaps |
The standard deviation of log(O*(s)), for each value of the tuning parameter s. |
nnonzerows |
The number of features with non-zero weights, for each value of the tuning parameter. |
wbounds |
The tuning parameters considered. |
bestw |
The value of the tuning parameter corresponding to the highest gap statistic. |
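The relationship between these returned values can be sketched with a small worked example. The objective values below are invented purely for illustration (they are not output from the package), and the variable names simply mirror the components described above.

```r
# Made-up objective values, for illustration only.
# O[s]: objective on the original data at each of 3 tuning parameter values.
O <- c(10, 40, 60)

# O_perm[b, s]: objective on the b-th permuted data set (4 permutations).
O_perm <- rbind(c(8, 20, 50),
                c(9, 22, 52),
                c(7, 18, 48),
                c(8, 21, 51))

wbounds <- c(1.5, 3, 6)  # candidate tuning parameter values

# Gap(s) = log(O(s)) - mean(log(O*(s)))
gaps <- log(O) - colMeans(log(O_perm))

# Standard deviation of log(O*(s)) for each tuning parameter value
sdgaps <- apply(log(O_perm), 2, sd)

# Tuning parameter value with the highest gap statistic
bestw <- wbounds[which.max(gaps)]
bestw
```

In this toy example the middle candidate has the largest gap, so `bestw` is 3.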
Wenyu Zhang
Daniela M Witten and Robert Tibshirani (2010). A framework for feature selection in clustering. Journal of the American Statistical Association, 105(490), 713-726.
Other sparse weighted K-Means functions: ChooseK, KMeansSparseCluster.weight, kmeans.weight.tune, kmeans.weight
## Not run:
set.seed(1)
data("NormalDisData")

cK <- ChooseK(NormalDisData$data[-NormalDisData$noisy.label, ], nClusters = 1:6)
plot(cK)
K <- cK$OptimalK

res.tuneU <- kmeans.weight.tune(x = NormalDisData$data, K = K,
                                noisy.lab = NormalDisData$noisy.label,
                                weight.seq = NULL)
plot(res.tuneU)

res.tunes <- KMeansSparseCluster.permute.weight(x = NormalDisData$data, K = K,
                                                weight = res.tuneU$bestweight)
res <- KMeansSparseCluster.weight(x = NormalDisData$data, K = K,
                                  wbounds = res.tunes$bestw,
                                  weight = res.tuneU$bestweight)

# Check the clustering result, the number of features selected,
# and the 50 most important features
table(res[[1]]$Cs, NormalDisData$true.label)
sum(res[[1]]$ws != 0)
order(res[[1]]$ws, decreasing = TRUE)[1:50]
## End(Not run)