tun_calc: Sparse K-means Tuning Parameter Selection for Subsamples


View source: R/tun_par.R

Description

Calculates the Gap statistic for each subsample in a list to help select the tuning parameter used by the sparse K-means algorithm on that subsample, as part of the Stable Sparse K-means algorithm.

Usage

tun_calc(data, k, wb = NULL, nperms = 25, quiet = FALSE)

Arguments

data

A list of subsample matrices. Each matrix should be a subsample drawn from the same original data set.

k

The number of clusters assumed to be in the data.

wb

The range of tuning parameters to consider. If NULL (the default), the range is chosen automatically. See the sparcl documentation for details: https://cran.r-project.org/package=sparcl

nperms

The number of permutations. Default is 25.

quiet

Logical; controls whether progress messages are printed. Default is FALSE.

Details

In each data matrix, rows correspond to observations and columns correspond to features (genes).

Let O(s) denote the value of the sparse K-means objective function obtained with tuning parameter s. To calculate the Gap statistic, the observations are permuted within each feature. Sparse K-means is then run on the permuted data with tuning parameter s, yielding the objective value O*(s). This is repeated for each permutation to obtain a set of O*(s) values. The Gap statistic is then given by $\mathrm{Gap}(s) = \log(O(s)) - \mathrm{mean}(\log(O^*(s)))$. The optimal s is the one that yields the highest Gap statistic. Alternatively, we can choose the smallest s whose Gap statistic is within $\mathrm{sd}(\log(O^*(s)))$ of the largest Gap statistic.
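The calculation above can be sketched in base R. This is a minimal illustration, not the code used by sskm or sparcl: the objective values are made up, the helper names (permute_within_features, gap_from_objectives, select_s) are hypothetical, and the one-standard-deviation rule is assumed to use the standard deviation at the s that maximizes the Gap statistic.

```r
# Illustrative only: these objective values are made up, not sparcl/sskm output.
s_grid   <- c(1.5, 2, 3, 4)        # candidate tuning parameters (assumed)
obs_obj  <- c(10, 22, 30, 34)      # O(s) on the observed data (assumed)
perm_obj <- rbind(                 # O*(s): one row per permutation (assumed)
  c(8, 12, 16, 24),
  c(9, 11, 17, 25),
  c(7, 13, 15, 26)
)

# Permute the observations independently within each feature (column).
permute_within_features <- function(x) {
  apply(x, 2, sample)
}

# Gap(s) = log(O(s)) - mean(log(O*(s))), plus sd(log(O*(s))) for each s.
gap_from_objectives <- function(obs, perm) {
  log_perm <- log(perm)
  list(
    gaps   = log(obs) - colMeans(log_perm),
    sdgaps = apply(log_perm, 2, sd)
  )
}

# Two selection rules: the s with the largest gap, and the smallest s whose
# gap is within one sd of the largest gap (sd taken at the maximizing s;
# this choice is an assumption).
select_s <- function(s, gaps, sdgaps) {
  best <- which.max(gaps)
  close_enough <- gaps >= gaps[best] - sdgaps[best]
  list(max_gap = s[best], one_sd = min(s[close_enough]))
}

x  <- matrix(rnorm(12), nrow = 4, ncol = 3)
xp <- permute_within_features(x)   # same values per column, shuffled rows

res <- gap_from_objectives(obs_obj, perm_obj)
sel <- select_s(s_grid, res$gaps, res$sdgaps)
print(sel)
```

The one-standard-deviation rule favors a smaller s, which in sparse K-means corresponds to fewer features with non-zero weights.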

Value

A list with one element per subsample. Each element contains the following tuning parameter selection information:

gaps

The gap statistic, for each value of the tuning parameter s.

sdgaps

The standard deviation of log(O*(s)), for each value of the tuning parameter s. See sparcl documentation for more details.

nnonzerows

The number of features with non-zero weights, for each value of the tuning parameter.

wbounds

The tuning parameters considered.

bestw

The tuning parameter with the highest gap statistic.
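As a rough sketch of how the returned list might be summarized across subsamples, the snippet below builds a mock object with the components listed above and collects the selected tuning parameter for each subsample. The mock values and the helper summarize_tuning are hypothetical and serve only to illustrate the structure; they are not part of sskm.

```r
# Mock of the structure described under Value; all numbers are made up.
mock_tun <- list(
  list(gaps = c(0.1, 0.4, 0.3), sdgaps = c(0.05, 0.06, 0.04),
       nnonzerows = c(5, 12, 40), wbounds = c(1.5, 2.5, 4), bestw = 2.5),
  list(gaps = c(0.2, 0.3, 0.5), sdgaps = c(0.03, 0.05, 0.07),
       nnonzerows = c(6, 15, 45), wbounds = c(1.5, 2.5, 4), bestw = 4)
)

# One row per subsample: the selected tuning parameter, its gap statistic,
# and the number of features with non-zero weights at that value.
summarize_tuning <- function(tun) {
  rows <- lapply(seq_along(tun), function(i) {
    x <- tun[[i]]
    best <- which.max(x$gaps)
    data.frame(subsample  = i,
               bestw      = x$bestw,
               max_gap    = x$gaps[best],
               n_features = x$nnonzerows[best])
  })
  do.call(rbind, rows)
}

tuning_summary <- summarize_tuning(mock_tun)
print(tuning_summary)
```

A summary like this makes it easy to see whether the selected tuning parameter varies much from one subsample to the next.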

Author(s)

Abraham Apfel

References

Witten and Tibshirani (2009) A framework for feature selection in clustering.

Examples

library(sskm)
set.seed(1)  # for reproducibility

# Simulate a 60 x 50 data matrix: three groups of 20 observations,
# where only the first 10 features differ between groups
dat1 <- matrix(rnorm(200, -1, 1), 20, 10)
dat2 <- matrix(rnorm(800, 0, 1), 20, 40)
C1 <- cbind(dat1, dat2)
C2 <- matrix(rnorm(1000, 0, 1), 20, 50)
dat3 <- matrix(rnorm(200, 1, 1), 20, 10)
dat4 <- matrix(rnorm(800, 0, 1), 20, 40)
C3 <- cbind(dat3, dat4)
orig.sample <- rbind(C1, C2, C3)

# Take B = 4 subsamples
sub.sample <- sub.sim(data = orig.sample, N = 60, prop = 0.5, B = 4)

# Calculate the gap statistic to aid tuning parameter selection for each subsample
tun_par <- tun_calc(data = sub.sample, k = 3, wb = NULL, nperms = 5, quiet = FALSE)

# Collect the tuning parameter with the highest gap statistic from each
# subsample into a list, to be used as the wb argument
max.gap <- lapply(tun_par, function(x) x$bestw)

pi1 <- c(0.2, 0.3, 0.4)

# Apply the Stable Sparse K-means algorithm to the subsamples
res <- stablecluster(data = sub.sample, k = 3, wb = max.gap, nstart = 5,
                     maxiter = 6, orig = orig.sample, N = 60, pi = pi1)

sskm documentation built on May 29, 2017, 10:43 p.m.