# tun_calc: Sparse K-means Tuning Parameter Selection for Subsamples

In sskm: Stable Sparse K-Means

## Description

Calculates the gap statistic for each subsample in a list, to help select the tuning parameter that the sparse K-means algorithm uses on that subsample as part of the Stable Sparse K-means algorithm.

## Usage

```
tun_calc(data, k, wb = NULL, nperms = 25, quiet = FALSE)
```

## Arguments

| Argument | Description |
| --- | --- |
| `data` | List of subsamples. Each matrix should be a subsample from the same source of original data. |
| `k` | The number of clusters assumed to be in the data. |
| `wb` | The range of tuning parameters to consider. If NULL, the range is chosen automatically. The default is NULL. See the sparcl documentation for more details: https://cran.r-project.org/package=sparcl |
| `nperms` | The number of permutations. Default is 25. |
| `quiet` | Print out progress? Default is FALSE. |

## Details

In each data matrix, every row should correspond to a different observation and every column to a different feature (gene).

Let $O(s)$ denote the objective function with tuning parameter $s$. To calculate the Gap statistic, the observations are permuted within each feature. Sparse K-means is then run on the permuted data with tuning parameter $s$, yielding the objective function $O^*(s)$. This is repeated to obtain a number of $O^*(s)$ values. The Gap statistic is then given by

$$\mathrm{Gap}(s) = \log(O(s)) - \mathrm{mean}(\log(O^*(s))).$$

The optimal $s$ is the one that results in the highest Gap statistic. Alternatively, we can choose the smallest $s$ whose Gap statistic is within $\mathrm{sd}(\log(O^*(s)))$ of the largest Gap statistic.
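The calculation described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by `tun_calc` or sparcl: the objective values `O_obs` and `O_perm` are invented numbers rather than output from sparse K-means, and the one-standard-deviation rule is assumed to use the standard deviation at the tuning parameter with the largest Gap statistic.

```r
# Toy illustration: the objective values below are made up, not produced
# by sparse K-means.
set.seed(1)

# Permuting the observations within each feature means shuffling every
# column independently of the others.
x <- matrix(rnorm(20), nrow = 5, ncol = 4)
x_perm <- apply(x, 2, sample)

# Hypothetical objective values for three candidate tuning parameters s.
s_grid <- c(1.5, 2.5, 3.5)
O_obs  <- c(10, 28, 40)            # O(s) on the observed data
O_perm <- rbind(c(8,   15, 20),    # O*(s): one row per permutation,
                c(9,   14, 22),    # one column per value of s
                c(8.5, 16, 21))

gap    <- log(O_obs) - colMeans(log(O_perm))  # Gap(s)
sd_gap <- apply(log(O_perm), 2, sd)           # sd(log(O*(s)))

# Rule 1: the s with the largest Gap statistic.
best_idx <- which.max(gap)
s_max <- s_grid[best_idx]

# Rule 2 (assumed reading): the smallest s whose Gap statistic lies within
# one standard deviation, taken at the maximizing s, of the largest Gap.
s_1sd <- min(s_grid[gap >= gap[best_idx] - sd_gap[best_idx]])
```

With these invented values the two rules disagree: the largest Gap statistic occurs at the largest candidate, while the one-standard-deviation rule settles on a smaller, sparser choice.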

## Value

A list with one object per subsample. Each object contains the following tuning parameter selection information:

| Component | Description |
| --- | --- |
| `gaps` | The gap statistic. |
| `sdgaps` | The standard deviation of log(O*(s)), for each value of the tuning parameter s. See the sparcl documentation for more details. |
| `nnonzerows` | The number of features with non-zero weights, for each value of the tuning parameter. |
| `wbounds` | The tuning parameters considered. |
| `bestw` | The tuning parameter with the highest gap statistic. |
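As a rough sketch of how the returned components might be inspected, the snippet below builds a mock object with the documented component names and collects the selected tuning parameter from each subsample. The object `mock_tun` and all of its numbers are hypothetical stand-ins for real `tun_calc` output.

```r
# Mock of the documented return structure for two subsamples.
# All values are invented for illustration.
mock_tun <- list(
  list(gaps = c(0.10, 0.42, 0.40), sdgaps = c(0.02, 0.03, 0.03),
       nnonzerows = c(5, 18, 40), wbounds = c(1.5, 3, 4.5), bestw = 3),
  list(gaps = c(0.20, 0.35, 0.50), sdgaps = c(0.02, 0.04, 0.03),
       nnonzerows = c(6, 20, 45), wbounds = c(1.5, 3, 4.5), bestw = 4.5)
)

# One selected tuning parameter per subsample, kept as a list.
best_w <- lapply(mock_tun, function(res) res$bestw)

# Number of features with non-zero weights at the selected tuning parameter.
n_features <- vapply(
  mock_tun,
  function(res) res$nnonzerows[which.max(res$gaps)],
  numeric(1)
)

# Compact per-subsample summary.
tun_summary <- data.frame(
  subsample  = seq_along(mock_tun),
  bestw      = unlist(best_w),
  n_features = n_features
)
print(tun_summary)
```

Using `lapply` keeps the result as a list with one entry per subsample, which avoids hard-coding the number of subsamples in a loop.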

## Author(s)

Abraham Apfel

## References

Witten and Tibshirani (2009) A framework for feature selection in clustering.

## Examples

```
library(sskm)

# Simulate a data matrix with three groups of 20 observations and 50 features
dat1 <- matrix(rnorm(200, -1, 1), 20, 10)
dat2 <- matrix(rnorm(800, 0, 1), 20, 40)
C1 <- cbind(dat1, dat2)
C2 <- matrix(rnorm(1000, 0, 1), 20, 50)
dat3 <- matrix(rnorm(200, 1, 1), 20, 10)
dat4 <- matrix(rnorm(800, 0, 1), 20, 40)
C3 <- cbind(dat3, dat4)
orig.sample <- rbind(C1, C2, C3)

# Take B=4 subsamples
sub.sample <- sub.sim(data = orig.sample, N = 60, prop = 0.5, B = 4)

# Calculate gap statistic to aid in tuning parameter selection for each subsample
tun_par <- tun_calc(data = sub.sample, k = 3, wb = NULL, nperms = 5, quiet = FALSE)

# Create list based on highest gap statistic to be used as wb parameter
max.gap <- replicate(n = 4, expr = list())
for (i in 1:4) {
  max.gap[[i]] <- tun_par[[i]]$bestw
}
pi1 <- c(0.2, 0.3, 0.4)

# Apply Stable Sparse K-means algorithm on subsamples
res <- stablecluster(data = sub.sample, k = 3, wb = max.gap, nstart = 5,
                     maxiter = 6, orig = orig.sample, N = 60, pi = pi1)
```

sskm documentation built on May 29, 2017, 10:43 p.m.