# sub.sim: Take Subsamples of Data In sskm: Stable Sparse K-Means

## Description

Function to take subsamples of data to be used to apply stable sparse k-means algorithm.

## Usage

 `1` ```sub.sim(data, N, prop = 0.5, B = 100) ```

## Arguments

 `data` The original data matrix to take subsamples of. `N` The original number of samples in your data. `prop` The proportion of the original number of samples you wish each subsample to have. Must be a number greater than 0 and less than 1. The default is prop=0.5. `B` The number of subsamples you to make. The default is B=100.

## Details

The data matrix should have each row be a new observation and each column a different gene.

## Value

A list of B m*p matrices containing subsamples of original data where m = prop*N and p = the number of columns in original matrix. Each row reflects a randomly selected observation from original data. Each column is a feature from original dataset.

Abraham Apfel

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24``` ```#Simulate data matrix dat1<-matrix(rnorm(200,-1,1),20,10) dat2<-matrix(rnorm(800,0,1),20,40) C1<-cbind(dat1,dat2) C2<-matrix(rnorm(1000,0,1),20,50) dat3<-matrix(rnorm(200,1,1),20,10) dat4<-matrix(rnorm(800,0,1),20,40) C3<-cbind(dat3,dat4) orig.sample<-rbind(C1,C2,C3) #Take B=4 subsamples sub.sample<-sub.sim(data=orig.sample,N=60,prop=0.5,B=4) #Calculate gap statistic to aid in tuning parameter selection for each subsample tun_par<-tun_calc(data=sub.sample,k=3,wb=NULL,nperms=5,quiet=FALSE) #Create list based on highest gap statistic to be used as wb parameter max.gap<-replicate(n=4,expr=list()) for(i in 1:4){ max.gap[[i]]<-tun_par[[i]]\$bestw } pi1<-c(0.2,0.3,0.4) #Apply Stable Sparse K-means algorithm on subsamples res<-stablecluster(data=sub.sample,k=3,wb=max.gap,nstart=5,maxiter=6,orig=orig.sample,N=60,pi=pi1) ```

sskm documentation built on May 29, 2017, 10:43 p.m.