MetaSparseKmeans: Main function to perform MetaSparseKmeans.

Description Usage Arguments Details Value Author(s) Examples

View source: R/MetaSparseKmeans.R

Description

Implementation for Zhiguang Huo, Ying Ding, Shuchang Liu, Steffi Oesterreic and George Tseng. A Sparse K-means Meta-analysis framework combining multiple transcriptomic studies for disease subtype discovery. Journal of the American Statistical Association. 111, no. 513 (2016): 27-42.

Usage

1
2
3
MetaSparseKmeans(x, K = NULL, wbounds = NULL, nstart = 20, ntrial = 1,
  maxiter = 20, lambda = 1/2, sampleSizeAdjust = FALSE, wsPre = NULL,
  silence = FALSE)

Arguments

x

A list for several microarray studies. Each element of the list should be a p*n matrix. p is number of features and n is number of samples. Clustering is performed on sample level. p has to be the same for each study. Missing value should be set to be NA. Current version won't support missing value, it will be allowed in the next ##' version.

K

K specifies number of clusters. We assume the number of clusters to be the same in each study.

wbounds

wbounds is the tuning parameter that controls number of selected features. Larger tuning parameter yield more selected features. wbounds could be a number or a vector. wbounds is suggested to be selected using prior information (e.g. which tuning parameter generate best survival difference.)

nstart

In the MetaSparseKmeans algorithm, there are multiple places in which we will use Kmeans and weighted Kmeans. nstart specify the number of starting point for each of these Kmeans and weighted Kmeans.

ntrial

Since for high dimensional data, it is likely to stuck into a local minimum. ntrial specifies how many times we would repeat the algorithm. The result with the best objective score will be used.

maxiter

The algorithm iteratively update ws (feature weight), cs (clulster assignment) and matching. maxiter specifies the max number of iteration for MetaSparseKmeans

lambda

A tuning parameter controlling the balance between separation ability (BCSS/TSS) and matching function. lambda is set to be 1/2 by default.

sampleSizeAdjust

logical argument, controlling whether to adjust for sample size. If true, that means study with larger sample size will have a larger impact. If false, each study has equal contribution. Without prior information, sampleSizeAdjust=FALSE is suggested since we are not sure about data quality.

wsPre

If there is prior knowledge which genes are important, we could specify the initialization of the gene weight.

silence

Logical parameter whether we should print out details.

Details

Here is the instruction about the input

Value

The returning result could be a list or a list vector, depending whether the input wbounds is a number or a vector.

ws

Weight for a feature

Cs

Resulting clustering assignment

wbound

the used tuning parameter

score

objective score, the larger the better

Author(s)

Zhiguang Huo

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
######################################
## use browseVignettes('MetaSparseKmeans') to see a comprehensive explanation.
######################################
## generate data
set.seed(15213)

G = 1000
n11 = 100
n12 = 100
n13 = 150
label1 = c(rep(1,n11),rep(2,n12),rep(3,n13))

P0 = 0.6
P1 = 0.1
P2 = 0.1
P3 = 0.1
P4 = 0.1
sd = 0.5

G0 = G*P0  # nonDE genes
G1 = G*P1  # DE H-L
G2 = G*P2  # DE L-H
G3 = G*P3
G4 = G*P4


mu111 = runif(G1,-0.25,0.25)
mu112 = runif(G1,0.5,1)
mu113 = runif(G1,-1,-0.5)

mu121 = runif(G2,-1,-0.5)
mu122 = runif(G2,-0.25,0.25)
mu123 = runif(G2,0.5,1)

mu131 = runif(G3,-1,-0.5)
mu132 = runif(G3,-0.25,0.25)
mu133 = runif(G3,0.5,1)

mu14 = runif(G4,-0.25,0.25)
mu10 = runif(G0,-0.25,0.25)

Data111 = matrix(rnorm(n11*G1,mu111,sd^2),nrow=G1)
Data112 = matrix(rnorm(n12*G1,mu112,sd^2),nrow=G1)
Data113 = matrix(rnorm(n13*G1,mu113,sd^2),nrow=G1)
Data11 = cbind(Data111,Data112,Data113)

Data121 = matrix(rnorm(n11*G2,mu121,sd^2),nrow=G2)
Data122 = matrix(rnorm(n12*G2,mu122,sd^2),nrow=G2)
Data123 = matrix(rnorm(n13*G2,mu123,sd^2),nrow=G2)
Data12 = cbind(Data121,Data122,Data123)

Data131 = matrix(rnorm(n11*G3,mu131,sd^2),nrow=G3)
Data132 = matrix(rnorm(n12*G3,mu132,sd^2),nrow=G3)
Data133 = matrix(rnorm(n13*G3,mu133,sd^2),nrow=G3)
Data13 = cbind(Data131,Data132,Data133)

Data14 = matrix(rnorm((n11+n12+n13)*G4,mu14,sd^2),nrow=G4)

Data10 = matrix(rnorm((n11+n12+n13)*G0,mu10,sd^2),nrow=G0)

S1 = rbind(Data10,Data11,Data12,Data13,Data14)


G = 1000
n21 = 150
n22 = 100
n23 = 100

label2 = c(rep(1,n21),rep(2,n22),rep(3,n23))

P0 = 0.6
P1 = 0.1 #common features
P2 = 0.1 #common features
P3 = 0.1 #noise in S1
P4 = 0.1 #noise in S2
sd = 0.5

G0 = G*P0  # nonDE genes
G1 = G*P1  # DE H-L
G2 = G*P2  # DE L-H
G3 = G*P3  #noise in S1
G4 = G*P4  #noise in S2

mu211 = runif(G1,-0.25,0.25)
mu212 = runif(G1,0.5,1)
mu213 = runif(G1,-1,-0.5)

mu221 = runif(G2,-1,-0.5)
mu222 = runif(G2,-0.25,0.25)
mu223 = runif(G2,0.5,1)

mu23 = runif(G3,-0.25,0.25)

mu241 = runif(G4,-1,-0.5)
mu242 = runif(G4,-0.25,0.25)
mu243 = runif(G4,0.5,1)

mu20 = runif(G0,-0.25,0.25)

Data211 = matrix(rnorm(n21*G1,mu211,sd^2),nrow=G1)
Data212 = matrix(rnorm(n22*G1,mu212,sd^2),nrow=G1)
Data213 = matrix(rnorm(n23*G1,mu213,sd^2),nrow=G1)
Data21 = cbind(Data211,Data212,Data213)

Data221 = matrix(rnorm(n21*G2,mu221,sd^2),nrow=G2)
Data222 = matrix(rnorm(n22*G2,mu222,sd^2),nrow=G2)
Data223 = matrix(rnorm(n23*G2,mu223,sd^2),nrow=G2)
Data22 = cbind(Data221,Data222,Data223)

Data23 = matrix(rnorm((n21+n22+n23)*G3,mu23,sd^2),nrow=G3)

Data241 = matrix(rnorm(n21*G4,mu241,sd^2),nrow=G4)
Data242 = matrix(rnorm(n22*G4,mu242,sd^2),nrow=G4)
Data243 = matrix(rnorm(n23*G4,mu243,sd^2),nrow=G4)
Data24 = cbind(Data241,Data242,Data243)


Data20 = matrix(rnorm((n21+n22+n23)*G0,mu20,sd^2),nrow=G0)

S2 = rbind(Data20,Data21,Data22,Data23,Data24)


## visualize the data

S = list(t(S1),t(S2))
getWsHeatmap(t(S[[1]]),label1,main='two study before
metaSparseKMeans, S1')

getWsHeatmap(t(S[[2]]),label2,main='two study before metaSparseKMeans, S2')


## perform meta sparse Kmeans

res = MetaSparseKmeans(x=S,K=3,wbounds=10,lambda=0.5)

## visualize the result


getWsHeatmap(t(S[[1]]),res$Cs[[1]],res$ws,main='two study after metaSparseKMeans, S1')

getWsHeatmap(t(S[[2]]),res$Cs[[2]],res$ws,main='two study after metaSparseKMeans, S2')

plot(res$ws,main='metaSparseKmeans weight dist',xlab='geneIndex')

Caleb-Huo/MetaSparseKmeans documentation built on April 7, 2018, 3:09 p.m.