Implementation of the method described in: Zhiguang Huo, Ying Ding, Shuchang Liu, Steffi Oesterreich and George Tseng. A Sparse K-means Meta-analysis framework combining multiple transcriptomic studies for disease subtype discovery. Journal of the American Statistical Association. 111, no. 513 (2016): 27-42.
x |
A list of several microarray studies. Each element of the list should be a p*n matrix, where p is the number of features and n is the number of samples. Clustering is performed at the sample level. p must be the same for every study. Missing values should be set to NA. The current version does not support missing values; they will be allowed in the next version. |
K |
K specifies the number of clusters. The number of clusters is assumed to be the same in each study. |
wbounds |
wbounds is the tuning parameter that controls the number of selected features. A larger value yields more selected features. wbounds can be a single number or a vector. It is suggested to select wbounds using prior information (e.g., which tuning parameter generates the best survival difference). |
nstart |
The MetaSparseKmeans algorithm uses Kmeans and weighted Kmeans in multiple places. nstart specifies the number of starting points for each of these Kmeans and weighted Kmeans runs. |
ntrial |
With high-dimensional data, the algorithm is likely to get stuck in a local minimum. ntrial specifies how many times the algorithm is repeated. The result with the best objective score is used. |
maxiter |
The algorithm iteratively updates ws (feature weights), cs (cluster assignments) and the matching. maxiter specifies the maximum number of iterations for MetaSparseKmeans. |
lambda |
A tuning parameter controlling the balance between separation ability (BCSS/TSS) and the matching function. lambda is 1/2 by default. |
sampleSizeAdjust |
Logical argument controlling whether to adjust for sample size. If TRUE, studies with larger sample sizes have a larger impact. If FALSE, each study contributes equally. Without prior information, sampleSizeAdjust=FALSE is suggested, since data quality is uncertain. |
wsPre |
If there is prior knowledge of which genes are important, the initial gene weights can be specified here. |
silence |
Logical argument controlling whether details are printed. |
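The requirements on x can be checked before calling the function. The following is a minimal sketch in base R, not part of the package: check_studies and its features_in_rows flag are hypothetical names. It assumes the p*n orientation described for x (features in rows); note that the Examples pass transposed matrices, so the orientation should be confirmed against the installed version before relying on this check.

```r
# Hypothetical helper (not part of MetaSparseKmeans): checks a list of study
# matrices against the requirements stated for the argument `x`.
check_studies <- function(x, features_in_rows = TRUE) {
  stopifnot(is.list(x), length(x) >= 1)
  stopifnot(all(vapply(x, is.matrix, logical(1))))
  # Number of features per study, under the assumed orientation.
  p <- vapply(x, function(m) if (features_in_rows) nrow(m) else ncol(m),
              integer(1))
  if (length(unique(p)) != 1) {
    stop("all studies must have the same number of features")
  }
  if (any(vapply(x, anyNA, logical(1)))) {
    stop("missing values (NA) are not supported")
  }
  invisible(p[1])
}

set.seed(1)
study1 <- matrix(rnorm(20 * 5), nrow = 20)  # 20 features, 5 samples
study2 <- matrix(rnorm(20 * 8), nrow = 20)  # 20 features, 8 samples
p <- check_studies(list(study1, study2))    # passes; p is the feature count
```

Studies may differ in the number of samples; only the feature dimension has to agree.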
Here is the instruction about the input
The returned result is either a single list or a list of such lists, depending on whether the input wbounds is a number or a vector.
ws |
Weight for a feature |
Cs |
Resulting clustering assignment |
wbound |
The tuning parameter used |
score |
Objective score; the larger the better |
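Because the shape of the result depends on wbounds, downstream code may need to handle both cases. The sketch below uses mock objects in place of real MetaSparseKmeans output. It assumes that a vector wbounds yields an unnamed list with one result per value, and that unselected features have a weight of exactly zero. selected_features is a hypothetical helper, not a package function.

```r
# Hypothetical helper: indices of features with non-zero weight.
selected_features <- function(res) {
  if (!is.null(res$ws)) {
    which(res$ws != 0)                         # single result (scalar wbounds)
  } else {
    lapply(res, function(r) which(r$ws != 0))  # one entry per wbounds value
  }
}

# Mock results standing in for MetaSparseKmeans output (values are made up).
single <- list(ws = c(0, 0.6, 0, 0.8),
               Cs = list(c(1, 2, 1), c(2, 1, 1)),
               wbound = 1.2, score = 3.1)
multi <- list(single,
              list(ws = c(0.1, 0.5, 0.3, 0.8), Cs = single$Cs,
                   wbound = 2, score = 4.0))

sel_single <- selected_features(single)  # integer vector of feature indices
sel_multi  <- selected_features(multi)   # list with one vector per wbounds value
```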
Zhiguang Huo
######################################
## use browseVignettes('MetaSparseKmeans') to see a comprehensive explanation.
######################################
## generate data
set.seed(15213)
G = 1000
n11 = 100
n12 = 100
n13 = 150
label1 = c(rep(1,n11),rep(2,n12),rep(3,n13))
P0 = 0.6
P1 = 0.1
P2 = 0.1
P3 = 0.1
P4 = 0.1
sd = 0.5
G0 = G*P0 # nonDE genes
G1 = G*P1 # DE H-L
G2 = G*P2 # DE L-H
G3 = G*P3
G4 = G*P4
mu111 = runif(G1,-0.25,0.25)
mu112 = runif(G1,0.5,1)
mu113 = runif(G1,-1,-0.5)
mu121 = runif(G2,-1,-0.5)
mu122 = runif(G2,-0.25,0.25)
mu123 = runif(G2,0.5,1)
mu131 = runif(G3,-1,-0.5)
mu132 = runif(G3,-0.25,0.25)
mu133 = runif(G3,0.5,1)
mu14 = runif(G4,-0.25,0.25)
mu10 = runif(G0,-0.25,0.25)
Data111 = matrix(rnorm(n11*G1,mu111,sd^2),nrow=G1)
Data112 = matrix(rnorm(n12*G1,mu112,sd^2),nrow=G1)
Data113 = matrix(rnorm(n13*G1,mu113,sd^2),nrow=G1)
Data11 = cbind(Data111,Data112,Data113)
Data121 = matrix(rnorm(n11*G2,mu121,sd^2),nrow=G2)
Data122 = matrix(rnorm(n12*G2,mu122,sd^2),nrow=G2)
Data123 = matrix(rnorm(n13*G2,mu123,sd^2),nrow=G2)
Data12 = cbind(Data121,Data122,Data123)
Data131 = matrix(rnorm(n11*G3,mu131,sd^2),nrow=G3)
Data132 = matrix(rnorm(n12*G3,mu132,sd^2),nrow=G3)
Data133 = matrix(rnorm(n13*G3,mu133,sd^2),nrow=G3)
Data13 = cbind(Data131,Data132,Data133)
Data14 = matrix(rnorm((n11+n12+n13)*G4,mu14,sd^2),nrow=G4)
Data10 = matrix(rnorm((n11+n12+n13)*G0,mu10,sd^2),nrow=G0)
S1 = rbind(Data10,Data11,Data12,Data13,Data14)
G = 1000
n21 = 150
n22 = 100
n23 = 100
label2 = c(rep(1,n21),rep(2,n22),rep(3,n23))
P0 = 0.6
P1 = 0.1 #common features
P2 = 0.1 #common features
P3 = 0.1 #noise in S1
P4 = 0.1 #noise in S2
sd = 0.5
G0 = G*P0 # nonDE genes
G1 = G*P1 # DE H-L
G2 = G*P2 # DE L-H
G3 = G*P3 #noise in S1
G4 = G*P4 #noise in S2
mu211 = runif(G1,-0.25,0.25)
mu212 = runif(G1,0.5,1)
mu213 = runif(G1,-1,-0.5)
mu221 = runif(G2,-1,-0.5)
mu222 = runif(G2,-0.25,0.25)
mu223 = runif(G2,0.5,1)
mu23 = runif(G3,-0.25,0.25)
mu241 = runif(G4,-1,-0.5)
mu242 = runif(G4,-0.25,0.25)
mu243 = runif(G4,0.5,1)
mu20 = runif(G0,-0.25,0.25)
Data211 = matrix(rnorm(n21*G1,mu211,sd^2),nrow=G1)
Data212 = matrix(rnorm(n22*G1,mu212,sd^2),nrow=G1)
Data213 = matrix(rnorm(n23*G1,mu213,sd^2),nrow=G1)
Data21 = cbind(Data211,Data212,Data213)
Data221 = matrix(rnorm(n21*G2,mu221,sd^2),nrow=G2)
Data222 = matrix(rnorm(n22*G2,mu222,sd^2),nrow=G2)
Data223 = matrix(rnorm(n23*G2,mu223,sd^2),nrow=G2)
Data22 = cbind(Data221,Data222,Data223)
Data23 = matrix(rnorm((n21+n22+n23)*G3,mu23,sd^2),nrow=G3)
Data241 = matrix(rnorm(n21*G4,mu241,sd^2),nrow=G4)
Data242 = matrix(rnorm(n22*G4,mu242,sd^2),nrow=G4)
Data243 = matrix(rnorm(n23*G4,mu243,sd^2),nrow=G4)
Data24 = cbind(Data241,Data242,Data243)
Data20 = matrix(rnorm((n21+n22+n23)*G0,mu20,sd^2),nrow=G0)
S2 = rbind(Data20,Data21,Data22,Data23,Data24)
## visualize the data
S = list(t(S1),t(S2))
getWsHeatmap(t(S[[1]]),label1,main='two study before metaSparseKMeans, S1')
getWsHeatmap(t(S[[2]]),label2,main='two study before metaSparseKMeans, S2')
## perform meta sparse Kmeans
res = MetaSparseKmeans(x=S,K=3,wbounds=10,lambda=0.5)
## visualize the result
getWsHeatmap(t(S[[1]]),res$Cs[[1]],res$ws,main='two study after metaSparseKMeans, S1')
getWsHeatmap(t(S[[2]]),res$Cs[[2]],res$ws,main='two study after metaSparseKMeans, S2')
plot(res$ws,main='metaSparseKmeans weight dist',xlab='geneIndex')
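
Since the simulated data come with true labels (label1, label2), one way to check the result is to compare the cluster assignments against them. Cluster numbers are arbitrary, so the comparison should not depend on how clusters are labelled. A minimal sketch follows, using a hypothetical helper cluster_purity (not part of the package) and mock vectors in place of res$Cs:

```r
# Hypothetical helper (not part of MetaSparseKmeans): fraction of samples that
# fall in the majority true class of their assigned cluster.
cluster_purity <- function(truth, pred) {
  stopifnot(length(truth) == length(pred))
  tab <- table(truth, pred)  # rows: true labels, columns: clusters
  sum(apply(tab, 2, max)) / length(truth)
}

truth <- c(1, 1, 2, 2, 3, 3)
same_partition <- cluster_purity(truth, c(2, 2, 3, 3, 1, 1))  # relabelled only
mixed_clusters <- cluster_purity(truth, c(1, 1, 1, 2, 2, 2))  # classes mixed
```

For the example above, the corresponding calls would be cluster_purity(label1, res$Cs[[1]]) and cluster_purity(label2, res$Cs[[2]]).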