Description Usage Arguments Details Value Note Author(s) References See Also Examples
Gene selection based on the marginal distributions of gene profiles that characterized by a mixture of three-component multivariate distributions. Input is a data matrix. The user needs to provide initial gene cluster membership.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | gsMMD2.default(X,
memSubjects,
memIni,
maxFlag = TRUE,
thrshPostProb = 0.5,
geneNames = NULL,
alpha = 0.05,
transformFlag = FALSE,
transformMethod = "boxcox",
scaleFlag = TRUE,
criterion = c("cor", "skewness", "kurtosis"),
minL = -10,
maxL = 10,
stepL = 0.1,
eps = 0.001,
ITMAX = 100,
plotFlag = FALSE,
quiet=TRUE)
|
X |
a data matrix. The rows of the matrix are genes. The columns of the matrix are subjects. |
memSubjects |
a vector of membership of subjects. |
memIni |
a vector of user-provided gene cluster membership. |
maxFlag |
logical. Indicate how to assign gene class membership. |
thrshPostProb |
threshold for posterior probabilities. For example, if the posterior probability that a gene belongs to cluster 1 given its gene expression levels is larger than |
geneNames |
an optional character vector of gene names |
alpha |
significant level which is equal to |
transformFlag |
logical. Indicate if data transformation is needed |
transformMethod |
method for transforming data. Available methods include "boxcox", "log2", "log10", "log", "none". |
scaleFlag |
logical. Indicate if gene profiles are to be scaled. If |
criterion |
if |
minL |
lower limit for the |
maxL |
upper limit for the |
stepL |
tolerance when searching the optimal |
eps |
a small positive value. If the absolute value of a value is smaller than |
ITMAX |
maximum iteration allowed for iterations in the EM algorithm |
plotFlag |
logical. Indicate if the Box-Cox normality plot should be output. |
quiet |
logical. Indicate if intermediate results should be printed out. |
We assume that the distribution of gene expression profiles is a mixture of 3-component multivariate normal distributions ∑_{k=1}^{3} π_k f_k(x|θ). Each component distribution f_k corresponds to a gene cluster. The 3 components correspond to 3 gene clusters: (1) up-regulated gene cluster, (2) non-differentially expressed gene cluster, and (3) down-regulated gene cluster. The model parameter vector is θ=(π_1, π_2, π_3, μ_{c1}, σ^2_{c1}, ρ_{c1}, μ_{n1}, σ^2_{n1}, ρ_{n1}, μ_2, σ^2_2, ρ_2, μ_{c3}, σ^2_{c3}, ρ_{c3}, μ_{n3}, σ^2_{n3}, ρ_{n3}. where π_1, π_2, and π_3 are the mixing proportions; μ_{c1}, σ^2_{c1}, and ρ_{c1} are the marginal mean, variance, and correlation of gene expression levels of cluster 1 (up-regulated genes) for diseased subjects; μ_{n1}, σ^2_{n1}, and ρ_{n1} are the marginal mean, variance, and correlation of gene expression levels of cluster 1 (up-regulated genes) for non-diseased subjects; μ_2, σ^2_2, and ρ_2 are the marginal mean, variance, and correlation of gene expression levels of cluster 2 (non-differentially expressed genes); μ_{c3}, σ^2_{c3}, and ρ_{c3} are the marginal mean, variance, and correlation of gene expression levels of cluster 3 (up-regulated genes) for diseased subjects; μ_{n3}, σ^2_{n3}, and ρ_{n3} are the marginal mean, variance, and correlation of gene expression levels of cluster 3 (up-regulated genes) for non-diseased subjects.
Note that genes in cluster 2 are non-differentially expressed across abnormal and normal tissue samples. Hence there are only 3 parameters for cluster 2.
To make sure the identifiability, we set the following contraints: μ_{c1}>μ_{n1} and μ_{c3}<μ_{n3}.
To make sure the marginal covariance matrices are poisitive definite, we set the following contraints: -1/(n_c-1)<ρ_{c1}<1, -1/(n_n-1)<ρ_{n1}<1, -1/(n-1)<ρ_{2}<1, -1/(n_c-1)<ρ_{c3}<1, -1/(n_n-1)<ρ_{n3}<1.
We also has the following constraints for the mixing proportion: π_3=1-π_1-π_2, π_k>0, k=1,2,3.
We apply the EM algorithm to estimate the model parameters. We regard the cluster membership of genes as missing values.
To facilitate the estimation of the parameters, we reparametrize the parameter vector as θ^*=(π_1, π_2, μ_{c1}, σ^2_{c1}, r_{c1}, δ_{n1}, σ^2_{n1}, r_{n1}, μ_2, σ^2_2, r_2, μ_{c3}, σ^2_{c3}, r_{c3}, δ_{n3}, σ^2_{n3}, r_{n3}), where μ_{n1}=μ_{c1}-\exp(δ_{n1}), μ_{n3}=μ_{c3}+\exp(δ_{n3}), ρ_{c1}=(\exp(r_{c1})-1/(n_c-1))/(1+\exp(r_{c1})), ρ_{n1}=(\exp(r_{n1})-1/(n_n-1))/(1+\exp(r_{n1})), ρ_{2}=(\exp(r_{2})-1/(n-1))/(1+\exp(r_{2})), ρ_{c3}=(\exp(r_{c3})-1/(n_c-1))/(1+\exp(r_{c3})), ρ_{n3}=(\exp(r_{n3})-1/(n_n-1))/(1+\exp(r_{n3})).
Given a gene, the expression levels of the gene are assumed independent. However, after scaling, the scaled expression levels of the gene are no longer independent and the rank r^*=r-1 of the covariance matrix for the scaled gene profile will be one less than the rank r for the un-scaled gene profile Hence the covariance matrix of the gene profile will no longer be positive-definite. To avoid this problem, we delete a tissue sample after scaling since its information has been incorrporated by other scaled tissue samples. We arbitrarily select the tissue sample, which has the biggest label number, from the tissue sample group that has larger size than the other tissue sample group. For example, if there are 6 cancer tissue samples and 10 normal tissue samples, we delete the 10-th normal tissue sample after scaling.
A list contains 13 elements.
dat |
the (transformed) microarray data matrix. If tranformation
performed, then |
memSubjects |
the same as the input |
memGenes |
a vector of cluster membership of genes. 1 means up-regulated gene; 2 means non-differentially expressed gene; 3 means down-regulated gene. |
memGenes2 |
an variant of the vector of cluster membership of genes. 1 means differentially expressed gene; 0 means non-differentially expressed gene. |
para |
parameter estimates (c.f. details). |
llkh |
value of the loglikelihood function. |
wiMat |
posterior probability that a gene belongs to a cluster given the expression levels of this gene. Column i is for cluster i. |
memIni |
the initial cluster membership of genes. |
paraIni |
the parameter estimates based on initial gene cluster membership. |
llkhIni |
the value of loglikelihood function. |
lambda |
the parameter used to do Box-Cox transformation |
paraRP |
parameter estimates for reparametrized parameter vector (c.f. details). |
paraIniRP |
the parameter estimates for reparametrized parameter vector based on initial gene cluster membership. |
The speed of the program can be slow for large data sets, however it has been improved using Fortran code.
Jarrett Morrow remdj@channing.harvard.edu, Weiliang Qiu Weiliang.Qiu@gmail.com, Wenqing He whe@stats.uwo.ca, Xiaogang Wang stevenw@mathstat.yorku.ca, Ross Lazarus ross.lazarus@channing.harvard.edu
Qiu, W.-L., He, W., Wang, X.-G. and Lazarus, R. (2008). A Marginal Mixture Model for Selecting Differentially Expressed Genes across Two Types of Tissue Samples. The International Journal of Biostatistics. 4(1):Article 20. http://www.bepress.com/ijb/vol4/iss1/20
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 | ## Not run:
library(ALL)
data(ALL)
eSet1 <- ALL[1:100, ALL$BT == "B3" | ALL$BT == "T2"]
mat <- exprs(eSet1)
mem.str <- as.character(eSet1$BT)
nSubjects <- length(mem.str)
memSubjects <- rep(0, nSubjects)
# B3 coded as 0, T2 coded as 1
memSubjects[mem.str == "T2"] <- 1
myWilcox <-
function(x, memSubjects, alpha = 0.05)
{
xc <- x[memSubjects == 1]
xn <- x[memSubjects == 0]
m <- sum(memSubjects == 1)
res <- wilcox.test(x = xc, y = xn, conf.level = 1 - alpha)
res2 <- c(res$p.value, res$statistic - m * (m + 1) / 2)
names(res2) <- c("p.value", "statistic")
return(res2)
}
tmp <- t(apply(mat, 1, myWilcox, memSubjects = memSubjects))
colnames(tmp) <- c("p.value", "statistic")
memIni <- rep(2, nrow(mat))
memIni[tmp[, 1] < 0.05 & tmp[, 2] > 0] <- 1
memIni[tmp[, 1] < 0.05 & tmp[,2] < 0] <- 3
cat("initial gene cluster size>>\n"); print(table(memIni)); cat("\n");
obj.gsMMD <- gsMMD2.default(mat, memSubjects, memIni = memIni,
transformFlag = TRUE, transformMethod = "boxcox", scaleFlag = TRUE)
round(obj.gsMMD$para, 3)
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.