Gaussian mixture copula models (GMCMs) are a flexible class of statistical
models that can be used for unsupervised clustering, meta-analysis, and
other tasks. In meta-analysis, GMCMs can be used to identify and quantify
which features have been reproduced across multiple experiments. This
package provides a fast and general implementation of GMCM cluster
analysis and improves on and extends the features available in the idr
package.
For the meta-analysis of Li et al. (2011), the function fit.meta.GMCM
computes the maximum likelihood estimate of the special Gaussian mixture
copula model defined there. The function get.IDR then computes the local
and adjusted irreproducible discovery rates (IDR) defined by Li et al.
(2011), which quantify the level of reproducibility of each feature.
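To make the structure of the special model concrete, the following is a minimal base-R sketch that simulates from a two-component model of the kind described by Li et al. (2011): an irreproducible component that is standard normal with independent coordinates, and a reproducible component with a common mean, standard deviation, and correlation. The parameter values and variable names (pie1, mu, sigma, rho, K, z, u) are illustrative assumptions, not values from the package or the paper, and the code does not use or reproduce the package's own simulation routines.

```r
# Base-R sketch of the two-component special model; all values are made up.
set.seed(1)
n     <- 1000
pie1  <- 0.7   # assumed proportion of the irreproducible component
mu    <- 2     # assumed common mean of the reproducible component
sigma <- 1     # assumed common standard deviation
rho   <- 0.8   # assumed correlation within the reproducible component

# Component labels: 1 = irreproducible, 2 = reproducible
K <- sample(1:2, n, replace = TRUE, prob = c(pie1, 1 - pie1))

# Latent values: start from independent standard normals in both columns
z <- matrix(rnorm(2 * n), ncol = 2)

# For the reproducible component, induce correlation rho, then scale and shift
rep_idx <- K == 2
e1 <- z[rep_idx, 1]
e2 <- rho * e1 + sqrt(1 - rho^2) * z[rep_idx, 2]
z[rep_idx, ] <- mu + sigma * cbind(e1, e2)

# Only the column-wise scaled ranks of the latent values are treated as observed
u <- apply(z, 2, rank) / (n + 1)
```

In this sketch, u plays the role of the ranked and scaled input that a fitting routine would receive, while K holds the labels that a fit would try to recover.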
Tewari et al. (2011) proposed using GMCMs as a general unsupervised
clustering tool. For such general clustering, the function fit.full.GMCM
computes the maximum likelihood estimate of the general GMCM, and the
function get.prob estimates the class membership probabilities of each
observation.
SimulateGMCMData provides easy simulation from GMCMs.
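The general clustering workflow can be sketched as below: simulate data, rank and scale it, fit the general model, and assign each observation to its most probable component. The argument names and values (m = 3, method = "NM", max.ite = 3000, verbose = FALSE) are recalled from the package's documented usage and should be treated as assumptions; check ?SimulateGMCMData, ?fit.full.GMCM, and ?get.prob for the authoritative signatures. A single fit from one random start is shown for brevity only.

```r
library(GMCM)

set.seed(17)
# Simulate 1000 observations from a random 3-component, 2-dimensional GMCM
sim <- SimulateGMCMData(n = 1000, m = 3, d = 2)

# Rank and scale the simulated observations column-wise
uhat <- Uhat(sim$u)

# Random starting estimate, then a Nelder-Mead fit of the general model
start.theta <- choose.theta(uhat, m = 3)
fit <- fit.full.GMCM(u = uhat, theta = start.theta,
                     method = "NM", max.ite = 3000, verbose = FALSE)

# Class membership probabilities (one column per component) and hard labels
prob <- get.prob(uhat, theta = fit)
Khat <- apply(prob, 1, which.max)

# Compare with the simulated labels; component order may be permuted
table(Khat = Khat, K = sim$K)
```

Because the likelihood can have several local optima, it is usually worth refitting from several starting estimates and keeping the best fit.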
Anders Ellern Bilgrau, Martin Boegsted, Poul Svante Eriksen
Maintainer: Anders Ellern Bilgrau <anders.ellern.bilgrau@gmail.com>
Anders Ellern Bilgrau, Poul Svante Eriksen, Jakob Gulddahl Rasmussen, Hans Erik Johnsen, Karen Dybkaer, Martin Boegsted (2016). GMCM: Unsupervised Clustering and Meta-Analysis Using Gaussian Mixture Copula Models. Journal of Statistical Software, 70(2), 1-23. doi:10.18637/jss.v070.i02
Li, Q., Brown, J. B., Huang, H., & Bickel, P. J. (2011). Measuring reproducibility of high-throughput experiments. The Annals of Applied Statistics, 5(3), 1752-1779. doi:10.1214/11-AOAS466
Tewari, A., Giering, M. J., & Raghunathan, A. (2011). Parametric Characterization of Multimodal Distributions with Non-gaussian Modes. 2011 IEEE 11th International Conference on Data Mining Workshops, 286-292. doi:10.1109/ICDMW.2011.135
Core user functions: fit.meta.GMCM, fit.full.GMCM, get.IDR, get.prob, SimulateGMCMData, SimulateGMMData, rtheta, Uhat, choose.theta, full2meta, meta2full
Package by Li et al. (2011): idr.
library(GMCM)
# Loading data
data(u133VsExon)
# Subsetting data to reduce computation time
u133VsExon <- u133VsExon[1:5000, ]
# Ranking and scaling,
# Remember large values should be critical to the null!
uhat <- Uhat(1 - u133VsExon)
# Visualizing P-values and the ranked and scaled P-values
## Not run:
par(mfrow = c(1,2))
plot(u133VsExon, cex = 0.5, pch = 4, col = "tomato", main = "P-values",
     xlab = "P (U133)", ylab = "P (Exon)")
plot(uhat, cex = 0.5, pch = 4, col = "tomato", main = "Ranked P-values",
     xlab = "rank(1-P) (U133)", ylab = "rank(1-P) (Exon)")
## End(Not run)
# Fitting using BFGS
fit <- fit.meta.GMCM(uhat, init.par = c(0.5, 1, 1, 0.5), pgtol = 1e-2,
                     method = "L-BFGS", positive.rho = TRUE, verbose = TRUE)
# Compute IDR values and classify
idr <- get.IDR(uhat, par = fit)
table(idr$K) # 1 = irreproducible, 2 = reproducible
## Not run:
# See clustering results
par(mfrow = c(1,2))
plot(u133VsExon, cex = 0.5, pch = 4, main = "Classified genes",
     col = c("tomato", "steelblue")[idr$K],
     xlab = "P-value (U133)", ylab = "P-value (Exon)")
plot(uhat, cex = 0.5, pch = 4, main = "Classified genes",
     col = c("tomato", "steelblue")[idr$K],
     xlab = "rank(1-P) (U133)", ylab = "rank(1-P) (Exon)")
## End(Not run)
iter 10 value -1213.470575
iter 20 value -1213.524390
final value -1213.524559
converged
1 2
4136 864