Meta analysis with special GMCMs"

library("knitr")
set.seed(5828993)
opts_knit$set(self.contained = TRUE)

This is a quick tutorial for using the special GMCM for meta analysis.

Initialization

The GMCM^1^ package is loaded.

#install.packages("GMCM")  # Uncomment to install the GMCM package
library("GMCM")

If GMCM is not installed, please uncomment the above line and rerun the script.

Loading data

The data is loaded and the first rows are printed. To illustrate we load the u133VsExon dataset within the package. The dataset contains 19,577 p-values for the null hypothesis of no differential gene expression between two cell types for each of two different experiments called u133 and exon.

x <- get(data("u133VsExon"))
head(x, n = 4)

See help("u133VsExon") for more information.

Next, we subset the data to simply reduce computation time. After that, we will rank it, and visualize it.

x <- x[1:5000, ]

The values above are p-values where small values indicate strong evidence---contrary to what is expected by the special model. In the special model, large values should be critical to the null hypothesis.

Therefore we ranks and scale 1 - p:

u <- Uhat(1 - x)

The original and ranked data is plotted:

par(mfrow = c(1,2))  # Visualizing P-values and the ranked P-values
plot(x, cex = 0.5, pch = 4, col = "tomato", main = "P-values",
     xlab = "P-value (U133)", ylab = "P-value (Exon)")
plot(u, cex = 0.5, pch = 4, col = "tomato", main = "Ranked P-values",
     xlab = "rank(1-P) (U133)", ylab = "rank(1-P) (Exon)")

Here each point represent a gene. The genes in the lower left of the first panel and correspondingly in the upper right of the second panel are the seemingly reproducible genes. They have a low p-value and thus a high rank in both experiments. The genes in the upper left and lower right are the ones that are apparently spurious results; they have a low p-value in only one experiment.

Initial parameters

The initial parameters are set to

init_par <- c(pie1 = 0.6, mu = 1, sigma = 1, rho = 0.2)

Model fitting

With the data loaded and prepared, the model is can be fitted with the defined initial parameters.

par <- fit.meta.GMCM(u = u,
                     init.par = init_par,
                     method = "NM",
                     max.ite = 1000,
                     verbose = FALSE)
print(par)

We here use the Nelder-Mead fitting method with with a maximum number of iterations of 1000.

Meta analysis with unsupervised clustering

The estimated parameters are used to calculate the local and adjusted irreproducibility discovery rates:

meta_IDR_thres <- 0.05
out <- get.IDR(x, par = par,
               threshold = meta_IDR_thres) # Compute IDR
str(out)

out <- get.IDR(u, par = par, threshold = meta_IDR_thres)
below <- out[["IDR"]] < meta_IDR_thres
out$l <- sum(below)
out$Khat <- ifelse(below, 2, 1)

By default, get.IDR computes thresholds on the adjusted IDR. The local irreproducibility discovery rate correspond to the posterior probability of the point originating from the irreproducible component.

Results

The classes are counted by

table(out$Khat)

Where we see how many of the 5000 genes are deemed reproducible and irreproducible.

The results are also displayed by plotting

plot(x, col = out$Khat, asp = 1) # Plot of raw values
plot(u, col = out$Khat, asp = 1) # Plot of copula values
z <- GMCM:::qgmm.marginal(u, theta = meta2full(par, d = ncol(u))) # Estimate latent process
plot(z, col = out$Khat, asp = 1) # Plot of estimated latent process

The model indeed capture genes in the upper right and classify them as reproducible.

The fitted par object can be converted to a theta object which can be plotted directly:

plot(meta2full(par, d = ncol(u)))

Session information

This report was generated using rmarkdown^2^ and knitr^3^ under the session given below. The report utilizes parameterized reports and knitr::spin.

sessionInfo()

References

Please cite the GMCM paper^1^ if you use the package or shiny app.

cites <- lapply(c("GMCM", "knitr", "rmarkdown"), citation)
fmt_cites <- unlist(lapply(cites, format, style = "text"))
cat(paste0("  ", seq_along(fmt_cites), ". ", fmt_cites, "\n"))


Try the GMCM package in your browser

Any scripts or data that you put into this service are public.

GMCM documentation built on Nov. 6, 2019, 1:08 a.m.