In this module, we show how testing for multiple hypotheses (genes) can increase the chance of false positives, especially for small sample sizes.
knitr::opts_chunk$set(message=FALSE, warning=FALSE, eval=TRUE) library(ComplexHeatmap)
library(ComplexHeatmap)
Nrow <- 10000 Ncol <- 200
Here we show the scenario presented in class (slide "Gene markers
selection: better than chance?"). In the examples below, we show
heatmaps corresponding to random noise, and we show that, if enough
hypotheses are tested (in this case, r Nrow
), and the sample size is
sufficiently small (e.g., n=6), we can easily identify 'genes' whose
expression pattern seems to be strongly associated with the
phenotype (in this case, a random head/tail), as suggested by the
heatmap with a clear blue-to-red pattern. As the sample size increases
(e.g., n=r Ncol
), it is more difficult to be 'fooled', as the
corresponding heatmap shows a less clear blue-to-red pattern.
We start by generating a large [r Nrow
xr Ncol
] matrix filled with random
values drawn from a Gaussian distribution with mean=0 and stdev=0.5.
set.seed(123) # for reproducible results DAT <- matrix(rnorm(Ncol*Nrow,mean=0,sd=0.5),nrow=Nrow,ncol=Ncol) hist(DAT)
We then pick a small subset of columns from this matrix and randomly
assign them a binary (head
-tail
) phenotype. We then pick the top 25 markers
associated to head
and tail
, and plot the corresponding heatmaps.
heatmap_wrapper <- function(DAT, Ncol, ndraw) { ## randomly select Ncol columns from the full matrix DATi <- DAT[, colDraw <- sample(Ncol, size = ndraw)] ## generate a (head/tail) phenotype of proper size pheno <- factor(rep(c("head", "tail"), each = ndraw / 2)) ## perform t.test on each data row with respect to the random phenotype DIFi <- t(apply(DATi, 1, tscore, x = pheno)) ## pick top 25 markers in each direction topMarkers <- c( order(DIFi[, 1], decreasing = FALSE)[1:25], order(DIFi[, 1], decreasing = TRUE)[1:25] ) ## visualize the corresponding heatmap of 50 markers annot_col <- ComplexHeatmap::HeatmapAnnotation( toss = pheno, col = list(toss = c("head" = "green", "tail" = "orange")) ) print(ComplexHeatmap::Heatmap(DATi[topMarkers, ], name = paste("sample size =", ndraw), col = circlize::colorRamp2(c(-1, 0, 1), c("#072448", "white", "#ff6150")), top_annotation = annot_col, cluster_rows = FALSE, cluster_columns = FALSE, row_title = "", show_column_names = FALSE, show_row_names = FALSE )) ## show the top markers (by p-value) print(head(cbind(DIFi, FDR = p.adjust(DIFi[, 2], method = "BH"))[topMarkers, ])) } ## creating a black-to-red palette for heatmap display ramp.br <- grDevices::colorRamp(c( "blue","white","red")) palette.blue2red <- rgb( ramp.br(seq(0, 1, length = 14)), max = 255) ## wrapper function extracting t-statistic and p-value from a call to function t.test tscore <- function(y,x) { tmp <- t.test(y~x) c(score=tmp$statistic,pval=tmp$p.value) }
heatmap_wrapper( DAT=DAT, Ncol=Ncol, ndraw = 6)
heatmap_wrapper( DAT=DAT, Ncol=Ncol, ndraw = 14)
heatmap_wrapper( DAT=DAT, Ncol=Ncol, ndraw = 30)
heatmap_wrapper( DAT=DAT, Ncol=Ncol, ndraw = 100)
heatmap_wrapper( DAT=DAT, Ncol=Ncol, ndraw = 200)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.