class: fullscreen, inverse, top, center, text-white background-image: url("../inst/images/many-black-chair-backs.jpg")

.font150[Multiplicity]

knitr::opts_chunk$set(fig.width=4.25, fig.height=3.5, fig.retina=3,
                      message=FALSE, warning=FALSE, cache = TRUE,
                      autodep = TRUE, hiline=TRUE)
knitr::opts_hooks$set(fig.callout = function(options) {
  if (options$fig.callout) {
    options$echo <- FALSE
    options$out.height <- "99%"
    options$fig.width <- 16
    options$fig.height <- 8
  }
  options
})
hook_source <- knitr::knit_hooks$get('source')
knitr::knit_hooks$set(source = function(x, options) {
  if (!is.null(options$hiline) && options$hiline) {
    x <- stringr::str_replace(x, "^ ?(.+)\\s?#<<", "*\\1")
  }
  hook_source(x, options)
})
options(htmltools.dir.version = FALSE, width = 90)
as_table <- function(...) knitr::kable(..., format='html', digits = 3)

library(tidyverse)

Overview


- Traditional vs OMICS hypothesis testing
- Testing hypotheses for a single observation
- OMICS experiments


layout: false

Weight loss example

Experiment:
- 250 subjects chosen "randomly".
- Diet for 1 week.
- Repeated measurement (data in kg):
  - Weight at the start of the week.
  - Weight at the end of the week.

The average weight loss is $0.13$ kg.
A paired t-test for weight loss gives a
t-statistic of $t = 0.126/0.068 = 1.84$,
and a two-sided p-value of $0.067$.
Not significant at the $5\%$ level!

.img-right[ ]

.footnote[Example from Andersen (1990) ]


Weight loss example

.left-code[

# two-sided t-test
2 * (1 - pt(1.84, df = 250 - 1))
# asymptotic (normal) test
2 * (1 - pnorm(1.84, 0, 1)) #<<
# one-sided t-test
1 - pt(1.84, df = 250 - 1) #<<

]

.right-plot[ ]

.footnote[The asymptotic test does not "help" (and is biased); the smaller the sample size, the larger the bias.
]


layout: false

Weight loss example

A one-tailed test improves the test's power to detect whether the new diet is better.

Why is the one-sided test not acceptable?

You risk missing valuable information by testing in only one direction: the test cannot determine whether the diet leads to weight gain.


Weight loss example

Can anything be done to get a significant result
out of this study?
--

.img-right[ ]

--


Weight loss example

What is the problem with this approach?

--

Requests for subgroup analyses are common in clinical studies.


exclude: true


Where does multiplicity arise?


Where does multiplicity arise?

.footnote[Example: 50 control samples, 50 treatment samples, no significant result. Split the data into female and male, and into young and old groups: four tests instead of one, and maybe one of them is significant.]
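A quick simulation of this scenario (a sketch; the group labels and sizes are hypothetical, and there is no true treatment effect):

set.seed(1)
# 50 control and 50 treatment samples, no true effect
d <- data.frame(y = rnorm(100),
                trt = rep(c("ctrl", "trt"), each = 50),
                sex = sample(c("f", "m"), 100, replace = TRUE),
                age = sample(c("young", "old"), 100, replace = TRUE))
# one overall test ...
t.test(y ~ trt, data = d)$p.value
# ... versus four subgroup tests
by(d, list(d$sex, d$age),
   function(x) t.test(y ~ trt, data = x)$p.value)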


Types of error when testing hypothesis

A type I error (false positive) occurs when
the null hypothesis (H0) is true, but is rejected.
The type I error rate or significance level $\alpha$
is the probability of rejecting the
null hypothesis given that it is true.

A type II error (false negative) occurs when
the null hypothesis is false,
but erroneously fails to be rejected.
The type II error rate is denoted by the Greek letter $\beta$
and is related to the power of a test (which equals $1-\beta$).

For a given test, the only way to reduce both error rates
is to increase the sample size, and this may not be feasible.
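To illustrate the trade-off (a sketch using stats::power.t.test; the effect size and sample sizes are hypothetical):

# power (1 - beta) of a two-sample t-test at alpha = 0.05,
# for an effect of 0.5 sd, at two sample sizes per group
power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.05)$power
power.t.test(n = 100, delta = 0.5, sd = 1, sig.level = 0.05)$power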

.img-right[ ]


Multiple Testing

.left-code[

# simulate N test statistics under H0
set.seed(2); N <- 20; N_obs <- 4; mu <- 0; sigma <- 1
bb <- function(y){
  x <- rnorm(N_obs, mu, sigma)
  # compute mean and sd of one sample
  data.frame(mean = mean(x),
             sd = sd(x))
}
res <- purrr::map_df(1:N, bb)
T0 <- (mu - res$mean) / (sigma / sqrt(N_obs)) #<<
plot(T0, rep(0.4, N), type = "h", xlim = c(-3, 3), ylim = c(0, 0.4))

x <- seq(-10, 10, 0.01)
lines(x, dnorm(x,
      sd = 1), #<<
      col = 2)
abline(v = c(qnorm(0.025), qnorm(0.975)),
       col = "orange", lwd = 2)

$T|H_0 \sim N(0,1)$ ]

.right-plot[


]


Family-wise error rate (FWER)

In statistics, the family-wise error rate (FWER) is the probability of making one or more false discoveries (type I errors) when performing multiple hypothesis tests.

If multiple hypotheses are tested, the chance of a rare event increases, and therefore the likelihood of incorrectly rejecting a null hypothesis (making a type I error) increases.
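For example, for $k = 20$ independent tests, each at level $0.05$:

alpha <- 0.05; k <- 20
# probability of at least one false positive among k independent tests
1 - (1 - alpha)^k  # approx. 0.64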


P-value adjustment - Bonferroni correction

The Bonferroni correction compensates for that increase by testing each individual hypothesis at a significance level of $\epsilon = \alpha/k$, where $\alpha$ is the desired overall size of the test and $k$ is the number of hypotheses.

$$k=20;~ \alpha = 0.05;~~~~~~~\epsilon = 0.05/20 = 0.0025$$

Bonferroni adjustments are typically very conservative (they assume that the tests are independent, whereas tests are frequently correlated), and more refined methods are usually used (e.g. multcomp).
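In R, the same correction can be applied by adjusting the p-values instead of the threshold (a sketch; the p-values are made up):

p <- c(0.001, 0.004, 0.03)
# multiplies each p-value by the number of tests (capped at 1)
p.adjust(p, method = "bonferroni")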

.footnote[wikipedia, Medical Statistics - Sheffield University 2018]


P-value adjustment - Bonferroni correction

Family Wise Error Rate (FWER) - control the probability of at least one Type I error.

$$\begin{align}
Pr(\textrm{at least one Type I error}|H_0) = \alpha &= 1 - Pr(\textrm{no rejections}|H_0)\\
&= 1 - \prod^k Pr(p_i > \epsilon)\\
&= 1 - \prod^k (1-\epsilon)\\
&= 1 - (1-\epsilon)^k
\end{align}$$

Solving for $\epsilon$ gives the approximation $\epsilon \approx \alpha/k$, or exactly $\epsilon = 1 - \exp(\log(1-\alpha)/k)$.
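Checking the approximate against the exact per-test level in R:

alpha <- 0.05; k <- 20
alpha / k                    # approximate: 0.0025
1 - exp(log(1 - alpha) / k)  # exact: ~0.00256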

.footnote[http://genomicsclass.github.io/book/pages/multiple_testing.html]


exclude: true

FWER control using package multcomp

.left-code[ Model with NO interactions.

#help(swiss) #<<
# fit a linear model with all predictors
lmod <- lm(Fertility ~ ., #<<
           data = swiss)
lmtable <- broom::tidy(lmod)
# contrast matrix: one row per coefficient (without intercept)
K <- diag(length(coef(lmod)))[-1,] #<<
rownames(K) <- names(coef(lmod))[-1] #<<
# simultaneous tests with FWER control
adjusted <- broom::tidy(summary(
  multcomp::glht(lmod, linfct = K))) #<<
comp <- inner_join(lmtable[-1,-c(2,3,4)],
                   adjusted[-c(2,3,4,5)],
                   by = c("term" = "contrast"))
colnames(comp)[c(2,3)] <-
  c("p.value","p.adjusted")

]

.pull-right[

knitr::kable(comp, format = "html", caption = "Comparison of p.value and adjusted p.value")

]

.footnote[multcomp: Simultaneous Inference in General Parametric Models by Torsten Hothorn.
The mathematics and theory are complex; asymptotic properties are used to make the problem tractable.]


FWER - Conclusion

Controlling the FWER demands an unrealistically small per-test p-value threshold.
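For example, for a hypothetical OMICS-scale experiment with m = 10,000 tests:

# Bonferroni per-test threshold at overall alpha = 0.05
0.05 / 10000  # 5e-06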


exclude: true


False Discovery Rate - Motivation

.footnote[Simulating p-values. Figure: A) 1000 p-values where H0 is true; B) 600 p-values where H0 is true and 400 where HA is true; C) close-up of B.]

.left-code[

m <- 1000
# simulate one two-sample t-test p-value;
# a fraction of the tests has a true effect of size delta
simulate.p.values <- function(
  i, delta = 2, fraction = 0.1){
  control <- rnorm(6, 0, 1)
  treatment <- rnorm(6, 0, 1)
  if (runif(1) < fraction)
    treatment <- treatment + delta
  return(t.test(treatment, control)$p.value)
}
pvals00 <- sapply(1:m, simulate.p.values,
                  delta = 0,fraction = 0 )
pvals24 <- sapply(1:m, simulate.p.values,
                  delta = 2, fraction = 0.4 )
par(mfrow = c(1, 3))
hist(pvals00, breaks = 20, ylim = c(0, 300), main = "A")
abline(h = 1000/20, col = 3)  # expected count per bin under H0
abline(v = 0.05, col = 2)     # significance cutoff
text(x = c(0.02,0.02,0.2,0.2),y=c(20,100,20,100), labels=c("FP","TP","TN","FN"))
hist(pvals24, breaks=20, ylim=c(0,300), main="B")
abline(h=(1000-400)/20, col=3)
abline(v = 0.05, col=2)
text(x = c(0.02, 0.02, 0.2, 0.2),y=c(20,100,20,100), labels=c("FP","TP","TN","FN"))

hist(pvals24, breaks=20, ylim=c(0,300), xlim=c(0,0.2), main="C")
abline(h=(1000-400)/20, col=3)
abline(v = 0.05, col=2)
text(x = c(0.02,0.02,0.1,0.1),y=c(20,100,20,100), labels=c("FP","TP","TN","FN"))

]

.right-plot[


]


False Discovery Rate (FDR)

$$FDR = \frac{FP}{FP + TP}$$
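A back-of-the-envelope estimate from the simulation above (assuming we know, as in panel B, that 600 of the 1000 null hypotheses are true):

R <- sum(pvals24 < 0.05)  # total rejections at cutoff 0.05
FP <- 600 * 0.05          # expected false positives among the 600 true nulls
FP / R                    # estimated FDR at this cutoff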


layout: false

FDR and p-value distribution

.pull-left[
- TP true positives (H0 rejected when HA is true)
- FP false positives (H0 rejected when H0 is true)
- FN false negatives (H0 accepted when HA is true)
- TN true negatives (H0 accepted when H0 is true)

$$FDR = \frac{FP}{FP + TP}$$
]

.right-plot[

hist(pvals24, breaks = 20, ylim = c(0,300), xlim = c(0,0.2), main = "D")
abline(h = (1000 - 400)/20, col = 3)
abline(v = 0.05, col = 2)
text(x = c(0.02, 0.02, 0.1, 0.1), y = c(20,100,20,100),
     labels = c("FP","TP","TN","FN"))

]

.footnote[John D. Storey 2002; Storey and Tibshirani 2003; Prummer 2012]


FDR - Benjamini Hochberg - procedure

Definition of the FDR as given in the Benjamini and Hochberg (1995) paper.

tab <- data.frame(
  c("Reject H0", "Accept H0", "Total"),
  matrix(c("V (FP)", "S (TP)", "R",
           "U (TN)", "T (FN)", "m-R",
           "m_0", "m-m_0", "m"),
         ncol = 3, byrow = TRUE))
colnames(tab) <- c("R/C", "H0 TRUE", "HA", "Total")
knitr::kable(tab, format = "html")

The FDR is the proportion of false discoveries among the discoveries (rejections of the null hypothesis):

$Q = V/R = V/(V+S)$, where $Q = 0$ if $R = 0$.
$FDR = Q_e = E[Q]$ (the expected value of $Q$).

.footnote[https://en.wikipedia.org/wiki/False_discovery_rate, Benjamini-Hochberg (1995)]

FDR - Benjamini Hochberg - procedure

For any given FDR level $\alpha$, the Benjamini-Hochberg (1995) procedure is very practical
because it only requires the p-values of the individual tests: sort the p-values, find the largest $k$ such that $p_{(k)} \leq \frac{k}{m}\alpha$, and reject the hypotheses corresponding to the $k$ smallest p-values.


FDR - Benjamini Hochberg - procedure

.footnote[The highlighted code illustrates the Benjamini-Hochberg procedure (top line) and how you would compute the FDR in R using the function p.adjust.] .left-code[

alpha <- 0.05
i <- seq_along(pvals24)
k <- max(which(sort(pvals24) <= i/m*alpha)) #<<
padj <- p.adjust(pvals24, method = "BH") #<<

# both give the same number of rejections
stopifnot(k == sum(padj <= alpha)) #<<

par(mfrow = c(2, 2))
hist(pvals24, breaks = 20)
hist(padj, breaks = 20)
plot(i, sort(pvals24))
abline(0, alpha/m, col = 2)  # BH line: p = (i/m) * alpha
plot(i[1:k], sort(pvals24)[1:k], type = "b",
     main = "Close-up")
abline(0, alpha/m, col = 2)

]

.right-plot[


]


exclude: true

FDR - Benjamini Hochberg - procedure

.right-code[

# BH adjustment, as implemented in stats::p.adjust(method = "BH")
p.adjust.BH <- function (p, n = length(p))
{
  nm <- names(p)
  p <- as.numeric(p)
  p0 <- setNames(p, nm)
  # handle missing values
  if (all(nna <- !is.na(p)))
    nna <- TRUE
  p <- p[nna]
  lp <- length(p)
  stopifnot(n >= lp)
  if (n <= 1) return(p0)
  i <- lp:1L                        # ranks, from largest to smallest p-value
  o <- order(p, decreasing = TRUE)  # indices of p sorted in decreasing order
  ro <- order(o)                    # mapping back to the original order
  # step-up: p_adj(k) = min(1, min over j >= k of (n/j) * p_(j))
  p0[nna] <- pmin(1, cummin(n/i * p[o]))[ro]
  return(p0)
}

]

.footnote[Do not use this; call p.adjust instead.]


FDR - Conclusion

Although we will end up with more false positives, the FDR gives us much more power. This makes it particularly appropriate for discovery-phase experiments, where we may accept FDR levels much higher than 0.05.
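To see the gain in power, compare the number of discoveries under the two corrections on the simulated p-values from above:

# rejections after Bonferroni (FWER) vs BH (FDR) adjustment
sum(p.adjust(pvals24, method = "bonferroni") < 0.05)
sum(p.adjust(pvals24, method = "BH") < 0.05)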

The FDR has an interpretation in a multiple-testing setting, while an unadjusted p-value in a high-throughput setting has NO interpretation.

The BH procedure is valid when the $m$ tests are independent, and also in various scenarios of dependence, but it is not universally valid (e.g. for gene sets).

.footnote[@JG: do not report p-values for OMICS data; just use a higher FDR threshold.]


layout: false

Possible p-value distributions

In practice we can observe various shapes of p-value distributions.

How to interpret a p-value histogram

This blog post discusses which types of p-value
distributions you might encounter when analysing
data and how to handle them.

.img-right[ ]


Conclusions


Conclusions

If you do subgroup analysis, use
exploratory or descriptive data analysis:
- tabulating (e.g. Venn diagrams)
- dimensionality reduction (e.g. PCA)
- clustering of samples and proteins (e.g. time-series clustering)
- GSEA or ORA analysis to contrast subgroups
- do not over-interpret your findings


Thank you

.img-small[ ]

.footnote[https://xkcd.com/882/]


