In mvaniterson/bacon: Controlling bias and inflation in association studies using the empirical null distribution

modified: Sat Jan 20 08:18:27 2018 compiled: r date()

suppressPackageStartupMessages(library(bacon))
BiocStyle::markdown()
options(digits=3)

Introduction

r Biocpkg("bacon") can be used to remove inflation and bias often observed in epigenome- and transcriptome-wide association studies [@vanIterson2017].

To this end r Biocpkg("bacon") constructs an empirical null distribution using a Gibbs Sampling algorithm by fitting a three-component normal mixture on z-scores. One component is forced, using prior knowledge, to represent the null distribution with mean and standard deviation representing the bias and inflation. The other two components are necessary to capture the amount of true associations present in the data, which we assume unknown but small.

r Biocpkg("bacon") provides functionality to inspect the output of the Gibbs Sampling algorithm, i.e., plots of traces, posterior distributions and the mixture fit, are provided. Furthermore, inflation- and bias-corrected test-statistics, effect-sizes and standard errors, or P-values are extracted easily. In addition, functionality for performing fixed-effect meta-analysis and obtaining inflation- and bias-corrected statistics with a 95% Confidence Interval (CI) are provided as well.

The function bacon requires a vector or a matrix of z-scores and/or effect-sizes and standard errors, e.g., those extracted from association analyses using a linear regression approach. For fixed-effect meta-analysis a matrix of effect-sizes and standard-errors is required.

This vignette illustrates the use of r Biocpkg("bacon") using simulated z-scores, effect-sizes and standard errors to avoid long run-times. If multiple sets of test-statisics or effect-sizes and standard-errors are provided, the Gibbs Sampler algorithm can be executed in parallel to reduce computation time using functionality provide by r Biocpkg("BiocParallel")-package.

A single set of test-statistics

A vector containing $5000$ z-scores is generated from a normal mixture distribution, $90\%$ of the z-scores were drawn from a biased and inflated null distribution, $\mathcal{N}(0.2, 1.3)$, and the remaining z-scores from $\mathcal{N}(\mu, 1)$, where $\mu \sim \mathcal{N}(4, 1)$. The rnormmix-function provided by Bacon generates a vector of random test-statistics described above optionally with different parameters.

y <- rnormmix(5000, c(0.9, 0.2, 1.3, 1, 4, 1))

The function bacon executes the Gibbs Sampler algorithm and stores all in- and out-put in an object of class Bacon. Several accessor-functions are available to access data contained in the Bacon-object, e.g. for obtaining the estimated parameters of the mixture fit or explicitly the bias and inflation. Actually, the latter two are the mean and standard deviation of the null component (mu.0 and sigma.0).

bc <- bacon(y)
bc
estimates(bc)
inflation(bc)
bias(bc)

Several methods are provided to inspect the output of the Gibbs Sampler algorithm, such as traces-plots of all estimates, plots of posterior distributions, provide as a scatter plot between two parameters, and the actual fit of the three component mixture to the histogram of z-scores.

traces(bc, burnin=FALSE)

posteriors(bc)

fit(bc, n=100)

The previous three plots can be use as diagnostic tools to inspect the Gibbs sampling process.

There is also a generic plot function that can generate two types of plots; a histogram of the z-scores and a qq-plot. The histogram of the z-scores shows on top the standard normal distribution and the Gibbs Sampling estimated empirical null distribution. The quantile-quantile plot shows the $-log_{10}$ transformed P-values. Default values are raw, not controlled for bias and inflation, z-scores and P-values.

plot(bc, type="hist")

```r$ transformed P-values. Left panel using uncorrected P-values and right panel using bacon bias and inflation corrected P-values."} plot(bc, type="qq")

# Multiple sets of test-statistics #

Matrices containing $5000\times6$ effect-sizes and standard errors are
generated to simulated data for a fixed-effect meta-analyses. This is
a toy-example just to illustrate the capabilities of `bacon` in
handling multiple sets of test-statics.

```r
set.seed(12345)
biases <- runif(6, -0.2, 0.2)
inflations <- runif(6, 1, 1.3)
es <- matrix(nrow=5000, ncol=6)
for(i in 1:6)
    es[,i] <- rnormmix(5000, c(0.9, biases[i], inflations[i], 0, 4, 1), shuffle=FALSE)
se <- replicate(6, 0.8*sqrt(4/rchisq(5000,df=4)))
colnames(es) <- colnames(se) <- LETTERS[1:ncol(se)]
rownames(es) <- rownames(se) <- 1:5000
head(rownames(es))
head(colnames(es))

By default the function bacon detects the number of cores/nodes registered, as described in the r Biocpkg("BiocParallel"), to perform bacon in parallel. To run the vignette in general we set it here for convenience to 1 node.

library(BiocParallel)
register(MulticoreParam(1, log=TRUE))
bc <- bacon(NULL, es, se)
bc
knitr::kable(estimates(bc))
inflation(bc)
bias(bc)
knitr::kable(tstat(bc)[1:5,])
knitr::kable(pval(bc)[1:5,])
knitr::kable(se(bc)[1:5,])
knitr::kable(es(bc)[1:5,])

The accessor-function return as expected matrices of estimates. For the plotting functions an additional index of the ith study or z-score is required.

traces(bc, burnin=FALSE, index=3)

posteriors(bc, index=3)

fit(bc, n=100, index=3)

plot(bc, type="hist")

```r$ transformed P-values. Left panel using uncorrected P-values and right panel using bacon bias and inflation corrected P-values."} plot(bc, type="qq")

# Fixed-effect meta-analysis #

The following code chunk shows how to perform fixed-effect
meta-analysis and the inspection of results.

```r
bcm <- meta(bc)
head(pval(bcm))
print(topTable(bcm))

```r$ transformed P-values for each cohort and the meta-analysis P-values. Left panel using uncorrected P-values and right panel using bacon bias and inflation corrected P-values."} plot(bcm, type="qq")

# Adjustment with 95% CI #

Here we describe how inflation- and bias-corrected statistics with a 95% Confidence
Interval (CI) can be constructed. 

Given a vector of z-scores. Replicate the z-scores, say 100 times, and store in a matrix. 
```r
zs <- rnormmix(5000, c(0.9, 0.2, 1.3, 1, 4, 1))

This is a toy-example just to illustrate how bacon can be used to create a sampling distributions of metrics output from bacon therefore we just use 10 replicates to speedup the calculations.

zs <- cbind(zs, matrix(zs, nrow=5000, ncol=9))
colnames(zs) <- c(paste0("A", 1:10))
rownames(zs) <- 1:5000
head(rownames(zs))
head(colnames(zs))

By default the function bacon sets a global seed for random number generation (RNG) that occurs in the Gibbs Sampling process. The global seed is sufficient for controlling the RNG when the input is a single-vector or a matrix so that each call to bacon produces the same results. Since r Biocpkg("BiocParallel") performs RNG independently of the global environment, the globalSeed can be set to NULL to allow RNG across each parallel process that calls bacon, and the parallelSeed (default 42) used by r Biocpkg("BiocParallel") will control the RNG so that a separate call to bacon with the same input matrix will produce the same 'randomized' results.

library(BiocParallel)
register(MulticoreParam(1, log=TRUE))
bc <- bacon(teststatistics = zs, 
            globalSeed = NULL,
            parallelSeed = 42)
bc
head(inflation(bc))
head(bias(bc))

The resulting sampling distributions can be used to calculate the average estimated inflation and bias metrics with a 95% CI. Averaged inflation- and bias- adjusted z-scores and p-values can also be easily extracted.

For example, overall average bias and inflation:

mean_bias <- mean(bias(bc))
mean_bias
mean_inflation <- mean(inflation(bc))
mean_inflation

The 95% confidence intervals:

CI_bias <- quantile(bias(bc), c(0.025, 0.975))
CI_bias
CI_inflation <- quantile(inflation(bc), c(0.025, 0.975))
CI_inflation

And average inflation- and bias-corrected z-scores and P-values:

avg_corrected_zs <- rowMeans(tstat(bc))
avg_corrected_zs[1:5]
avg_corrected_pvals <- rowMeans(bacon::pval(bc))
avg_corrected_pvals[1:5]

Session Info

Here is the output of sessionInfo() on the system on which this document was compiled:

sessionInfo()

References

mvaniterson/bacon documentation built on April 19, 2024, 5:32 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

mvaniterson/bacon
Controlling bias and inflation in association studies using the empirical null distribution

In mvaniterson/bacon: Controlling bias and inflation in association studies using the empirical null distribution

Introduction

A single set of test-statistics

Session Info

References

R Package Documentation

Browse R Packages

We want your feedback!

mvaniterson/bacon Controlling bias and inflation in association studies using the empirical null distribution

In mvaniterson/bacon: Controlling bias and inflation in association studies using the empirical null distribution

Introduction

A single set of test-statistics

Session Info

References

R Package Documentation

Browse R Packages

We want your feedback!

mvaniterson/bacon
Controlling bias and inflation in association studies using the empirical null distribution