knitr::opts_chunk$set(collapse=TRUE,comment="#>")
This package implements four different methods (namely, Fisher's, Lancaster's, Stouffer's and Sidak's) for aggregating p-values arising from multiple experiments or multiple hypothesis tests within one experiment. For a quick overview of the package contents and other necessary details, please refer to the README file.
This tutorial, as mentioned in the README file, summarizes the installation and usage and demonstrates the correctness and efficiency of the implemented R functions.
#devtools::install_github("rupambh/rupamb625",build_vignettes=TRUE) # Not run while building this tutorial. library(rupamb625)
The package contains four functions, one corresponding to each of the methods. To view the source R codes you can use the following.
rupamb625::fisher rupamb625::lancaster rupamb625::stouffer rupamb625::sidak
The help pages contain the references which provide the mathematical details. Help pages can be accessed by the following calls.
?fisher ?lancaster ?stouffer ?sidak
All the three functions other than lancaster() require a single numeric vector (containing the p-values to be aggregated) as input, while lancaster() requires an additional numeric vector of non-negative weights as the second input to perform a weighted aggregation. A few crucial points to note are:
rupamb625::fisher(c(.1,.2,.3)) rupamb625::stouffer(c(.1,.2,.3)) rupamb625::sidak(c(.1,.2,.3)) rupamb625::lancaster(c(.1,.2,.3),c(1,2,3)) rupamb625::lancaster(c(.1,.2,.3),rep(2,3)) rupamb625::lancaster(c(.1,.2,.3))
rupamb625::fisher(c(NA,.1,.2,.3,-0.1,1.1)) rupamb625::stouffer(c(NA,.1,.2,.3,-0.1,1.1)) rupamb625::sidak(c(NA,.1,.2,.3,-0.1,1.1)) rupamb625::lancaster(c(NA,.1,.2,.3,.4,.5,-0.1,1.1),c(1,1,2,3,-4,NA,5,6)) # Equivalent to running rupamb625::lancaster(c(.1,.2,.3),c(1,2,3))
Here, we demonstrate the correctness of our results by comparing them to functions from two other packages, namely, the aggregation and metap packages available on CRAN.
You can install these packages by running the following on R.
install.packages("aggregation","metap")
Note that both of these packages have their own implementations of the Fisher and Lancaster methods, while aggregation has the Sidak method only bu not the Stouffer method, and for metap it is the other way round.
First of all, see that aggregation::fisher() is not set-up to filter out negatives or >1 values. The same goes for metap::sumlog(), and it is also unable to handle NA inputs. Similar notes follow for all the other methods as well.
all.equal(rupamb625::fisher(c(NA,-.1,.1,.2,.3,1.1)),aggregation::fisher(c(NA,.1,.2,.3))) all.equal(rupamb625::fisher(c(NA,-.1,.1,.2,.3,1.1)),metap::sumlog(c(.1,.2,.3))$p) all.equal(rupamb625::lancaster(c(NA,-.1,.1,.2,.3,1.1),1:6),aggregation::lancaster(c(.1,.2,.3),3:5)) all.equal(rupamb625::lancaster(c(NA,-.1,.1,.2,.3,1.1),1:6),metap::invchisq(c(.1,.2,.3),3:5)$p) all.equal(rupamb625::stouffer(c(NA,-.1,.1,.2,.3,1.1)),as.numeric(metap::sumz(c(.1,.2,.3))$p)) all.equal(rupamb625::sidak(c(NA,-.1,.1,.2,.3,1.1)),aggregation::sidak(c(NA,.1,.2,.3)))
Further, aggregation::lancaster() claims to filter out NAs and negative weights, but only filters out negative values, as can be seen below.
aggregation::lancaster(1:4/10,c(1:3,NA)) aggregation::lancaster(1:4/10,c(1:3,-4)) aggregation::lancaster(1:3/10,1:3)
The reason behind this is the following two lines of code that can be accessed by calling aggregation::lancaster. An & operator should be used in place of the pipe (|).
pvalues <- pvalues[weights > 0 | is.na(weights)] weights <- weights[weights > 0 | is.na(weights)]
Our version of lancaster already solves this issue.
For each of the functions, we compare their performances with their counterparts from the other packages. We apply the functions on a sequence of inputs with increasing size, ranging from 10 to 10000. If you want to run your versions of these comparisons, install the packages required for comparisons and visualizations by running the code provided below.
install.packages("dplyr","ggplot2","ggbeeswarm","lattice","tidyr","bench")
for(i in 1:4) { n=10^i input=stats::runif(n) result=bench::mark(rupamb625::fisher(input),aggregation::fisher(input),metap::sumlog(input)$p) print(paste0("Size of input vector: ",10^i)) print(plot(result)) print(result) }
Clearly, for this method, our method performs the fastest among the three. The differences in timings, especially that with aggregation::fisher(), become very pronounced as we move along to larger and larger inputs.
for(i in 1:4) { n=10^i input1=stats::runif(n) input2=stats::runif(n,0,10) result=bench::mark(rupamb625::lancaster(input1,input2),aggregation::lancaster(input1,input2),metap::invchisq(input1,input2)$p) print(paste0("Size of input vector: ",10^i)) print(plot(result)) print(result) }
Our implementation of this method performs better than its counterpart from aggregation in all input sizes. However, the metap implementation performs better than ours in larger input sizes.
for(i in 1:4) { n=10^i input=stats::runif(n) result=bench::mark(rupamb625::stouffer(input),as.numeric(metap::sumz(input)$p)) print(paste0("Size of input vector: ",10^i)) print(plot(result)) print(result) }
Our implementation performs way better than the metap implementation in terms of timing. Though the improvements become a little less pronounced as we move along up to 10000 input size, the achieved level of performance is pretty high.
for(i in 1:4) { n=10^i input=stats::runif(n) result=bench::mark(rupamb625::sidak(input),aggregation::sidak(input)) print(paste0("Size of input vector: ",10^i)) print(plot(result)) print(result) }
Here, we only make a very tiny bit of improvement in terms of timing.
Thanks for taking your time to look at our package. In case you want to let us know about any comments, queries or suggestions, contact us here!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.