aggregatePvalues: Post Selection with Aggregate Testing for Independent...
In ammeir2/PSAT: Inference after Selection with Aggregate Testing


Performs selection adjustment for independent p-values following selection with an aggregate level test statistic.

aggregatePvalues(pmat, globaltest = c("Fisher", "Pearson"), pval_threshold = NULL)

`pmat`	An `M x n` matrix of original p-values, where `M` is the number of features and `n` is the number of independent p-values for each feature. If `globaltest="Pearson"`, the original p-values must be one-sided p-values. No default.
`globaltest`	"Fisher" or "Pearson". The default is "Fisher". Use "Pearson" only for two-sided alternatives where a common direction of effect is expected within each feature.
`pval_threshold`	The p-value selection threshold. If `pval_threshold=NULL`, only the global null p-value is computed. It must be supplied to compute the conditional p-values. Either a scalar or a vector of length `M` with positive entries. See Details.

The Fisher test statistic is minus two times the sum of the logs of the input p-values, and the Fisher global null p-value is the probability that a chi-squared random variable with 2n degrees of freedom exceeds the Fisher test statistic, where n is the number of input p-values.
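The Fisher computation described above can be sketched in base R. This is a minimal illustration, not the package's code: `fisher_global` is a hypothetical helper name, and the input matrix holds made-up p-values.

```r
# Hypothetical helper (not part of PSAT): Fisher's global null p-value,
# computed separately for each row (feature) of a p-value matrix.
fisher_global <- function(pmat) {
  pmat <- as.matrix(pmat)
  n <- ncol(pmat)                      # number of p-values per feature
  stat <- -2 * rowSums(log(pmat))      # minus two times the sum of log p-values
  # Upper tail of a chi-squared distribution with 2n degrees of freedom
  pF <- pchisq(stat, df = 2 * n, lower.tail = FALSE)
  list(stat = stat, pF = pF)
}

# Made-up input: row 1 has small p-values, row 2 does not.
pmat <- rbind(c(0.01, 0.02, 0.03),
              c(0.50, 0.60, 0.70))
res <- fisher_global(pmat)
print(res)
```

Row 1 yields a much smaller global null p-value than row 2, because its Fisher statistic is much larger.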

The Pearson test statistic is the maximum of the Fisher test statistic based on left-sided p-values and the Fisher test statistic based on right-sided p-values, and the Pearson global null p-value is twice the probability that a chi-squared random variable with 2n degrees of freedom exceeds the Pearson test statistic. The global null p-value is returned in pF. If the hypotheses are one-sided, set globaltest="Fisher"; if the hypotheses are two-sided, set globaltest="Pearson". The p-values are assumed to come from continuous test statistics, so that the left-sided and right-sided p-values sum to 1. See Heller et al. (2016) for details about these global tests, about the post-selection inference using these tests, and about extensions to other global tests.
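The Pearson statistic can be sketched the same way, starting from right-sided p-values and using the relation left-sided = 1 - right-sided. The helper name `pearson_global` and the input values are hypothetical. Capping the doubled tail probability at 1 is an assumption of this sketch, since the text above does not say how the package handles values above 1.

```r
# Hypothetical helper (not part of PSAT): Pearson's global null p-value per row,
# computed from right-sided p-values of continuous test statistics.
pearson_global <- function(pright) {
  pright <- as.matrix(pright)
  n <- ncol(pright)
  stat_right <- -2 * rowSums(log(pright))      # Fisher statistic, right-sided p-values
  stat_left  <- -2 * rowSums(log(1 - pright))  # Fisher statistic, left-sided p-values
  stat <- pmax(stat_left, stat_right)          # Pearson statistic: the larger of the two
  # Twice the chi-squared upper tail; the cap at 1 is an assumption of this sketch.
  pP <- pmin(1, 2 * pchisq(stat, df = 2 * n, lower.tail = FALSE))
  list(stat = stat, pP = pP)
}

# Made-up input: row 1 points right, row 2 is its mirror image, row 3 shows no signal.
pright <- rbind(c(0.01, 0.02, 0.03),
                c(0.99, 0.98, 0.97),
                c(0.40, 0.50, 0.60))
res_pearson <- pearson_global(pright)
print(res_pearson)
```

Rows 1 and 2 receive the same global null p-value, which illustrates why this test suits two-sided alternatives with a common direction within a feature: consistent evidence in either direction is treated alike.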

The p-value threshold, pval_threshold, is the cutoff that a global null p-value must not exceed for the corresponding feature to be selected for post-selection inference. For example, for a Bonferroni correction at level alpha over a family of m global null hypotheses, it should be set to alpha/m. The conditional p-values are computed only if pF<=pval_threshold.
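As a rough illustration of the selection step, the sketch below computes a Bonferroni threshold and checks which features pass it, using Fisher's test in base R. The p-values and `alpha` are made up, and the family is assumed to consist of exactly the rows of `pmat`.

```r
# Made-up p-values: 3 features (rows), 4 independent p-values each.
pmat <- rbind(c(1e-4, 2e-3, 5e-3, 0.01),
              c(0.30, 0.45, 0.60, 0.80),
              c(0.05, 0.10, 0.40, 0.70))

alpha <- 0.05
m <- nrow(pmat)                  # assumed family size: one global null per row
pval_threshold <- alpha / m      # Bonferroni threshold at level alpha

# Fisher global null p-value for each feature
pF <- pchisq(-2 * rowSums(log(pmat)), df = 2 * ncol(pmat), lower.tail = FALSE)

# Features whose global null p-value is at most the threshold
selected <- which(pF <= pval_threshold)
print(selected)

# With the package installed, the same threshold would be passed as:
# out <- aggregatePvalues(pmat, globaltest = "Fisher", pval_threshold = alpha / m)
```

Only the first feature is selected here, so only its conditional p-values would be computed.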

A list containing the global null p-values in pF and the conditional p-values in p2C. If a global null p-value is above the threshold, then the conditional p-value vector corresponding to it is a vector of NAs; if pval_threshold=NULL, then the conditional p-value matrix p2C consists entirely of NAs.

Ruth Heller, Nilanjan Chatterjee, Abba Krieger, Jianxin Shi (2016). Post-selection Inference Following Aggregate Level Hypothesis Testing in Large Scale Genomic Data. bioRxiv: http://dx.doi.org/10.1101/058404

set.seed(123)
beta  = c(rnorm(10,0,2.5), rep(0, 30)) #40 means, one per simulated test statistic
pmat1sided <- matrix(1-pnorm(rnorm(40,beta)), nrow=2, ncol=20, byrow=TRUE) #one-sided p-values
pmat2sided = 2*pmin(pmat1sided, 1-pmat1sided) #two-sided p-values

#compute the one-sided Fisher global null p-value and the conditional p-values given that Fisher's
# global null p-value is at most 0.001
out.Fisher= aggregatePvalues(pmat1sided, globaltest = "Fisher", pval_threshold=0.001)
print(out.Fisher)

#plot the original p-values, and the conditional p-values after global null selection by Fisher,
# for the selected row.
par(mfrow=c(1,1))
plot(seq(1:20), -log10(pmat1sided[1,]), col="red", ylab = "-log10(PV)",
     main="conditional and original one-sided p-values on -log10 scale" )
points(seq(1:20), -log10(out.Fisher$p2C[1,]), pch = 2)
legend(16,8, c("original",  "cond Fisher"), pch = c(1,2), col=c("red", "black"))
abline(-log10(0.05/20),0,lty=2, col="gray")


#compute the two-sided Fisher global null p-value and the conditional p-values given that Fisher's
# global null p-value is at most 0.001
out.Fisher= aggregatePvalues(pmat2sided, globaltest = "Fisher", pval_threshold=0.001)
print(out.Fisher)

#plot the original p-values, and the conditional p-values after global null selection by Fisher,
# for the selected row.
par(mfrow=c(1,1))
plot(seq(1:20), -log10(pmat2sided[1,]), col="red", ylab = "-log10(PV)",
     main="conditional and original two-sided p-values on -log10 scale" )
points(seq(1:20), -log10(out.Fisher$p2C[1,]), pch = 2)
legend(16,8, c("original",  "cond Fisher"), pch = c(1,2), col=c("red", "black"))
abline(-log10(0.05/20),0,lty=2, col="gray")



#compute the Pearson global null p-value and the conditional p-values given that Pearson's
# global null p-value is at most 0.001
out.Pearson = aggregatePvalues(pmat1sided, globaltest = "Pearson", pval_threshold=0.001)
print(out.Pearson)

#plot the original p-values, and the conditional p-values after global null selection by Pearson,
# for the selected row.
par(mfrow=c(1,1))
plot(seq(1:20), -log10(pmat2sided[1,]), col="red", ylab = "-log10(PV)",
     main="conditional and original two-sided p-values on -log10 scale" )
points(seq(1:20), -log10(out.Pearson$p2C[1,]), pch = 2)
legend(16,8, c("original",  "cond Pearson"), pch = c(1,2), col=c("red", "black"))
abline(-log10(0.05/20),0,lty=2, col="gray")