p_toolkit is a package designed to help adjust and visualize p-values when making multiple comparisons. Now that computing power makes it easy to run hundreds or even thousands of statistical tests at once, it is important to look at small p-values and ask whether a result is small simply by chance or whether it is truly significant. There are many tools to help decide when to reject a null hypothesis, which can control either the family-wise error rate (FWER) or the false discovery rate (FDR).
We can use the p-values alone, or an adjustment method such as the Bonferroni or Benjamini-Hochberg (BH) procedures. We can also use visualizations, such as QQ-plots or a scatter plot of the p-values, to try to detect patterns.
This package aims to combine these methods in a simple-to-use format by outputting dataframes that contain the results from several adjustment methods.
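To make the two corrections concrete, here is a small hand-rolled illustration of the cut-offs they use (the p-values are made up for the example):

```r
# Illustrative p-values from m = 5 tests, sorted from smallest to largest
p <- sort(c(0.001, 0.012, 0.049, 0.20, 0.74))
m <- length(p)
alpha <- 0.05

# Bonferroni: every p-value is compared against a single stricter cut-off, alpha / m
p <= alpha / m

# Benjamini-Hochberg: the k-th smallest p-value is compared against (k / m) * alpha;
# all hypotheses up to the largest k that passes the comparison are rejected
which(p <= (seq_len(m) / m) * alpha)
```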
p_adjust(data, pv_index, method, alpha = 0.05): Input a vector, or a dataframe with the p-value column indexed, and it will output a dataframe containing the data plus p-values adjusted via either the BH or the Bonferroni method.
p_methods(data, pv_index, alpha = 0.05): Input a vector, or a dataframe with the p-value column indexed, and it will output a dataframe containing the original data plus critical values and whether each result is significant under both the BH and Bonferroni methods.
p_qq(data, pv_index): Input a vector or dataframe and column index, and it will output a QQ-plot comparing the p-values with the uniform distribution.
p_plot(data, pv_index, alpha = 0.05): Input a vector or dataframe and column index, and it will output a ggplot of the p-values with both cut-off lines.
This package requires dplyr and ggplot2.
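A minimal usage sketch follows; the example data, column index, and the exact spelling of the method argument are assumptions, so the real interface may differ slightly:

```r
library(p_toolkit)

# A dataframe whose first column holds p-values from many tests
pvals <- data.frame(p_value = c(0.001, 0.012, 0.049, 0.20, 0.74))

# Adjusted p-values (the "bh" method string is assumed; Bonferroni is also available)
p_adjust(pvals, pv_index = 1, method = "bh", alpha = 0.05)

# Critical values and significance flags under both BH and Bonferroni
p_methods(pvals, pv_index = 1, alpha = 0.05)

# Diagnostics: QQ-plot against the uniform distribution, and p-values with cut-off lines
p_qq(pvals, pv_index = 1)
p_plot(pvals, pv_index = 1, alpha = 0.05)
```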
Some packages already exist for p-value adjustment in both environments, R and Python:
R:
The p.adjust function comes in the base stats library in R. It is designed to adjust an array of p-values using six methods, some controlling the family-wise error rate ("holm", "hochberg", "hommel", "bonferroni") and the others controlling the false discovery rate ("BH", "BY", "fdr"). The advantage of this function is its simplicity and the fact that it comes in the stats library, which is built into the default R environment, so the user doesn't need to install external packages. However, it doesn't let the user analyze more deeply what is going on with the tests; this is a key element of p_toolkit.
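For comparison, the base-R workflow is a single call:

```r
# Adjust a vector of p-values with base R; no extra packages needed
pvals <- c(0.001, 0.012, 0.049, 0.20, 0.74)

p.adjust(pvals, method = "bonferroni")  # controls the family-wise error rate
p.adjust(pvals, method = "BH")          # controls the false discovery rate
```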
fdrtool is a package designed for analyzing the false discovery rate in statistical tests, and it is not limited exclusively to p-value adjustment. It has some functions related to p_toolkit, such as fdrtool, which calculates and plots the false discovery rate, and pval.estimate.eta0, which estimates the proportion of null p-values in a list.
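A rough sketch of that workflow (the simulated p-values are only for illustration):

```r
library(fdrtool)

# Simulated p-values: mostly null (uniform) plus a few small "signal" p-values
set.seed(42)
pv <- c(runif(95), rbeta(5, 1, 50))

# Estimate and plot false discovery rates directly from the p-values
res <- fdrtool(pv, statistic = "pvalue")

# Estimate eta0, the proportion of truly null p-values
pval.estimate.eta0(pv)
```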
Python:
The multipletests function is part of the statsmodels library, a complete set of functions for implementing statistical methods in Python. It works similarly to R's p.adjust, receiving an array of p-values as input and returning two arrays: one with the corrected p-values and another with boolean values indicating which hypotheses are rejected after correction. It offers no diagnostics or analysis of the results.
Interested in contributing? See our Contributing Guidelines and Code of Conduct.
Created by Amy Goldlist · Esteban Angel · Veronique Mulholland