In pedro-teles-fonseca/pcal: Calibration of P-Values for Point Null Hypotheses

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%")

library(pcal)

pcal: Calibration of P-Values for Point Null Hypotheses

Overview

P-values are the most commonly used tool to measure the evidence provided by the data against a model or hypothesis. Unfortunately, p-values are often incorrectly interpreted as the probability that the null hypothesis is true or as type I error probabilities. The pcal package uses the calibrations developed by @sellke2001 to calibrate p-values under a robust perspective and obtain measures of the evidence provided by the data in favor of point null hypotheses which are safer and more straightforward interpret:

pcal() calibrates p-values so that they can be directly interpreted as either lower bounds on the posterior probabilities of point null hypotheses or as lower bounds on type I error probabilities. With this calibration one need not fear the misinterpretation of a type I error probability as the probability that the null hypothesis is true because they coincide. Note that the output of this calibration has both Bayesian and Frequentist interpretations.
bcal() calibrates p-values so that they can be interpreted as lower bounds on Bayes factors in favor of point null hypotheses.

Some utility functions are also included:

bfactor_to_prob() turns Bayes factors into posterior probabilities using a formula from @bergerDelampady1987.
bfactor_interpret() classifies the strength of the evidence implied by a Bayes factor according to the scales suggested by @jeffreys1961 and @kass1995.
bfactor_log_interpret() is similar to bfactor_interpret() but takes logarithms of Bayes factors as input.

Installation

The released version of pcal can be installed from CRAN with:

install.packages("pcal")

The development version can be installed from GitHub using the devtools package:

# install.packages("devtools")
devtools::install_github("pedro-teles-fonseca/pcal")

Usage

First we need a p-value from any statistical test of a point null hypothesis:

x <- matrix(c(22, 13, 13, 23), ncol = 2)
pv <- chisq.test(x)[["p.value"]]

pv

In classical hypothesis testing, if the typical 0.05 significance threshold is used then this p-value slightly below 0.05 would result in the rejection of the null hypothesis.

With bcal() we can turn pv into a lower bound for the Bayes factor in favor of the null hypothesis:

bcal(pv)

We can also turn pv into a lower bound for the posterior probability of the null hypothesis using pcal():

pcal(pv)

This is an approximation to the minimum posterior probability of the null hypothesis that we would find by changing the prior distribution of the parameter of interest (under the alternative hypothesis) over wide classes of distributions. The output of bcal() has an analogous interpretation for Bayes factors (instead of posterior probabilities).

Note that according to pcal() the posterior probability that the null hypothesis is true is at least 0.27 (approximately), which implies that a p-value below 0.05 is not necessarily indicative of strong evidence against the null hypothesis.

One can avoid the specification of prior probabilities for the hypotheses by focusing solely on Bayes factors. To compute posterior probabilities for the hypotheses, however, prior probabilities must by specified. By default, pcal() assigns a prior probability of 0.5 to the null hypothesis. We can specify different prior probabilities, for example:

pcal(pv, prior_prob = .95)

In this case we obtain a higher lower bound because the null hypothesis has a higher prior probability.

@sellke2001 noted that a scenario in which they definitely recommend the aforementioned calibrations is when investigating fit to the null model with no explicit alternative in mind. @pericchiTorres2011 warned that despite the usefulness and appropriateness of these p-value calibrations they do not depend on sample size, and hence the lower bounds obtained with large samples may be conservative.

Since the output of bcal(pv) is a Bayes factor, we can use bfactor_to_prob() to turn it into a posterior probability:

bfactor_to_prob(bcal(pv)) # same as pcal(pv)

Like pcal(), bfactor_to_prob() assumes a prior probability of 0.5 to the null hypothesis. We can change this default:

bfactor_to_prob(bcal(pv), prior_prob = .95)

To classify the strength of the evidence in favor of the null hypothesis implied by a Bayes factor according to the scale suggested by @jeffreys1961 we can use bfactor_interpret():

bfs <- c(0.1, 2, 5, 20, 50, 150)

bfactor_interpret(bfs)

Alternatively, we can use the interpretation scale suggested by @kass1995:

bfactor_interpret(bfs, scale = "kass-raftery")

Because Bayes factors are often reported on a logarithmic scale, there is also a bfactor_log_interpret() function that interprets the logarithms of Bayes factors:

log_bfs <- log10(bfs)

bfactor_log_interpret(log_bfs, base = 10)

bfactor_log_interpret(log_bfs, scale = "kass-raftery", base = 10)

To compare Bayes factors with results from standard likelihood ratio tests it can be useful to obtain the strength of the evidence against the null hypothesis. If bf is a Bayes factor in favor of the null hypothesis, one can use 1/bf as input to obtain the strength of the evidence against the null hypothesis:

# Evidence in favor of the null hypothesis
bfactor_interpret(c(0.1, 2, 5, 20, 50, 150))

# Evidence against the null hypothesis
bfactor_interpret(1/c(0.1, 2, 5, 20, 50, 150))

Getting Help

If you find a bug, please file an issue with a minimal reproducible example on GitHub. Feature requests are also welcome. You can contact me at pedro.teles.fonseca@outlook.com.