pr.boot: Bootstrap Confidence Intervals for Precision-Recall Curves

View source: R/pr.R

pr.boot    R Documentation

Bootstrap Confidence Intervals for Precision-Recall Curves

Description

This function calculates bootstrap percentile confidence intervals (CIs) for precision-recall (PR) curves using the precrec package. The resulting CIs can then be used in a plotting function (see Examples).
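The procedure can be sketched in base R. This is a minimal illustration under stated assumptions, not the usefun or precrec implementation. `pr_boot_sketch()` and its `recall_grid` argument are hypothetical. Precision is evaluated on a fixed recall grid using interpolated precision (the maximum precision over all cut-offs whose recall is at least the grid value), which differs from precrec's own interpolation. Tied predictions are not handled specially, and the interval is assumed to be the two-sided percentile interval.

```r
# Hypothetical sketch (base R only, NOT the usefun/precrec code):
# stratified bootstrap percentile CIs for precision on a fixed recall grid.
pr_boot_sketch <- function(labels, preds, boot_n = 200, alpha = 0.1,
                           recall_grid = seq(0.1, 1, by = 0.1)) {
  # Assumption: interpolated precision, i.e. the max precision over all
  # cut-offs whose recall is at least the grid value.
  precision_at <- function(lab, prd) {
    ord <- order(prd, decreasing = TRUE)  # higher scores ranked first
    tp <- cumsum(lab[ord] == 1)           # true positives per cut-off
    fp <- cumsum(lab[ord] == 0)           # false positives per cut-off
    rec <- tp / sum(lab == 1)
    prec <- tp / (tp + fp)
    vapply(recall_grid, function(r) max(prec[rec >= r]), numeric(1))
  }
  pos <- which(labels == 1)
  neg <- which(labels == 0)
  # Stratified resampling: cases and controls are resampled separately,
  # so every replicate keeps the original class counts.
  boot <- replicate(boot_n, {
    idx <- c(pos[sample.int(length(pos), replace = TRUE)],
             neg[sample.int(length(neg), replace = TRUE)])
    precision_at(labels[idx], preds[idx])
  })
  # boot: one row per recall grid point, one column per replicate
  data.frame(
    recall = recall_grid,
    precision = precision_at(labels, preds),
    precision_low = apply(boot, 1, quantile, probs = alpha / 2),
    precision_high = apply(boot, 1, quantile, probs = 1 - alpha / 2)
  )
}

set.seed(42)
labels <- sample(c(0, 1), 100, replace = TRUE, prob = c(0.8, 0.2))
preds <- rnorm(100)
head(pr_boot_sketch(labels, preds, boot_n = 100))
```

Evaluating every replicate on the same recall grid is what makes pointwise quantiles meaningful: each row of `boot` holds the bootstrap distribution of precision at one recall value.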

Usage

pr.boot(
  labels,
  preds,
  boot.n = 10000,
  boot.stratified = TRUE,
  alpha = 0.1,
  ...
)

Arguments

labels

(numeric())
Vector of responses/labels. Only two classes/values are allowed: cases/positive class = 1 and controls/negative class = 0.

preds

(numeric())
Vector of prediction values. Higher values indicate a higher likelihood of the positive class.
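To make the input convention concrete, the sketch below computes precision and recall at every distinct prediction value, counting an observation as predicted positive when its value is at or above the threshold. `pr_points()` is a hypothetical helper that only illustrates how `labels` and `preds` are interpreted; precrec's own tie handling and interpolation are not reproduced.

```r
# Hypothetical helper: precision/recall at each distinct threshold,
# where preds >= threshold counts as a positive (label 1) prediction.
pr_points <- function(labels, preds) {
  stopifnot(all(labels %in% c(0, 1)), length(labels) == length(preds))
  thresholds <- sort(unique(preds), decreasing = TRUE)
  n_pos <- sum(labels == 1)
  tp <- vapply(thresholds, function(t) sum(preds >= t & labels == 1), integer(1))
  fp <- vapply(thresholds, function(t) sum(preds >= t & labels == 0), integer(1))
  data.frame(threshold = thresholds,
             recall = tp / n_pos,
             precision = tp / (tp + fp))
}

pr_points(labels = c(1, 0, 1, 0, 0), preds = c(0.9, 0.8, 0.7, 0.3, 0.1))
```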

boot.n

(numeric(1))
Number of bootstrap resamples. Default: 10000.

boot.stratified

(logical(1))
Whether the bootstrap resampling is stratified (each replicate keeps the same number of cases and controls as the original sample) or not. Stratified resampling is advised when the classes in labels are imbalanced. Default: TRUE.
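A minimal sketch of the difference, assuming resampling with replacement within each class. `boot_indices()` is a hypothetical helper, not part of usefun.

```r
# Hypothetical helper: draw one bootstrap resample of observation indices.
# stratified = TRUE resamples cases (1) and controls (0) separately,
# so the class counts of the original sample are preserved.
boot_indices <- function(labels, stratified = TRUE) {
  if (stratified) {
    pos <- which(labels == 1)
    neg <- which(labels == 0)
    # index via sample.int() to avoid sample()'s length-1 pitfall
    c(pos[sample.int(length(pos), replace = TRUE)],
      neg[sample.int(length(neg), replace = TRUE)])
  } else {
    sample.int(length(labels), replace = TRUE)
  }
}

set.seed(1)
labels <- c(rep(1, 5), rep(0, 45))
table(labels[boot_indices(labels, stratified = TRUE)])   # 45 zeros and 5 ones in every replicate
table(labels[boot_indices(labels, stratified = FALSE)])  # class counts vary between replicates
```

With only a few cases, an unstratified replicate can contain very few (or no) positives, which makes precision and recall unstable or undefined; stratification avoids this.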

alpha

(numeric(1))
Significance level for the bootstrap percentile interval (between 0 and 1); the confidence level is 1 - alpha. Default: 0.1, corresponding to 90% confidence intervals.
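Assuming the usual two-sided percentile interval, alpha maps to the alpha/2 and 1 - alpha/2 sample quantiles of the bootstrap replicates at each recall point. A minimal base-R sketch with a hypothetical helper `percentile_ci()`:

```r
# Hypothetical helper: two-sided bootstrap percentile interval.
# Returns the alpha/2 and 1 - alpha/2 sample quantiles, i.e. a
# (1 - alpha) interval; alpha = 0.1 gives a 90% interval.
percentile_ci <- function(boot_stats, alpha = 0.1) {
  stopifnot(alpha > 0, alpha < 1)
  unname(quantile(boot_stats, probs = c(alpha / 2, 1 - alpha / 2), na.rm = TRUE))
}

percentile_ci(1:101, alpha = 0.1)  # 5% and 95% sample quantiles
```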

...

Other parameters passed on to precrec::evalmod, except mode (set to "rocpr") and raw_curves (set to TRUE). For example, x_bins sets the minimum number of recall points on the x-axis (precrec default: 1000).

Value

A tibble with columns:

  • recall: recall of the original data

  • precision: precision of the original data

  • precision_low: lower bound of the bootstrap percentile confidence interval for precision

  • precision_high: upper bound of the bootstrap percentile confidence interval for precision

References

Saito, Takaya, Rehmsmeier, Marc (2016). “Precrec: fast and accurate precision-recall and ROC curve calculations in R.” Bioinformatics, 33(1), 145–147. doi:10.1093/bioinformatics/btw570.

Examples

library(usefun)

set.seed(42)
# imbalanced labels
labels = sample(c(0,1), 100, replace = TRUE, prob = c(0.8,0.2))
# predictions
preds = rnorm(100)

# get CIs for PR curve
pr_tbl = pr.boot(labels, preds, boot.n = 100, x_bins = 30) # default x_bins is 1000
pr_tbl

# draw PR curve + add the bootstrap percentile confidence bands
library(ggplot2)

pr_tbl |>
  ggplot(aes(x = recall, y = precision)) +
  geom_step() +
  ylim(c(0,1)) +
  geom_ribbon(aes(ymin = precision_low, ymax = precision_high), alpha = 0.2) # alpha here is ribbon transparency


usefun documentation built on Sept. 15, 2024, 1:06 a.m.