In jpgard/auctestr: Statistical Testing for AUC Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)

auctestr logo

auctestr

The AUC statistic (Area Under the Receiver Operating Characteristic Curve, or A') is commonly used to evaluate predictive models. However, the AUC statistical also has unique, and uniquely useful, statistical properties that can make statistical significance testing especially simple. This means that instead of just choosing the best model based on average observed performance, or making educated guesses about difference in model performance, machine learning researchers can make informed, statistically rigorous comparisons about the performance of two models across several datasets and even with multiple observations within each dataset (as often occurs, for example, when using k-fold cross-validation or when models are trained using data from multiple time points).

Installation

You can install auctestr from github with:

# install.packages("devtools")
devtools::install_github("jpgard/auctestr")

Soon, when auctestr is available on CRAN, you'll be able to install it with library(). This README will be updated when that happens!

Example

This is a basic example which shows you how to solve a common problem -- comparing ModelA and ModelB:

## basic example code
data("sample_experiment_data")
head(sample_experiment_data)
z_score <- auc_compare(sample_experiment_data, compare_values = c("ModelA", "ModelB"), filter_value = c("VariantA"), time_col = "time", outcome_col = "auc", compare_col = "model_id", over_col = "dataset", filter_col = "model_variant")
pnorm(-abs(z_score))

Based on the p-value for this comparison, the difference between these models is not statistically significant at any reasonable significance threshold $\alpha$. Good to know!

You can also compare variants of a specific model (for example, compare two different cost settings for an SVM model):

z_score <- auc_compare(sample_experiment_data, compare_values = c("VariantA", "VariantB"), filter_value = c("ModelC"), time_col = "time", outcome_col = "auc", compare_col = "model_variant", over_col = "dataset", filter_col = "model_id")
pnorm(-abs(z_score))

The difference between these model variants is significant, but only marginally at $\alpha = 0.05$. Again, this is useful to know. Perhaps collecting more data would be a good next step here.

Hopefully, auctestr makes comparing your models simple and useful. Future versions of the package will incorporate procedures for multiple-testing corrections, and Bayesian model evaluation techniques.

Generating AUC Data

If you have predictive models but need a tool for calculating their AUC scores, check out the following packages:

pROC link
caret link

References and Further Reading

If you're interested in reading more about the methods used in this package, check the following references, which served as the basis for this implementation:

Hanley and McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve", Radiology, 1982.
Fogarty, Baker and Hudson, "Case Studies in the use of ROC Curve Analysis for Sensor-Based Estimates in Human Computer Interaction" 2008.
Stouffer, S.A.; Suchman, E.A.; DeVinney, L.C.; Star, S.A.; Williams, R.M. Jr. (1949). The American Soldier, Vol.1: Adjustment during Army Life.)

jpgard/auctestr documentation built on May 29, 2019, 1:08 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com