The AUC statistic (Area Under the Receiver Operating Characteristic Curve, or A') is commonly used to evaluate predictive models. However, the AUC statistic also has unique, and uniquely useful, statistical properties that can make statistical significance testing especially simple. This means that instead of just choosing the best model based on average observed performance, or making educated guesses about differences in model performance, machine learning researchers can make informed, statistically rigorous comparisons of the performance of two models across several datasets, and even with multiple observations within each dataset (as often occurs, for example, when using k-fold cross-validation or when models are trained using data from multiple time points).
You can install auctestr from GitHub with:
    # install.packages("devtools")
    devtools::install_github("jpgard/auctestr")
Soon, when auctestr is available on CRAN, you'll be able to install it with install.packages("auctestr") and load it with library(auctestr). This README will be updated when that happens!
This is a basic example which shows you how to solve a common problem -- comparing ModelA and ModelB:
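    ## basic example code
    library(auctestr)

    # load and inspect the sample data shipped with the package
    data("sample_experiment_data")
    head(sample_experiment_data)

    # compare ModelA and ModelB, restricted to VariantA, across datasets
    z_score <- auc_compare(sample_experiment_data,
                           compare_values = c("ModelA", "ModelB"),
                           filter_value = c("VariantA"),
                           time_col = "time",
                           outcome_col = "auc",
                           compare_col = "model_id",
                           over_col = "dataset",
                           filter_col = "model_variant")

    # convert the z-score to a p-value
    pnorm(-abs(z_score))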
Based on the p-value for this comparison, the difference between these models is not statistically significant at any reasonable significance threshold $\alpha$. Good to know!
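The conversion from z-score to p-value in the last line of the example can be sketched on its own. This is a minimal illustration, assuming the z-score follows a standard normal distribution under the null hypothesis of no difference; the value of `z_score` below is hypothetical and is not output from `auc_compare()`. Note that `pnorm(-abs(z_score))` is the probability in one tail, so a two-sided test doubles it.

```r
# Hypothetical z-score, used only for illustration; in practice this would be
# the value returned by auc_compare().
z_score <- 1.7

# One-tailed p-value, as computed in the example above
p_one_sided <- pnorm(-abs(z_score))

# Two-sided p-value: probability of a z-score at least this extreme
# in either direction
p_two_sided <- 2 * pnorm(-abs(z_score))

# Compare against a chosen significance threshold
alpha <- 0.05
c(one_sided = p_one_sided, two_sided = p_two_sided)
p_two_sided < alpha
```

Which version is appropriate depends on whether the hypothesis is directional (one model is expected to outperform the other) or not.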
You can also compare variants of a specific model (for example, compare two different cost settings for an SVM model):
    z_score <- auc_compare(sample_experiment_data,
                           compare_values = c("VariantA", "VariantB"),
                           filter_value = c("ModelC"),
                           time_col = "time",
                           outcome_col = "auc",
                           compare_col = "model_variant",
                           over_col = "dataset",
                           filter_col = "model_id")
    pnorm(-abs(z_score))
The difference between these model variants is significant, but only marginally at $\alpha = 0.05$. Again, this is useful to know. Perhaps collecting more data would be a good next step here.
Hopefully, auctestr makes comparing your models simple and useful. Future versions of the package will incorporate procedures for multiple-testing corrections and Bayesian model evaluation techniques.
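Until multiple-testing corrections are built into the package, one possible workaround when running several comparisons is to adjust the resulting p-values with `p.adjust()` from base R's stats package. The sketch below is not part of auctestr, and the p-values in it are hypothetical placeholders rather than results from the sample data.

```r
# Hypothetical p-values from three pairwise comparisons; in practice each
# would come from a separate auc_compare() call followed by pnorm().
p_values <- c(ModelA_vs_ModelB = 0.21,
              ModelA_vs_ModelC = 0.012,
              ModelB_vs_ModelC = 0.034)

# Holm correction controls the family-wise error rate
p_holm <- p.adjust(p_values, method = "holm")

# Benjamini-Hochberg correction controls the false discovery rate
p_bh <- p.adjust(p_values, method = "BH")

# Show raw and adjusted p-values side by side
cbind(raw = p_values, holm = p_holm, BH = p_bh)
```

Holm is the more conservative of the two; Benjamini-Hochberg may be preferable when many comparisons are made and some false positives are tolerable.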
If you have predictive models but need a tool for calculating their AUC scores, check out the following packages:
If you're interested in reading more about the methods used in this package, check the following references, which served as the basis for this implementation: