This small package contains a few simple functions that enhance the cvAUC package created by Erin LeDell. Notably, it allows for confidence intervals and hypothesis tests about differences between cross-validated AUC for two different prediction algorithms.
The cvAUC.plus package can be installed directly from GitHub.
devtools::install_github("benkeser/cvAUC.plus")
library(cvAUC.plus)
The first demonstration shows how to define a learner that can be used with wrap_cvAUC, which computes the cross-validated AUC for that learner. The learner passed to wrap_cvAUC is a function of a specific form. If you are familiar with the format of wrappers passed to the SuperLearner function in the package of the same name, then defining these wrappers should be fast and easy. A proper wrapper for wrap_cvAUC is a function that takes as input Y, X, and newX. The function estimates a prediction function using Y and X and returns predictions on newX. Its output should be a list with two entries: fit, the fitted prediction model (can be NULL if you'd like), and pred, the predictions on newX. Here are two simple examples.
# a simple main terms GLM
myglm <- function(Y, X, newX){
  fm <- glm(Y~., data = X, family = binomial())
  pred <- predict(fm, newdata = newX, type = "response")
  return(list(fit = fm, pred = pred))
}

# a random forest
library(randomForest)
myrf <- function(Y, X, newX){
  require(randomForest)
  fm <- randomForest(x = X, y = factor(Y), xtest = newX)
  pred <- fm$test$votes[,2]
  return(list(fit = fm, pred = pred))
}
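Before handing a wrapper to wrap_cvAUC, it can help to call it directly on a small train/validation split and confirm that it honors the contract described above. The following is a minimal sketch in base R; the name check_glm, the simulated data, and the 150/50 split are illustrative assumptions and not part of the package.

```r
# A main-terms logistic regression wrapper with the Y, X, newX interface
# (check_glm is a hypothetical name used only for this sketch)
check_glm <- function(Y, X, newX) {
  fm <- glm(Y ~ ., data = X, family = binomial())
  pred <- predict(fm, newdata = newX, type = "response")
  list(fit = fm, pred = pred)
}

# small simulated data set (illustrative only)
set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- rbinom(n, 1, plogis(X$x1 - X$x2))

# hold out the last 50 rows as a stand-in for a validation fold
train <- 1:150
valid <- 151:200
out <- check_glm(Y = Y[train], X = X[train, ], newX = X[valid, ])

# the contract: a list with entries fit and pred,
# with one predicted probability per row of newX
stopifnot(
  is.list(out),
  all(c("fit", "pred") %in% names(out)),
  length(out$pred) == length(valid),
  all(out$pred >= 0 & out$pred <= 1)
)
```

If the stopifnot call passes silently, the wrapper returns output in the expected shape.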
Now we can use wrap_cvAUC to estimate the cross-validated AUC for these algorithms.
# simulate data
n <- 500
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rbinom(n,1,0.5))
Y <- with(X, rbinom(n, 1, plogis(x1*x3 + x2^2/x1 + x3)))

# get CV-AUC of main terms GLM
auc_glm <- wrap_cvAUC(Y = Y, X = X, learner = "myglm", seed = 123)
auc_glm

# get CV-AUC of random forest
auc_rf <- wrap_cvAUC(Y = Y, X = X, learner = "myrf", seed = 123)
auc_rf
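To make the quantity being estimated concrete, here is a rough base R sketch of a cross-validated AUC point estimate: the learner is fit on the training folds, the AUC is computed on each held-out fold, and the fold-specific AUCs are averaged. This is not the package's code. The helper auc_rank, the simpler data-generating model, and the choice of 10 folds are assumptions made for illustration, and the sketch produces no confidence interval.

```r
# AUC via the Mann-Whitney rank formula (auc_rank is a hypothetical helper)
auc_rank <- function(pred, y) {
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  r <- rank(pred)  # average ranks, so tied predictions count as 1/2
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# simulated data (illustrative only)
set.seed(123)
n <- 500
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- rbinom(n, 1, plogis(X$x1 + X$x2))

# randomly assign each observation to one of V folds
V <- 10
folds <- sample(rep(seq_len(V), length.out = n))

# fit on the training folds, compute AUC on the held-out fold
fold_auc <- vapply(seq_len(V), function(v) {
  valid <- folds == v
  dat <- data.frame(Y = Y[!valid], X[!valid, ])
  fm <- glm(Y ~ ., data = dat, family = binomial())
  pred <- predict(fm, newdata = X[valid, ], type = "response")
  auc_rank(pred, Y[valid])
}, numeric(1))

# the cross-validated AUC is the average of the fold-specific AUCs
cv_auc <- mean(fold_auc)
cv_auc
```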
The main addition over the cvAUC package is the ability to test for differences in cross-validated AUC between two different model fits. This is achieved via the diff_cvAUC function.
# compare random forest to GLM
diff_auc <- diff_cvAUC(fit1 = auc_rf, fit2 = auc_glm)
diff_auc
The wrap_cvAUC function provides some control over the sample splitting, based on the code developed in the SuperLearner package by Eric Polley. You can check the documentation for SuperLearner.CV.control to see all of this functionality, as this is the function that wrap_cvAUC calls internally.
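For orientation, the sketch below builds a control list with SuperLearner.CV.control and prints its structure. It assumes the SuperLearner package is installed. How this list (or its individual settings) is supplied to wrap_cvAUC is not shown here, since that depends on the arguments documented in ?wrap_cvAUC.

```r
library(SuperLearner)

# five folds, stratified on the outcome so that each fold has a
# similar proportion of cases; rows are shuffled before splitting
ctrl <- SuperLearner.CV.control(V = 5L, stratifyCV = TRUE, shuffle = TRUE)

# inspect the resulting control list
str(ctrl)
```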
The wrap_cvAUC function allows for parallelization via foreach if parallel = TRUE. This will parallelize the fitting of the learner algorithm over folds. The function will internally start (and stop) a cluster using detectCores() to determine how many cores to use.
# simulate data
n <- 2000
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rbinom(n,1,0.5))
Y <- with(X, rbinom(n, 1, plogis(x1*x3 + x2^2/x1 + x3)))

# non-parallel
system.time(tmp <- wrap_cvAUC(Y = Y, X = X, learner = "myrf", seed = 123))

# parallel
system.time(tmp <- wrap_cvAUC(Y = Y, X = X, learner = "myrf", seed = 123, parallel = TRUE))
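The general pattern of parallelizing over folds with foreach can be sketched as follows. This illustrates the technique only and is not the internals of wrap_cvAUC. It assumes the foreach and doParallel packages are installed, uses a plain GLM and simulated data as stand-ins, and caps the cluster at two workers rather than using every core that detectCores() reports.

```r
library(foreach)
library(doParallel)

# start a cluster; capped at 2 workers here to keep the sketch light
n_cores <- max(1, min(2, parallel::detectCores(), na.rm = TRUE))
cl <- parallel::makeCluster(n_cores)
doParallel::registerDoParallel(cl)

# simulated data and fold assignment (illustrative only)
set.seed(123)
n <- 400
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$Y <- rbinom(n, 1, plogis(dat$x1 + dat$x2))
V <- 4
folds <- sample(rep(seq_len(V), length.out = n))

# each iteration fits on the training folds and predicts on the
# held-out fold; iterations are distributed across the workers
pred_list <- foreach(v = seq_len(V)) %dopar% {
  fm <- glm(Y ~ x1 + x2, data = dat[folds != v, ], family = binomial())
  predict(fm, newdata = dat[folds == v, ], type = "response")
}

# release the workers once the folds are done
parallel::stopCluster(cl)

# one vector of held-out predictions per fold
lengths(pred_list)
```

Because starting a cluster has a fixed cost, the speedup from parallel fitting tends to be most visible when each fold's fit is slow, as with the random forest above.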