wrap_cvAUC: wrap_cvAUC

View source: R/mainFunctions.R

Description

This function is a wrapper for the cvAUC function in the cvAUC package by Erin LeDell. It lets the user apply the data splitting options available in the SuperLearner package and defines a specific structure that learners must follow to generate predictions. Its main addition is that it returns estimated influence functions, which can be used to construct hypothesis tests comparing the CV-AUC of two different learners. The function diff_cvAUC performs these tests.
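To make the role of the influence functions concrete, the sketch below shows one standard way a Wald-type test for a difference in CV-AUC can be built from them. It is an illustration only and is not taken from the source of diff_cvAUC. The estimates auc1 and auc2 and the vectors ic1 and ic2 are simulated, hypothetical stand-ins for the cvAUC and ic entries of two fits obtained on the same sample splits.

```r
# Hypothetical inputs: two CV-AUC estimates and their estimated influence
# function values on the same n observations (simulated here, not real output).
set.seed(1)
n <- 200
auc1 <- 0.78
auc2 <- 0.74
ic1 <- rnorm(n, sd = 0.4)
ic2 <- 0.8 * ic1 + rnorm(n, sd = 0.2)

# The influence function of the difference is the difference of the
# influence functions, which accounts for correlation between the estimates.
ic_diff <- ic1 - ic2
se_diff <- sqrt(var(ic_diff) / n)

# Wald statistic and two-sided p-value for H0: CV-AUC1 = CV-AUC2
z <- (auc1 - auc2) / se_diff
p_two_sided <- 2 * pnorm(-abs(z))
```

The subtraction is done observation by observation, so both fits have to be evaluated on identical folds for it to be meaningful.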

Usage

wrap_cvAUC(
  Y,
  X,
  learner,
  confidence = 0.95,
  seed = 1234,
  id = NULL,
  cvControl = list(V = 10L, stratifyCV = FALSE, shuffle = TRUE, validRows = NULL),
  returnFits = FALSE,
  parallel = FALSE,
  ...
)

Arguments

Y

A numeric vector of class labels

X

A data.frame of variables that learner will use to predict. It is assumed that the format of X will play nicely with the function specified by learner.

learner

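A character name of a function that generates predictions. The function should take as input Y, X, and newX, use X to predict Y, and return predictions for newX. In the examples below, each learner returns a list with entries fit (the fitted model) and pred (the predictions for newX).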

confidence

A numeric between 0 and 1 specifying the nominal coverage probability for the confidence interval. Default is 0.95.

seed

A numeric specifying what seed to set prior to data splitting. If diff_cvAUC is to be used afterwards to compare CV-AUCs for different fits, be sure to specify the same seed so that the sample splits are the same.

id

A numeric vector of observation identifiers. Only used for splitting data and should probably be ignored for now as the CV-AUC calculations do not account for dependent data in any other way.

cvControl

A list of a specific form. See ?SuperLearner.CV.control for more information.

returnFits

A boolean indicating whether or not to return the model fit objects for each fold.

parallel

A boolean indicating whether to perform the model fitting across folds in parallel. If TRUE then foreach is used to parallelize the fitting.

...

Not currently used
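The following base R sketch illustrates how the Y, X, learner, and seed arguments fit together in a V-fold scheme: a learner named by a character string is trained on the rows outside each validation fold and asked to predict the rows inside it. It is not the package's internal code. The names my_learner, fold_id, valid_rows, and preds are hypothetical, and the folds are drawn with sample() rather than with the SuperLearner splitting utilities.

```r
# Simulated data in the shape wrap_cvAUC expects (Y numeric, X a data.frame)
set.seed(1234)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- rbinom(n, 1, plogis(X$x1 + X$x2))

# Hypothetical learner following the Y, X, newX convention
my_learner <- function(Y, X, newX) {
  fm <- glm(Y ~ ., data = X, family = binomial())
  list(fit = fm, pred = predict(fm, newdata = newX, type = "response"))
}

# Simple random assignment of rows to V validation folds (an assumption,
# not the SuperLearner implementation)
V <- 5
fold_id <- sample(rep(seq_len(V), length.out = n))
valid_rows <- split(seq_len(n), fold_id)

# Train on the rows outside each fold, predict the rows inside it
preds <- numeric(n)
for (v in seq_len(V)) {
  idx <- valid_rows[[v]]
  out <- do.call("my_learner", list(
    Y = Y[-idx],
    X = X[-idx, , drop = FALSE],
    newX = X[idx, , drop = FALSE]
  ))
  preds[idx] <- out$pred
}
```

Because the folds depend on the random number stream, fixing the seed before splitting is what makes two runs with different learners use the same validation rows.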

Value

An object of class wrap_cvAUC with the following entries:

cvAUC

The estimated cross-validated AUC.

se

The standard error for the estimated CV-AUC.

ci

A 100*confidence percent confidence interval.

confidence

The level of confidence for the interval.

ic

The estimated influence function evaluated on the observations.

folds

The row indices for each validation sample.

fitLibrary

The fit objects from learner.

learner

The learner that was used to generate predictions.

p

The one-sided p-value testing the null hypothesis that CV-AUC = 0.5 against the alternative that CV-AUC > 0.5.
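As a rough guide to how the se, ci, and p entries relate to one another, the sketch below computes a Wald-type interval and a one-sided p-value from an estimate and its standard error. This assumes a normal approximation and is not copied from the package source. The numbers are hypothetical, and any truncation of the interval to [0, 1] is left out.

```r
# Hypothetical values standing in for the cvAUC and se entries
cvAUC <- 0.76
se <- 0.02
confidence <- 0.95

# Two-sided Wald interval at the requested confidence level
z <- qnorm(1 - (1 - confidence) / 2)
ci <- c(cvAUC - z * se, cvAUC + z * se)

# One-sided p-value for H0: CV-AUC = 0.5 against H1: CV-AUC > 0.5
p <- pnorm((cvAUC - 0.5) / se, lower.tail = FALSE)
```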

Examples

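library(cvAUC.plus)

# Simulate two predictors and a binary outcome
set.seed(1234)
n <- 1000
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- rbinom(n, 1, plogis(X$x1 + X$x2))

# Learner 1: logistic regression on all columns of X
myglm1 <- function(Y, X, newX) {
  fm <- glm(Y ~ ., data = X, family = binomial())
  pred <- predict(fm, newdata = newX, type = "response")
  return(list(fit = fm, pred = pred))
}

# Learner 2: logistic regression on x1 only
myglm2 <- function(Y, X, newX) {
  fm <- glm(Y ~ x1, data = X, family = binomial())
  pred <- predict(fm, newdata = newX, type = "response")
  return(list(fit = fm, pred = pred))
}

# Both calls use the default seed, so the sample splits are the same
out1 <- wrap_cvAUC(Y = Y, X = X, learner = "myglm1")
out2 <- wrap_cvAUC(Y = Y, X = X, learner = "myglm2")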

benkeser/cvAUC.plus documentation built on Feb. 1, 2021, 8:42 a.m.