Calculate the Neyman-Pearson Receiver Operating Characteristics

Description

nproc calculates the Neyman-Pearson Receiver Operating Characteristics (NP-ROC) curve for a given sequence of type I error values.

Usage

nproc(x = NULL, y, method = c("logistic", "penlog", "svm", "randomforest",
  "lda", "nb", "ada", "custom"), kernel = "radial", score = NULL,
  pred.score = NULL, band = FALSE, typeI.lower = FALSE, delta = 0.05,
  split = 1, split.ratio = 0.5, n.cores = 1, randSeed = 0)

Arguments

x

n * p observation matrix. n observations, p covariates.

y

vector of n binary (0/1) class labels, one per observation.

method

classification method(s).

  • logistic: Logistic regression. glm function with family = 'binomial'

  • penlog: Penalized logistic regression with LASSO penalty. glmnet in glmnet package

  • svm: Support Vector Machines. svm in e1071 package

  • randomforest: Random Forest. randomForest in randomForest package

  • lda: Linear Discriminant Analysis. lda in MASS package

  • nb: Naive Bayes. naiveBayes in e1071 package

  • ada: Ada-Boost. ada in ada package

  • custom: a custom classifier. score vector needed.

kernel

kernel used in the svm method. Default = 'radial'.

score

score vector corresponding to y. Required when method = 'custom'.

pred.score

score vector corresponding to the test y. Required when method = 'custom'.

band

whether to generate two NP-ROC curves (lower and upper) that together form a confidence band. Default = FALSE.

typeI.lower

whether to generate the data-driven type-I error lower bound. Default = FALSE.

delta

the violation rate of the type I error. Default = 0.05.

split

the number of random splits of the class 0 sample. Default = 1. For the ensemble version, choose split > 1. When method = 'custom', split is always treated as 0.

split.ratio

the proportion of the class 0 sample used to train the classifier in each split. Default = 0.5.

n.cores

number of cores used for parallel computing. Default = 1.

randSeed

the random seed used in the algorithm. Default = 0.
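
To make the roles of split.ratio and delta concrete, the base-R sketch below mimics the idea on simulated scores: part of the class 0 sample is set aside, and an order statistic of the left-out scores is chosen as the threshold so that the type I error exceeds alpha with probability at most delta. It follows the order-statistic bound described in the reference cited on this page and is only an illustration, not the package's internal code. The helper np_rank, the simulated scores, and the value alpha = 0.05 are assumptions made for this example.

```r
# Hypothetical helper: smallest rank k such that using the k-th smallest
# left-out class 0 score as the threshold keeps
# P(type I error > alpha) <= delta.
# For rank k the violation probability is P(Binomial(n, 1 - alpha) >= k).
np_rank <- function(n, alpha, delta) {
  k <- seq_len(n)
  violation <- pbinom(k - 1, size = n, prob = 1 - alpha, lower.tail = FALSE)
  ok <- which(violation <= delta)
  if (length(ok) == 0) return(NA_integer_)  # too few class 0 points
  min(ok)
}

set.seed(1)
score0 <- rnorm(400)   # simulated classifier scores for class 0 (assumption)
split.ratio <- 0.5     # share of class 0 that would train the classifier
n0 <- length(score0)
train_id <- sample(n0, floor(n0 * split.ratio))
holdout <- sort(score0[-train_id])  # left-out class 0 scores set the threshold

alpha <- 0.05          # target type I error (assumption for this example)
delta <- 0.05          # violation rate
k <- np_rank(length(holdout), alpha, delta)
threshold <- holdout[k]  # predict class 1 when a score exceeds the threshold
```

With a smaller delta the chosen rank moves up, so the threshold becomes more conservative. When the left-out sample is too small for the requested alpha and delta, np_rank returns NA because no rank satisfies the bound.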

Value

An object with S3 class nproc.

typeI.u

sequence of upper bounds on the type I error.

typeII.l

sequence of lower bounds on the type II error.

typeII.u

sequence of upper bounds on the type II error.

auc.l

the auc value of the lower NP-ROC curve.

auc.u

the auc value of the upper NP-ROC curve.

band

whether the upper NP-ROC curve is generated.

method

the classification method implemented.

delta

the violation rate.
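
The components listed above can be combined into a table of curve points. The sketch below assumes typeI.u and typeII.u are numeric vectors of equal length and treats 1 - typeII.u as the power along the lower NP-ROC curve. The helpers curve_table and trapezoid_area are hypothetical, the toy list only stands in for a real nproc fit, and the trapezoid rule is one common way to compute an area, not necessarily how auc.l is computed by the package.

```r
# Hypothetical helper: one row per type I error bound, with the
# corresponding lower bound on power (1 - upper bound on type II error).
curve_table <- function(fit) {
  data.frame(
    typeI.u = fit$typeI.u,
    power.l = 1 - fit$typeII.u
  )
}

# Hypothetical helper: trapezoid-rule area under the points (x, y).
trapezoid_area <- function(x, y) {
  o <- order(x)
  x <- x[o]
  y <- y[o]
  n <- length(x)
  sum(diff(x) * (y[-n] + y[-1]) / 2)
}

# Toy stand-in for a fitted object; the values are made up for illustration.
toy <- list(typeI.u = c(0, 0.5, 1), typeII.u = c(1, 0.25, 0))
tab <- curve_table(toy)
area <- trapezoid_area(tab$typeI.u, tab$power.l)
```

For a real fit, pass the object returned by nproc to curve_table in place of the toy list.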

References

Xin Tong, Yang Feng, and Jingyi Jessica Li (2016), Neyman-Pearson (NP) classification algorithms and NP receiver operating characteristic (NP-ROC) curves, manuscript, http://arxiv.org/abs/1608.03109

See Also

npc

Examples

library(nproc)

## Simulate n observations with two covariates and a logistic response
n = 200
x = matrix(rnorm(n*2),n,2)
eta = 1 - 3*x[,1]   # linear predictor
y = rbinom(n,1,1/(1+exp(-eta)))
#fit = nproc(x, y, method = 'svm')
fit2 = nproc(x, y, method = 'penlog')

## Plot the NP-ROC curve
plot(fit2)

## Parallel computing
#fit3 = nproc(x, y, method = 'penlog', n.cores = 2)
## In practice, replace 2 by the number of available cores,
## as reported by detectCores() in the parallel package
#library(parallel)
#fit4 = nproc(x, y, method = 'penlog', n.cores = detectCores())

## Testing the custom method for nproc
## Use npc to get the score list, then pass the scores to nproc
#fit = npc(x, y, method = 'lda', split = 0, n.cores = 2)
#obj = nproc(x = NULL, y = fit$y, method = 'custom', split = 0,
#            score = fit$score, n.cores = 2)

## NP-ROC curves with a confidence band
#fit6 = nproc(x, y, method = 'lda', band = TRUE)

## Ensemble version of nproc
#fit7 = nproc(x, y, method = 'lda', split = 11)