nproc: Calculate the Neyman-Pearson Receiver Operating Characteristics

Description

nproc calculates the Neyman-Pearson Receiver Operating Characteristics (NP-ROC) curve for a given sequence of type I error values.

Usage

nproc(x = NULL, y, method = c("logistic", "penlog", "svm", "randomforest",
  "lda", "nb", "ada", "custom"), kernel = "radial", score = NULL,
  band = FALSE, typeI.lower = FALSE, delta = 0.05, split = 1,
  split.ratio = 0.5, n.cores = 1, randSeed = 0)

Arguments

x

n * p observation matrix. n observations, p covariates.

y

n 0/1 observations.

method

classification method(s).

  • logistic: Logistic regression. glm function with family = 'binomial'

  • penlog: Penalized logistic regression with LASSO penalty. glmnet in glmnet package

  • svm: Support Vector Machines. svm in e1071 package

  • randomforest: Random Forest. randomForest in randomForest package

  • lda: Linear Discriminant Analysis. lda in MASS package

  • nb: Naive Bayes. naiveBayes in e1071 package

  • ada: Ada-Boost. ada in ada package

  • custom: a custom classifier. score vector needed.

kernel

kernel used in the svm method. Default = 'radial'.

score

score vector corresponding to y. Required when method = 'custom'.

band

whether to generate two NP-ROC curves representing a confidence band. Default = FALSE.

typeI.lower

whether to generate the data-driven type-I error lower bound. NOTE: experimental feature. Default = FALSE.

delta

the violation rate of the type I error. Default = 0.05.

split

the number of splits for the class 0 sample. Default = 1. For the ensemble version, choose split > 1. When method = 'custom', split is always set to 0.

split.ratio

the proportion of the class 0 sample used to train the classifier. Default = 0.5.

n.cores

number of cores used for parallel computing. Default = 1.

randSeed

the random seed used in the algorithm.
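
To make the role of delta concrete, the following is a minimal sketch of the order-statistic rule described in the reference cited below. It is not the package's internal code, and np_order_index is a hypothetical helper name. It assumes the threshold is chosen among the sorted scores of the class 0 observations that were held out from training, and that the violation probability of the k-th smallest score is bounded by a binomial tail.

```r
# Hypothetical helper, not part of the nproc package.
# n:     number of held-out class 0 scores
# alpha: target upper bound on the type I error
# delta: tolerated violation rate
np_order_index <- function(n, alpha, delta = 0.05) {
  k <- seq_len(n)
  # P(Binomial(n, 1 - alpha) >= k): bound on the probability that the
  # threshold set at the k-th smallest score has type I error above alpha.
  violation <- pbinom(k - 1, size = n, prob = 1 - alpha, lower.tail = FALSE)
  ok <- which(violation <= delta)
  # NA when the sample is too small to meet the requested delta.
  if (length(ok) == 0) NA_integer_ else min(ok)
}

# Made-up scores standing in for held-out class 0 observations.
set.seed(1)
scores0 <- rnorm(100)
k <- np_order_index(length(scores0), alpha = 0.05, delta = 0.05)
threshold <- sort(scores0)[k]  # predict class 1 when a score exceeds threshold
```

Under these assumptions, a smaller delta pushes k upward and makes the threshold more conservative. When no index satisfies the bound, the helper returns NA, which signals that more class 0 observations are needed for the requested alpha and delta.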

Value

An object with S3 class nproc.

typeI.u

sequence of upper bounds on the type I error.

typeII.l

sequence of lower bounds on the type II error.

typeII.u

sequence of upper bounds on the type II error.

auc.l

the auc value of the lower NP-ROC curve.

auc.u

the auc value of the upper NP-ROC curve.

band

whether the upper NP-ROC curve is generated.

method

the classification method implemented.

delta

the violation rate.
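
As a rough illustration of how the returned sequences relate to an area under a curve, the sketch below applies the trapezoid rule to the points (typeI.u, 1 - typeII.u). This is an assumption made for illustration only: the package may compute auc.l and auc.u differently (for example with a step function), trapezoid_area is a hypothetical helper, and the numbers are toy values, not real nproc output.

```r
# Hypothetical helper, not an nproc function: trapezoid-rule area under (x, y).
trapezoid_area <- function(x, y) {
  o <- order(x)
  x <- x[o]
  y <- y[o]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Toy values standing in for fit$typeI.u and fit$typeII.u.
typeI.u  <- c(0, 0.25, 0.5, 0.75, 1)
typeII.u <- c(1, 0.6, 0.35, 0.15, 0)

# Area under the curve traced by type I upper bound vs. 1 - type II upper bound.
area.lower <- trapezoid_area(typeI.u, 1 - typeII.u)
```

With a fitted object, the same helper could be applied to the stored sequences and compared against the reported auc values to check which convention the package uses.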

References

Xin Tong, Yang Feng, and Jingyi Jessica Li (2016), Neyman-Pearson (NP) classification algorithms and NP receiver operating characteristic (NP-ROC), manuscript, http://arxiv.org/abs/1608.03109

See Also

npc

Examples

library(nproc)

##Simulate n observations with 2 covariates and 0/1 labels from a logistic model
n = 200
x = matrix(rnorm(n*2),n,2)
c = 1 - 3*x[,1]
y = rbinom(n,1,1/(1+exp(-c)))
#fit = nproc(x, y, method = 'svm')
fit2 = nproc(x, y, method = 'penlog')
##Plot the nproc curve
plot(fit2)

##custom method
fit.npc = npc(x, y, method = 'svm')
fit.score = predict(fit.npc,x)$pred.score
fit.custom = nproc(y = y, score = fit.score, method = 'custom')

#fit3 = nproc(x, y, method = 'penlog')

##Plot the nproc curve
#plot(fit3)

##Parallel computing
#fit3 = nproc(x, y, method = 'penlog', n.cores = 2)
#In practice, replace 2 with the number of available cores, e.g. parallel::detectCores()
#fit4 = nproc(x, y, method = 'penlog', n.cores = parallel::detectCores())

#Testing the custom method for nproc.
#fit = npc(x, y, method = 'lda', split = 0,  n.cores = 2) #use npc to get score list.
#obj = nproc(x = NULL, y = fit$y, method = 'custom', split = 0,
#score = fit$score,  n.cores = 2)

#Confidence nproc curves
#fit6 = nproc(x, y, method = 'lda', band = TRUE)

#nproc ensembled version
#fit7 = nproc(x, y, method = 'lda', split = 11)
