npCriterion: Feature selection based on Neyman-Pearson Criterion (NPC)

Description

Feature selection based on Neyman-Pearson Criterion (NPC)

Usage

npCriterion(x, y, method = c("logistic", "penlog", "svm", "randomforest",
  "lda", "slda", "nb", "nnb", "ada", "tree"), enumeration = NULL,
  max_feature_size = NULL, alpha, delta = 0.05, B = 5, l0 = 0.5,
  l1 = 0.5, seed = NULL, ncores = detectCores() - 1, ...)

Arguments

x

a design matrix

y

a vector containing binary labels 0 and 1
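
Which class is coded 0 matters, because the type I error bounded by alpha is the probability of classifying a class 0 observation as class 1. A minimal sketch of recoding raw labels, assuming (hypothetically) that the raw labels are the strings "disease" and "healthy" and that missing a disease case is the error to control:

```r
# Hypothetical raw labels; these are illustrative and not part of the package.
labels <- c("disease", "healthy", "disease", "healthy", "healthy")

# Code the class whose misclassification is more costly as 0, so that the
# type I error (calling a true 0 a 1) is the one being controlled.
y <- as.integer(labels != "disease")

table(y)
```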

method

base classification method

  • logistic: Logistic regression. glm function with family = 'binomial'

  • penlog: Penalized logistic regression with LASSO penalty. glmnet in glmnet package

  • svm: Support Vector Machines. svm in e1071 package

  • randomforest: Random Forest. randomForest in randomForest package

  • lda: Linear Discriminant Analysis. lda in MASS package

  • slda: Sparse Linear Discriminant Analysis with LASSO penalty.

  • nb: Naive Bayes. naiveBayes in e1071 package

  • nnb: Nonparametric Naive Bayes. naive_bayes in naivebayes package

  • ada: AdaBoost. ada in ada package

enumeration

the feature set generation method: one of 'forward', 'backward', or 'exhaustive'. Default: 'forward'

max_feature_size

an optional integer giving the largest feature set size to examine when enumeration is 'exhaustive'. When not supplied, it is set to the total number of features
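
As a rough illustration of what exhaustive enumeration up to max_feature_size involves, the sketch below builds one matrix of candidate feature sets per set size, with one feature set per row. The helper enumerate_exhaustive is hypothetical and only mirrors the structure described on this page; it is not the package's internal code.

```r
# Hypothetical helper: all feature subsets of size 1..max_feature_size
# for p features, returned as a list of matrices (one row per feature set).
enumerate_exhaustive <- function(p, max_feature_size = p) {
  lapply(seq_len(max_feature_size), function(k) t(combn(p, k)))
}

sets <- enumerate_exhaustive(p = 5, max_feature_size = 2)
dim(sets[[1]])  # sets of size 1: one column
dim(sets[[2]])  # sets of size 2: two columns
```

The number of candidate sets grows quickly with the number of features, so capping max_feature_size keeps an exhaustive search tractable.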

alpha

a numeric scalar between 0 and 1 specifying the upper bound on the population type I error

delta

a numeric scalar between 0 and 1 specifying the violation rate, i.e. the tolerated probability that the population type I error exceeds alpha. Default: 0.05
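
To illustrate how alpha and delta interact, the sketch below uses the order-statistic rule from the Neyman-Pearson umbrella algorithm: if the threshold is the k-th smallest of n left-out class 0 scores, the probability that the type I error exceeds alpha is at most P(Binomial(n, 1 - alpha) >= k). Whether npCriterion uses exactly this rule is an assumption, since this page does not say how thresholds are chosen, and np_rank is a hypothetical helper.

```r
# Hypothetical helper: smallest rank k whose violation bound is <= delta,
# or NA when n class 0 points are too few for any rank to qualify.
np_rank <- function(n, alpha, delta) {
  k <- seq_len(n)
  # P(Binomial(n, 1 - alpha) >= k), an upper bound on the violation probability
  violation <- pbinom(k - 1, size = n, prob = 1 - alpha, lower.tail = FALSE)
  ok <- which(violation <= delta)
  if (length(ok) == 0) NA_integer_ else min(ok)
}

np_rank(58, alpha = 0.05, delta = 0.05)  # too few class 0 points
np_rank(59, alpha = 0.05, delta = 0.05)  # the largest score becomes the threshold
```

Under this rule, alpha = 0.05 and delta = 0.05 require at least 59 left-out class 0 observations, which is worth checking against l0 and the class 0 sample size.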

B

a positive integer indicating the number of random splits. Default: 5

l0

a numeric scalar between 0 and 1 indicating the proportion of leave-out class 0 data points. Default: 0.5

l1

a numeric scalar between 0 and 1 indicating the proportion of leave-out class 1 data points. Default: 0.5
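
A minimal sketch of how B, l0, and l1 could define the random splits, assuming each split leaves out a fraction l0 of class 0 and a fraction l1 of class 1 and trains on the rest. The helper split_once is hypothetical, and how the package actually uses the left-out points is not documented here.

```r
# Hypothetical helper: one class-wise random split of the row indices.
split_once <- function(y, l0 = 0.5, l1 = 0.5) {
  idx0 <- which(y == 0)
  idx1 <- which(y == 1)
  # Index into idx0/idx1 rather than calling sample() on them directly,
  # which misbehaves when a class has a single observation.
  leftout0 <- idx0[sample.int(length(idx0), floor(l0 * length(idx0)))]
  leftout1 <- idx1[sample.int(length(idx1), floor(l1 * length(idx1)))]
  list(train = setdiff(seq_along(y), c(leftout0, leftout1)),
       leftout0 = leftout0,
       leftout1 = leftout1)
}

set.seed(1)
y <- rep(c(0, 1), times = c(60, 40))
B <- 5
splits <- replicate(B, split_once(y, l0 = 0.5, l1 = 0.5), simplify = FALSE)
lengths(splits[[1]])
```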

seed

random seed

ncores

a positive integer that specifies the number of cores for computing. Default: number of cores - 1.
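
The default detectCores() - 1 can evaluate to 0 on a single-core machine, and detectCores() may return NA on some platforms. A defensive sketch for choosing a value to pass explicitly as ncores (the variable names are illustrative):

```r
library(parallel)

n_detected <- detectCores()  # may be NA if the count cannot be determined
# Leave one core free, but never go below 1; na.rm handles the NA case.
ncores <- max(1L, n_detected - 1L, na.rm = TRUE)
ncores
```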

...

additional arguments passed to the base classification method.

Value

npCriterion returns a list with the following components:

method

the base classification method

alpha

user-specified alpha value

delta

user-specified delta value

B

total number of random splits

l0

the proportion of leave-out class 0 data points

l1

the proportion of leave-out class 1 data points

featuresets_examined

a list of size 2 whose first component is enumeration. The second component depends on enumeration:

  • 'forward': a vector of the features in the order they are sequentially included

  • 'backward': a vector of the features in the order they are sequentially excluded

  • 'exhaustive': a list of matrices whose column number ranges from 1 to max_feature_size. Each row of such a matrix represents a feature set.
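
For intuition about the vector of sequentially included features, here is a generic greedy forward search. It is a sketch under assumptions: forward_order and toy_score are hypothetical, and the score stands in for whatever criterion the package minimizes at each step.

```r
# Hypothetical helper: greedy forward search over p features.
# `score` maps a feature set (integer vector) to a number; lower is better.
forward_order <- function(p, score) {
  selected <- integer(0)
  remaining <- seq_len(p)
  while (length(remaining) > 0) {
    # Score each candidate set formed by adding one remaining feature.
    s <- vapply(remaining, function(j) score(c(selected, j)), numeric(1))
    best <- remaining[which.min(s)]
    selected <- c(selected, best)
    remaining <- setdiff(remaining, best)
  }
  selected  # features in the order they were included
}

# Toy score: features with larger weights reduce the score more.
w <- c(0.1, 0.9, 0.5)
toy_score <- function(set) -sum(w[set])
forward_order(3, toy_score)
```

A backward search works the same way in reverse, starting from all features and recording the order in which they are dropped.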

npc

when 'enumeration' = 'forward' or 'backward', a vector of NPC values of feature sets in featuresets_examined; when 'enumeration' = 'exhaustive', a list of vectors of NPC values computed on the feature sets in featuresets_examined

npc.sd

when 'enumeration' = 'forward' or 'backward', a vector of standard deviations of empirical type II errors of feature sets in featuresets_examined; when 'enumeration' = 'exhaustive', a list of standard deviations of empirical type II errors computed on the feature sets in featuresets_examined

npc.se

when 'enumeration' = 'forward' or 'backward', a vector of standard errors of empirical type II errors of feature sets in featuresets_examined; when 'enumeration' = 'exhaustive', a list of standard errors of empirical type II errors computed on the feature sets in featuresets_examined
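
The relationship between these summaries can be sketched for a single feature set, assuming that NPC is the mean of the empirical type II errors over the B splits and that the standard error is the standard deviation divided by sqrt(B). Both are assumptions about the implementation, and the error values are made up for illustration.

```r
# Made-up empirical type II errors from B = 5 random splits.
type2 <- c(0.30, 0.34, 0.28, 0.33, 0.30)
B <- length(type2)

npc    <- mean(type2)        # assumed definition of the NPC value
npc_sd <- sd(type2)          # spread of type II errors across splits
npc_se <- npc_sd / sqrt(B)   # uncertainty of the mean

c(npc = npc, npc.sd = npc_sd, npc.se = npc_se)
```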

err

when 'enumeration' = 'forward' or 'backward', a vector of CV errors of feature sets in featuresets_examined; when 'enumeration' = 'exhaustive', a list of vectors of CV errors computed on the feature sets in featuresets_examined

err.sd

when 'enumeration' = 'forward' or 'backward', a vector of standard deviations of test errors of feature sets in featuresets_examined; when 'enumeration' = 'exhaustive', a list of standard deviations of test errors computed on the feature sets in featuresets_examined

err.se

when 'enumeration' = 'forward' or 'backward', a vector of standard errors of test errors of feature sets in featuresets_examined; when 'enumeration' = 'exhaustive', a list of standard errors of test errors computed on the feature sets in featuresets_examined

features_minNPC

a feature set with the minimal NPC value and its corresponding NPC statistics and test errors.
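
The documented components can also be combined by hand to locate the best feature set from a forward search. The sketch below works on a mock result with made-up values and assumes that the k-th entry of npc corresponds to the first k included features; that correspondence is not stated on this page.

```r
# Mock result mimicking the documented structure; all values are made up.
res <- list(
  featuresets_examined = list("forward", c(3, 1, 5, 2, 4)),
  npc = c(0.42, 0.31, 0.33, 0.36, 0.40)
)

k <- which.min(res$npc)                                # best set size
best_set <- res$featuresets_examined[[2]][seq_len(k)]  # first k included features
best_set
```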

Author(s)

Yiling Chen, yiling0210@ucla.edu

Examples

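### Example 1 ###
# 4000 observations on 5 features; labels are drawn independently of x
x = matrix(rnorm(20000), ncol = 5)
y = rbinom(nrow(x), size = 1, prob = 0.5)
table(y)

# forward search with logistic regression as the base method
temp1 = npCriterion(x, y, method = "logistic",
                    enumeration = 'forward',
                    max_feature_size = NULL,
                    alpha = 0.05,
                    delta = 0.05,
                    B = 5,
                    l0 = 0.5,
                    l1 = 0.5)

# exhaustive search with an SVM; kernel is passed on to svm through ...
temp2 = npCriterion(x, y, method = "svm",
                    kernel = 'radial',
                    enumeration = 'exhaustive',
                    alpha = 0.05,
                    delta = 0.05,
                    B = 5,
                    l0 = 0.5,
                    l1 = 0.5)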
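### Example 2 ###
# two features whose class-conditional distributions differ
n = 100000
y = rbinom(n, size = 1, prob = 0.5)
x = matrix(NA, nrow = n, ncol = 2)
x1 = cbind(rnorm(sum(y == 1), mean = 1, sd = 1), rnorm(sum(y == 1), mean = 1, sd = 1))
x0 = cbind(rnorm(sum(y == 0), mean = -1, sd = 1), rnorm(sum(y == 0), mean = 0.5, sd = 1.5))
# type II error of a one-sided threshold rule on feature 1 alone
# whose type I error is exactly 0.05
pnorm(qnorm(0.95, -1, 1), 1, 1)
# the same quantity for feature 2 alone
pnorm(qnorm(0.95, 0.5, 1.5), 1, 1)
x[y == 1, ] = x1
x[y == 0, ] = x0
table(y)

temp3 = npCriterion(x, y, method = "lda",
                    enumeration = 'exhaustive',
                    alpha = 0.05,
                    delta = 0.05,
                    B = 5,
                    l0 = 0.5,
                    l1 = 0.5)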
temp3$criteria$`ell=1`

yiling0210/NPCriterion documentation built on May 10, 2019, 1:25 p.m.