npMarginal: Marginal feature screening under the Neyman-Pearson paradigm


Description

Marginal feature screening under the Neyman-Pearson paradigm

Usage

npMarginal(x, y, method = c("logistic", "penlog", "svm", "randomforest",
  "lda", "slda", "nb", "nnb", "ada", "tree"),
  p.adjust.methods = c("holm", "hochberg", "hommel", "bonferroni", "BH",
  "BY", "fdr", "none"), N, alpha, delta = 0.05, epsilon = 0.05,
  l0 = 0.5, l1 = 0.5, seed = NULL, ncores = detectCores() - 1, ...)

Arguments

x

a design matrix with observations in rows and features in columns

y

a vector containing binary labels 0 and 1

method

base classification method

  • logistic: Logistic regression. glm function with family = 'binomial'

  • penlog: Penalized logistic regression with LASSO penalty. glmnet in glmnet package

  • svm: Support Vector Machines. svm in e1071 package

  • randomforest: Random Forest. randomForest in randomForest package

  • lda: Linear Discriminant Analysis. lda in MASS package

  • slda: Sparse Linear Discriminant Analysis with LASSO penalty.

  • nb: Naive Bayes. naiveBayes in e1071 package

  • nnb: Nonparametric Naive Bayes. naive_bayes in naivebayes package

  • ada: AdaBoost. ada in ada package

p.adjust.methods

multiple-testing adjustment method applied to the p-values; one of the values in p.adjust.methods (see p.adjust in the stats package)
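As a brief illustration of what the adjustment step does, the following sketch applies stats::p.adjust to a small vector of p-values. The p-values are made up for illustration and are not output from npMarginal.

```r
# Hypothetical unadjusted p-values for five marginal features (illustrative only)
pval_unadj <- c(0.001, 0.012, 0.030, 0.200, 0.650)

# Method names accepted by stats::p.adjust
print(p.adjust.methods)

# Benjamini-Hochberg adjustment, matching p.adjust.methods = "BH"
pval_adj <- p.adjust(pval_unadj, method = "BH")
print(pval_adj)
```

Adjusted p-values are never smaller than the unadjusted ones, so an adjustment can only reduce the number of features that fall below a fixed threshold.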

N

a positive integer indicating the maximum number of marginal features to be kept

alpha

a numeric scalar between 0 and 1 giving the upper bound on the population type I error

delta

a numeric scalar between 0 and 1 indicating the violation rate, i.e. the tolerated probability that the population type I error exceeds alpha. Default: 0.05
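To make the roles of alpha and delta concrete, here is a minimal sketch of the order-statistic bound commonly used in Neyman-Pearson classification. It assumes a score threshold is taken as the k-th smallest of n0 left-out class 0 scores, in which case the probability that the type I error exceeds alpha is at most P(Binomial(n0, 1 - alpha) >= k). Whether npMarginal computes its threshold this way is an assumption; the names violation_bound, n0 and k_star are hypothetical.

```r
alpha <- 0.05   # target bound on the population type I error
delta <- 0.05   # tolerated violation rate
n0 <- 100       # hypothetical number of left-out class 0 observations

# Upper bound on P(type I error > alpha) when the threshold is the
# k-th order statistic of n class 0 scores: P(Binomial(n, 1 - alpha) >= k)
violation_bound <- function(k, n, alpha) {
  pbinom(k - 1, size = n, prob = 1 - alpha, lower.tail = FALSE)
}

k_all <- seq_len(n0)
bounds <- violation_bound(k_all, n0, alpha)

# Smallest order statistic whose violation bound does not exceed delta
k_star <- min(k_all[bounds <= delta])
print(k_star)
```

Under these assumptions, a smaller delta pushes k_star toward n0, which makes the threshold more conservative.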

epsilon

a numeric scalar between 0 and 1 indicating the significance level. Default: 0.05

l0

a numeric scalar between 0 and 1 indicating the proportion of leave-out class 0 data points. Default: 0.5

l1

a numeric scalar between 0 and 1 indicating the proportion of leave-out class 1 data points. Default: 0.5
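The sketch below shows one way proportions such as l0 and l1 could translate into a class-wise split of the row indices. The rounding rule (floor) and the names leftout0, leftout1 and train are assumptions for illustration; the documentation does not state how npMarginal performs or uses the split.

```r
set.seed(1)
y <- rbinom(200, size = 1, prob = 0.5)  # illustrative binary labels
l0 <- 0.5
l1 <- 0.5

idx0 <- which(y == 0)
idx1 <- which(y == 1)

# Hold out a proportion l0 of class 0 rows and l1 of class 1 rows at random.
# sample.int() avoids the surprising behaviour of sample() on length-1 input.
leftout0 <- idx0[sample.int(length(idx0), floor(l0 * length(idx0)))]
leftout1 <- idx1[sample.int(length(idx1), floor(l1 * length(idx1)))]

leftout <- sort(c(leftout0, leftout1))
train <- setdiff(seq_along(y), leftout)

table(split = ifelse(seq_along(y) %in% leftout, "left-out", "train"), y)
```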

seed

random seed for reproducibility. Default: NULL

ncores

a positive integer that specifies the number of cores used for computing. Default: detectCores() - 1, i.e. one fewer than the number of detected cores.

...

additional arguments passed to the base classification method.

Value

npMarginal returns a list with the following components:

alpha

user-specified population type I error control

delta

user-specified violation rate

epsilon

user-specified significance level

N

user-specified maximum number of features

pval.unadj

a vector of unadjusted p-values

pval.adj

a vector of adjusted p-values

p.adjust.methods

user-specified p-value adjustment method

feature

features that pass marginal feature screening
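The components pval.adj, epsilon and N suggest how the feature component might be derived. The sketch below is an assumption about that relationship, not the package's documented rule: it keeps features whose adjusted p-value is at most epsilon and, if more than N qualify, retains the N with the smallest adjusted p-values. All values are made up for illustration.

```r
# Hypothetical adjusted p-values for eight features (illustrative only)
pval_adj <- c(0.002, 0.40, 0.03, 0.08, 0.01, 0.90, 0.04, 0.20)
epsilon <- 0.05  # significance level
N <- 3           # maximum number of features to keep

passing <- which(pval_adj <= epsilon)        # features significant at level epsilon
ranked <- passing[order(pval_adj[passing])]  # smallest adjusted p-value first
feature <- head(ranked, N)                   # keep at most N features
print(feature)
```

With a real fit, the same inspection would start from the returned list, for example res$pval.adj and res$feature, where res is a hypothetical name for the value returned by npMarginal.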

Author(s)

Yiling Chen, yiling0210@ucla.edu

References

FILL HERE

Examples

### example 1 ###
library(mvtnorm)  # provides rmvnorm()
n = 1000; p = 1000
y = rbinom(n, size = 1, prob = 0.5)
x = matrix(NA, nrow = n, ncol = p)
# class 1: every feature has mean 1
x1 = rmvnorm(sum(y == 1), mean = rep(1, p), sigma = diag(1, p))
# class 0: feature means range from -5 to 4
x0 = rmvnorm(sum(y == 0), mean = seq(from = -5, to = 4, length.out = p), sigma = diag(1, p))
x[y == 1, ] = x1
x[y == 0, ] = x0
table(y)
temp1 = npMarginal(x, y, method = "logistic",
                   N = 50,
                   p.adjust.methods = 'BH',
                   alpha = 0.05,
                   epsilon = 0.1,
                   l0 = 0.5,
                   l1 = 0.5)
plot(temp1$pval.unadj, ylim = c(0, 1))
# epsilon is not defined in the workspace; use the value stored in the result
abline(h = temp1$epsilon)

### example 2 ###
n = 1000; p = 1000
y = rbinom(n, size = 1, prob = 0.5)
x = matrix(NA, nrow = n, ncol = p)
# class 1: every feature has mean 1
x1 = rmvnorm(sum(y == 1), mean = rep(1, p), sigma = diag(1, p))
# class 0: feature means range from -2 to 1, so the classes overlap more
x0 = rmvnorm(sum(y == 0), mean = seq(from = -2, to = 1, length.out = p), sigma = diag(1, p))
x[y == 1, ] = x1
x[y == 0, ] = x0
table(y)
temp2 = npMarginal(x, y, method = "logistic",
                   N = 50,
                   p.adjust.methods = 'BH',
                   alpha = 0.05,
                   epsilon = 0.1,
                   l0 = 0.5,
                   l1 = 0.5)
plot(temp2$pval.unadj, ylim = c(0, 1))
abline(h = temp2$epsilon)

yiling0210/NPCriterion documentation built on May 10, 2019, 1:25 p.m.