binomialRF: random forest feature selection based on binomial exact test

View source: R/binomialRF.R

Description

binomialRF is the R implementation of the feature selection algorithm of Zaim et al. (2019).

Usage

binomialRF(X, y, fdr.threshold = 0.05, fdr.method = 'BY',
           ntrees = 2000, percent_features = 0.5,
           keep.both = FALSE, user_cbinom_dist = NULL,
           sampsize = round(nrow(X) * 0.63))

Arguments

X

design matrix

y

class label

fdr.threshold

threshold for determining which features are significant (default 0.05)

fdr.method

method used to adjust for multiple comparisons; one of p.adjust.methods, i.e., c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none") (default 'BY')

ntrees

number of trees used to grow the randomForest (default 2000)

percent_features

proportion of L subsampled at each tree; must lie strictly between 0 and 1 (default 0.5)

keep.both

should the naive binomialRF results be kept alongside the correlation-adjusted ones? (default FALSE)

user_cbinom_dist

either a pre-specified correlated binomial distribution, or one calculated with the R package correlbinom (default NULL)

sampsize

number of samples included in each tree of the randomForest (default round(nrow(X) * 0.63))
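
To make the effect of fdr.method and fdr.threshold concrete, the following base-R sketch applies three of the p.adjust methods to a small vector of raw p-values and counts how many features fall at or below the threshold. The p-values, the feature names f1 to f6, and the choice of comparing at or below the threshold are all assumptions made for illustration; they are not output of binomialRF, and the package may apply the threshold differently.

```r
# Hypothetical raw p-values for six features (illustrative numbers only,
# not produced by binomialRF).
p_raw <- c(f1 = 0.0001, f2 = 0.004, f3 = 0.019, f4 = 0.03, f5 = 0.2, f6 = 0.7)

# Adjust the same p-values with three of the methods in p.adjust.methods.
methods <- c("bonferroni", "BH", "BY")
adjusted <- sapply(methods, function(m) p.adjust(p_raw, method = m))

# Flag features whose adjusted p-value is at or below the threshold
# (assumed rule for this sketch).
fdr_threshold <- 0.05
significant <- adjusted <= fdr_threshold

print(round(adjusted, 4))
print(colSums(significant))  # number of features kept under each method
```

BY adjusted p-values are never smaller than the BH ones, so the default 'BY' is the more conservative of the two FDR procedures.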

Value

a data.frame with 4 columns: feature name, frequency selected, probability of selecting it randomly, and adjusted p-value based on fdr.method
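
The shape of this return value can be sketched with base R alone. The example below is an assumption-laden illustration of a naive exact binomial test, not the package's internal code: the selection counts, the chance-selection probability p0, and the column names are hypothetical, "probability of selecting it randomly" is interpreted here as the one-sided binomial tail probability, and the correlation adjustment supplied through user_cbinom_dist is not modeled.

```r
ntrees <- 250
p0 <- 0.05  # hypothetical probability that a feature is selected by chance in one tree

# Hypothetical number of trees in which each feature was selected.
freq <- c(X1 = 60, X2 = 45, X3 = 14, X4 = 9)

# One-sided exact binomial test per feature:
# P(count >= observed) under Binomial(ntrees, p0).
p_raw <- vapply(freq, function(k) {
  binom.test(k, n = ntrees, p = p0, alternative = "greater")$p.value
}, numeric(1))

# Assemble a four-column table; column names are illustrative.
result <- data.frame(
  variable = names(freq),
  freq     = as.integer(freq),
  p_random = p_raw,
  adj_p    = p.adjust(p_raw, method = "BY"),
  row.names = NULL
)

print(result)
```

Features whose count is far above the expected ntrees * p0 receive small adjusted p-values, while counts near or below that expectation do not.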

References

Zaim, SZ; Kenost, C; Lussier, YA; Zhang, HH. binomialRF: Scalable Feature Selection and Screening for Random Forests to Identify Biomarkers and Their Interactions. bioRxiv, 2019.

Examples

library(binomialRF)
library(correlbinom)

set.seed(324)

###############################
### Generate simulation data
###############################

# 100 observations, 10 features; only the first 5 have nonzero coefficients
X <- matrix(rnorm(1000), ncol = 10)
trueBeta <- c(rep(10, 5), rep(0, 5))
z <- 1 + X %*% trueBeta
pr <- 1 / (1 + exp(-z))
y <- as.factor(rbinom(100, 1, pr))

###############################
### Run binomialRF
###############################

rho <- 0.33
ntrees <- 250

# correlated binomial distribution, passed in via user_cbinom_dist
cbinom <- correlbinom(rho, successprob = calculateBinomialP(10, .5),
                      trials = ntrees, precision = 1024, model = 'kuk')

binom.rf <- binomialRF(X, y, fdr.threshold = .05, fdr.method = 'BY',
                       ntrees = ntrees, percent_features = .5,
                       keep.both = FALSE, user_cbinom_dist = cbinom,
                       sampsize = round(nrow(X) * rho))

print(binom.rf)

binomialRF documentation built on March 26, 2020, 5:13 p.m.