binomialRF: random forest feature selection based on binomial exact test

Description

`binomialRF` is the R implementation of the feature selection algorithm described by Zaim et al. (2019).

Usage

```r
binomialRF(X, y, fdr.threshold = .05, fdr.method = 'BY',
           ntrees = 2000, percent_features = .5,
           keep.both = FALSE, user_cbinom_dist = NULL,
           sampsize = round(nrow(X) * .63))
```
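The default `sampsize = round(nrow(X) * .63)` is close to the expected share of distinct rows in a bootstrap sample, which tends to 1 - 1/e (about 0.632). That reading is an interpretation and is not stated in this documentation. A minimal base-R sketch of the arithmetic, with `n` standing in for `nrow(X)`:

```r
# n is a stand-in for nrow(X); 100 matches the Examples section
n <- 100

# Expected fraction of distinct rows when drawing n rows with replacement
frac_unique <- 1 - (1 - 1 / n)^n

# Limit of that fraction as n grows
frac_limit <- 1 - exp(-1)

# Value the default sampsize expression would produce for n rows
default_sampsize <- round(n * 0.63)

print(c(frac_unique = frac_unique, frac_limit = frac_limit))
print(default_sampsize)
```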

Arguments

| Argument | Description |
| --- | --- |
| `X` | design matrix |
| `y` | class label |
| `fdr.threshold` | threshold on the adjusted p-value used to determine which features are significant |
| `fdr.method` | method used to adjust for multiple comparisons; one of `p.adjust.methods`, i.e. `c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none")` |
| `ntrees` | number of trees used to grow the `randomForest` |
| `percent_features` | proportion of the L features (L being the total number of features in `X`) subsampled at each tree; must lie strictly between 0 and 1 |
| `keep.both` | whether to keep the naive binomialRF results alongside the correlation-adjusted results |
| `user_cbinom_dist` | a pre-specified correlated binomial distribution, or one calculated via the R package `correlbinom` |
| `sampsize` | number of samples included in each tree of the `randomForest` |
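To see how the choice of `fdr.method` interacts with `fdr.threshold`, the sketch below applies base R's `p.adjust` to a small vector of made-up p-values (the numbers are illustrative only, not output from `binomialRF`). It compares `"BH"` with the more conservative `"BY"`, which is the default in the usage above.

```r
# Hypothetical raw p-values, one per feature (illustrative only)
p_raw <- c(0.001, 0.004, 0.02, 0.2, 0.6)

# Adjust with two of the methods listed in p.adjust.methods
p_bh <- p.adjust(p_raw, method = "BH")
p_by <- p.adjust(p_raw, method = "BY")

# Threshold playing the role of fdr.threshold
fdr_threshold <- 0.05

# Indices of features that would pass the threshold under each method
selected_bh <- which(p_bh <= fdr_threshold)
selected_by <- which(p_by <= fdr_threshold)

print(data.frame(p_raw, p_bh, p_by))
print(list(BH = selected_bh, BY = selected_by))
```

Because `"BY"` inflates each p-value by an extra harmonic-sum factor, it never selects more features than `"BH"` at the same threshold.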

Value

A `data.frame` with four columns: the feature name, the frequency with which it was selected, the probability of selecting it at random, and the adjusted p-value based on `fdr.method`.
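The four columns can be illustrated with a small base-R sketch. It rests on several assumptions that this page does not confirm: that each feature is equally likely to be selected under the null (probability `1 / L`), that the test is a one-sided binomial tail, and that the counts and column names below are hypothetical. It also omits the correlated-binomial adjustment available through `user_cbinom_dist`, so it should be read as a rough picture of the idea rather than the package's computation.

```r
L <- 10        # number of features (assumed)
ntrees <- 250  # number of trees

# Assumed null probability that any one feature is selected in a tree
p_null <- 1 / L

# Hypothetical selection counts per feature; they sum to ntrees
freq <- c(X1 = 60, X2 = 55, X3 = 45, X4 = 30, X5 = 20,
          X6 = 10, X7 = 10, X8 = 8, X9 = 7, X10 = 5)

# One-sided exact binomial tail: P(count >= observed) under the null
pval <- pbinom(freq - 1, size = ntrees, prob = p_null, lower.tail = FALSE)

# Assemble a four-column result (column names are hypothetical)
result <- data.frame(
  variable   = names(freq),
  freq       = as.integer(freq),
  p_null     = p_null,
  adj_pvalue = p.adjust(pval, method = "BY"),
  row.names  = NULL
)

print(result)
```

Features whose adjusted p-value falls at or below `fdr.threshold` would be the ones reported as significant.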

References

Zaim, SZ; Kenost, C.; Lussier, YA; Zhang, HH. binomialRF: Scalable Feature Selection and Screening for Random Forests to Identify Biomarkers and Their Interactions, bioRxiv, 2019.

Examples

```r
library(binomialRF)
library(correlbinom)

set.seed(324)

###############################
### Generate simulation data
###############################

# 100 observations, 10 features; only the first 5 carry signal
X = matrix(rnorm(1000), ncol=10)
trueBeta = c(rep(10,5), rep(0,5))
z = 1 + X %*% trueBeta
pr = 1/(1+exp(-z))
y = as.factor(rbinom(100,1,pr))

###############################
### Run binomialRF
###############################

rho = 0.33
ntrees = 250

# Correlated binomial distribution passed to binomialRF
cbinom = correlbinom(rho, successprob = calculateBinomialP(10, .5),
                     trials = ntrees, precision = 1024, model = 'kuk')

binom.rf <- binomialRF(X, y, fdr.threshold = .05, fdr.method = 'BY',
                       ntrees = ntrees, percent_features = .5,
                       keep.both = FALSE, user_cbinom_dist = cbinom,
                       sampsize = round(nrow(X)*rho))

print(binom.rf)
```

binomialRF documentation built on March 26, 2020, 5:13 p.m.