dot-cv_binomialRF: random forest feature selection based on binomial exact test


Description

.cv_binomialRF is the cross-validated form of binomialRF, in which K-fold cross-validation is used to assess each feature's significance. Setting cvFolds = K splits the data into K equally sized groups ('chunks'), and the result averaged over the folds is returned.
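The chunk-and-average idea can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's internal code: the names fold_id, fold_stats and cv_average are hypothetical, the per-fold statistic is a placeholder (column means), and the sketch computes it on the rows outside each chunk, which may or may not match what the package does.

```r
# Hypothetical sketch of K-fold chunking and averaging (not binomialRF internals)
set.seed(1)
n <- 100   # number of observations
K <- 5     # number of folds, playing the role of cvFolds

# Randomly assign each row to one of K folds of (near-)equal size
fold_id <- sample(rep(seq_len(K), length.out = n))

# Toy design matrix with three features
X <- matrix(rnorm(n * 3), ncol = 3)

# Placeholder per-fold statistic: column means of the rows outside fold k.
# A real run would fit the feature-selection model here instead.
fold_stats <- sapply(seq_len(K), function(k) {
  rows_used <- X[fold_id != k, , drop = FALSE]
  colMeans(rows_used)
})

# fold_stats has one row per feature and one column per fold;
# averaging across folds gives one value per feature
cv_average <- rowMeans(fold_stats)
```

With n divisible by K, every fold holds exactly n / K rows; otherwise fold sizes differ by at most one.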

Usage

.cv_binomialRF(X, y, cvFolds = 5, fdr.threshold = 0.05,
  fdr.method = "BY", ntrees = 2000, keep.both = FALSE)

Arguments

X

design matrix

y

class label

cvFolds

number of cross-validation folds (K) the data is split into

fdr.threshold

threshold applied to the adjusted p-values to determine which features are significant

fdr.method

how should we adjust for multiple comparisons (i.e., p.adjust.methods =c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY","fdr", "none"))

ntrees

how many trees should be used to grow the randomForest? (Defaults to 2000, per the usage signature)

keep.both

should we keep the naive binomialRF as well as the correlated adjustment
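As a rough illustration of how fdr.method and fdr.threshold might interact, the sketch below adjusts a vector of p-values with base R's p.adjust and compares the result with a threshold. The p-values and feature names are made up, and the use of <= for the comparison is an assumption; the package's own logic may differ.

```r
# Made-up raw p-values for five hypothetical features
p_raw <- c(f1 = 0.0001, f2 = 0.004, f3 = 0.03, f4 = 0.2, f5 = 0.9)

# Benjamini-Yekutieli adjustment, the default fdr.method in the usage above
p_adj <- p.adjust(p_raw, method = "BY")

# Flag features whose adjusted p-value falls at or below the threshold
# (assumption: the comparison is <=; the package may use a strict <)
fdr_threshold <- 0.05
significant <- p_adj <= fdr_threshold

print(data.frame(p_raw, p_adj, significant))
```

Any value listed in p.adjust.methods can be passed as method; "BY" is more conservative than "BH" because it remains valid under arbitrary dependence between tests.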

Value

a data.frame with the following columns: feature name, cross-validated average of the frequency selected, CV median of the probability of selecting the feature at random, CV median of the adjusted p-value (based on fdr.method), and the average number of times the feature was selected as significant.

References

Zaim, SZ; Kenost, C.; Lussier, YA; Zhang, HH. binomialRF: Scalable Feature Selection and Screening for Random Forests to Identify Biomarkers and Their Interactions, bioRxiv, 2019.

Examples

set.seed(324)

###############################
### Generate simulation data
###############################

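# 100 observations of 10 standard-normal features
X <- matrix(rnorm(1000), ncol = 10)
# first five features carry signal, last five are noise
trueBeta <- c(rep(10, 5), rep(0, 5))
# logistic model: linear predictor, then success probability
z <- 1 + X %*% trueBeta
pr <- 1 / (1 + exp(-z))
y <- as.factor(rbinom(100, 1, pr))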

###############################
### Run cross-validation
###############################

binomialRF documentation built on March 26, 2020, 5:13 p.m.