randomForest_RFE: Feature Selection Using Random Forest Classifier and Recursive Feature Elimination

View source: R/Modelling.R


Feature Selection Using Random Forest Classifier and Recursive Feature Elimination

Description

Performs feature selection on one or more input feature sets using a random forest classifier with recursive feature elimination (RFE).
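
For orientation, the sketch below illustrates the generic RFE loop that this kind of function automates: train a random forest, rank the features by importance, drop the lowest-ranked ones, and repeat with the reduced set. It is a minimal illustration on the built-in iris data with assumed feature counts, not the package's internal implementation.

# Generic RFE loop (illustration only, not this package's implementation)
library(randomForest)

x <- iris[, 1:4]
y <- iris$Species
feature.nums <- c(4, 3, 2, 1)                     # feature counts to evaluate

selected <- colnames(x)                           # start with all features
for (n in feature.nums) {
    selected <- selected[seq_len(n)]              # keep the top-n ranked features
    fit <- randomForest(x = x[, selected, drop = FALSE], y = y, ntree = 100)
    imp <- importance(fit)[, "MeanDecreaseGini"]  # rank by Gini importance
    selected <- names(sort(imp, decreasing = TRUE))
    cat(n, "features, OOB error:", tail(fit$err.rate[, "OOB"], 1), "\n")
}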

Usage

randomForest_RFE(
  datasets = list(),
  label.col = 1,
  positive.class = NULL,
  featureNum.range = NULL,
  folds.num = 10,
  ntree = 1500,
  seed = 1,
  parallel.cores = 2,
  ...
)

Arguments

datasets

a list containing one or more input datasets. See examples.

label.col

an integer specifying the index of the label (class) column.

positive.class

NULL or a string specifying the positive class, which should be one of the classes in the label column. If positive.class = NULL, the first class in the label column is used as the positive class.

featureNum.range

the range of feature numbers to be used in the RFE iterations. For example, if the original feature set has 100 features and featureNum.range = c(10, 50, 80), all 100 features are used in the first iteration; the 20 lowest-ranked features are then eliminated, so 80 features are used to build the model in the second iteration; subsequent iterations use 50 and then 10 features. If left NULL, RFE iterates five times, with the feature numbers spaced evenly across the feature set, e.g. c(1, 26, 50, 75, 100) for a 100-dimension feature set (see the sketch after the argument list).

folds.num

an integer specifying the number of folds. Default: 10, for 10-fold cross-validation.

ntree

an integer, the number of trees to grow, passed to the random forest classifier. Default: 1500. See randomForest.

seed

an integer. Random seed used for data splitting.

parallel.cores

an integer specifying the number of cores for parallel computation. Default: 2. Set parallel.cores = -1 to use all available cores. parallel.cores must be -1 or >= 1.

...

other parameters passed to the randomForest function.
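
As a concrete illustration of the default featureNum.range described above, the documented default c(1, 26, 50, 75, 100) for a 100-dimension feature set corresponds to five evenly spaced feature counts. The snippet below is only a plausible reconstruction; the package's internal computation may differ.

# Plausible reconstruction of the default featureNum.range
feature.num <- 100                        # e.g. a 100-dimension feature set
round(seq(1, feature.num, length.out = 5))
# [1]   1  26  50  75 100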

Value

The function returns a list containing the importance scores of the features and the corresponding performance results.
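
The exact element names of the returned list are not documented here; after running the Examples below, its structure can be inspected with str(), for instance:

# Inspect the result of the example call (element names depend on the package)
str(Perf_RFE, max.level = 1)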

See Also

randomForest_CV, randomForest_tune, randomForest

Examples


# The following code only shows how to use this function
# and does not reflect the genuine performance of tools or classifiers.

data(demoPositiveSeq)
data(demoNegativeSeq)

RNA.positive <- demoPositiveSeq$RNA.positive
Pro.positive <- demoPositiveSeq$Pro.positive
RNA.negative <- demoNegativeSeq$RNA.negative
Pro.negative <- demoNegativeSeq$Pro.negative

dataPositive <- featureFreq(seqRNA = RNA.positive, seqPro = Pro.positive,
                            label = "Interact", featureMode = "conc",
                            computePro = "DeNovo", k.Pro = 3, k.RNA = 2,
                            normalize = "none", parallel.cores = 2)

dataNegative <- featureFreq(seqRNA = RNA.negative, seqPro = Pro.negative,
                            label = "Non.Interact", featureMode = "conc",
                            computePro = "DeNovo", k.Pro = 3, k.RNA = 2,
                            normalize = "none", parallel.cores = 2)

dataset <- rbind(dataPositive, dataNegative)

Perf_RFE <- randomForest_RFE(datasets = list(dataset), label.col = 1,
                             positive.class = "Interact",
                             featureNum.range = c(20, 50, 100),
                             folds.num = 5, ntree = 50, seed = 123,
                             parallel.cores = 2, mtry = 20)

# If you have more than one input dataset,
# use "datasets = list(dataset1, dataset2, dataset3)".

