randomForest_RFE: Feature Selection Using Random Forest Classifier and Recursive Feature Elimination

View source: R/Modelling.R


Feature Selection Using Random Forest Classifier and Recursive Feature Elimination

Description

Performs feature selection on one or more input feature sets using a random forest classifier with recursive feature elimination (RFE).
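
For orientation, the sketch below illustrates the generic RFE loop that this kind of function automates: train a random forest, rank the features by importance, drop the lowest-ranked ones, and repeat with the reduced set. It is a minimal illustration on the built-in iris data with assumed feature counts, not the package's internal implementation.

# Generic RFE loop (illustration only, not this package's implementation)
library(randomForest)

x <- iris[, 1:4]
y <- iris$Species
feature.nums <- c(4, 3, 2, 1)                     # feature counts to evaluate

selected <- colnames(x)                           # start with all features
for (n in feature.nums) {
    selected <- selected[seq_len(n)]              # keep the top-n ranked features
    fit <- randomForest(x = x[, selected, drop = FALSE], y = y, ntree = 100)
    imp <- importance(fit)[, "MeanDecreaseGini"]  # rank by Gini importance
    selected <- names(sort(imp, decreasing = TRUE))
    cat(n, "features, OOB error:", tail(fit$err.rate[, "OOB"], 1), "\n")
}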

Usage

randomForest_RFE(
  datasets = list(),
  label.col = 1,
  positive.class = NULL,
  featureNum.range = NULL,
  folds.num = 10,
  ntree = 1500,
  seed = 1,
  parallel.cores = 2,
  ...
)

Arguments

datasets

a list containing one or more input datasets. See examples.

label.col

an integer specifying the index of the label (class) column.

positive.class

NULL or a string specifying the positive class, which should be one of the classes in the label column. If positive.class = NULL, the first class in the label column is used as the positive class.

featureNum.range

the range of feature numbers to be used in the RFE iterations. For example, if the original feature set has 100 features and featureNum.range = c(10, 50, 80), all 100 features are used in the first iteration; the 20 lowest-ranked features are then eliminated, so 80 features are used to build the model in the second iteration; subsequent iterations use 50 and then 10 features. If left NULL, RFE iterates five times, with the feature numbers spaced evenly across the feature set, e.g. c(1, 26, 50, 75, 100) for a 100-dimension feature set (see the sketch after the argument list).

folds.num

an integer specifying the number of folds. Default: 10, for 10-fold cross-validation.

ntree

an integer, the number of trees to grow, passed to the random forest classifier. Default: 1500. See randomForest.

seed

an integer. Random seed used for data splitting.

parallel.cores

an integer specifying the number of cores for parallel computation. Default: 2. Set parallel.cores = -1 to use all available cores. parallel.cores must be -1 or >= 1.

...

other parameters passed to the randomForest function.
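
As a concrete illustration of the default featureNum.range described above, the documented default c(1, 26, 50, 75, 100) for a 100-dimension feature set corresponds to five evenly spaced feature counts. The snippet below is only a plausible reconstruction; the package's internal computation may differ.

# Plausible reconstruction of the default featureNum.range
feature.num <- 100                        # e.g. a 100-dimension feature set
round(seq(1, feature.num, length.out = 5))
# [1]   1  26  50  75 100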

Value

The function returns a list containing the importance scores of the features and the corresponding performance results.
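
The exact element names of the returned list are not documented here; after running the Examples below, its structure can be inspected with str(), for instance:

# Inspect the result of the example call (element names depend on the package)
str(Perf_RFE, max.level = 1)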

See Also

randomForest_CV, randomForest_tune, randomForest

Examples


# The following code only shows how to use this function
# and does not reflect the genuine performance of tools or classifiers.

data(demoPositiveSeq)
data(demoNegativeSeq)

RNA.positive <- demoPositiveSeq$RNA.positive
Pro.positive <- demoPositiveSeq$Pro.positive
RNA.negative <- demoNegativeSeq$RNA.negative
Pro.negative <- demoNegativeSeq$Pro.negative

dataPositive <- featureFreq(seqRNA = RNA.positive, seqPro = Pro.positive,
                            label = "Interact", featureMode = "conc",
                            computePro = "DeNovo", k.Pro = 3, k.RNA = 2,
                            normalize = "none", parallel.cores = 2)

dataNegative <- featureFreq(seqRNA = RNA.negative, seqPro = Pro.negative,
                            label = "Non.Interact", featureMode = "conc",
                            computePro = "DeNovo", k.Pro = 3, k.RNA = 2,
                            normalize = "none", parallel.cores = 2)

dataset <- rbind(dataPositive, dataNegative)

Perf_RFE <- randomForest_RFE(datasets = list(dataset), label.col = 1,
                             positive.class = "Interact",
                             featureNum.range = c(20, 50, 100),
                             folds.num = 5, ntree = 50, seed = 123,
                             parallel.cores = 2, mtry = 20)

# If you have more than one input dataset,
# use "datasets = list(dataset1, dataset2, dataset3)".

