randomForest_tune: Determine mtry for Random Forest Classifier Using K-Fold Cross Validation

View source: R/Modelling.R

randomForest_tune    R Documentation

Determine mtry for Random Forest Classifier Using K-Fold Cross Validation

Description

Determine mtry for Random Forest Classifier Using K-Fold Cross Validation

Usage

randomForest_tune(
  datasets = list(),
  label.col = 1,
  positive.class = NULL,
  folds.num = 10,
  ntree = 3000,
  mtry.ratios = c(0.1, 0.2, 0.4, 0.6, 0.8),
  seed = 1,
  return.model = TRUE,
  parallel.cores = 2,
  ...
)

Arguments

datasets

a list containing one or several input datasets. If several datasets are provided, stratified cross validation will be performed. See Examples.

label.col

an integer. Column number of the label.

positive.class

NULL or a string naming the positive class; it should be one of the classes in the label column. If positive.class = NULL, the first class in the label column will be selected as the positive class.

folds.num

an integer. Number of folds. Default: 10 for 10-fold cross validation.

ntree

an integer. Number of trees to grow. See randomForest. Default: 3000.

mtry.ratios

the ratios of mtry to try when tuning the random forest classifier: mtry = mtry ratio * number of features. Default: c(0.1, 0.2, 0.4, 0.6, 0.8). See the sketch after this argument list.

seed

an integer. Random seed for data splitting.

return.model

logical. If TRUE, the function will return a random forest model built with the optimal mtry. The training set is the combination of all input datasets (see the sketch under Value).

parallel.cores

an integer specifying the number of cores for parallel computation. Default: 2. Set parallel.cores = -1 to run with all available cores. parallel.cores should be -1 or a positive integer (>= 1).

...

other parameters (except ntree and mtry) passed to the randomForest function.
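
As an illustration of the mtry.ratios relationship above (a sketch only, not the package's internal code; the rounding rule and the assumption that the feature count excludes the label column are mine):

feature.num <- 340    # hypothetical number of features (label column excluded)
mtry.candidates <- round(c(0.1, 0.2, 0.4, 0.6, 0.8) * feature.num)
mtry.candidates       # 34  68 136 204 272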

Value

If return.model = TRUE, the function returns a random forest model. If FALSE, the function returns the optimal mtry and the corresponding performance.
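
For intuition, the "combination of all input datasets" used as the training set (see return.model) is presumably the row-bound input list; this is an assumption about the implementation, not documented behavior:

train.data <- do.call(rbind, datasets)    # assumed: all datasets share identical columns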

See Also

randomForest_RFE, randomForest_CV, randomForest

Examples


# The following code only shows how to use this function
# and does not reflect the genuine performance of tools or classifiers.

data(demoPositiveSeq)
data(demoNegativeSeq)

RNA.positive <- demoPositiveSeq$RNA.positive
Pro.positive <- demoPositiveSeq$Pro.positive
RNA.negative <- demoNegativeSeq$RNA.negative
Pro.negative <- demoNegativeSeq$Pro.negative

dataPositive <- featureFreq(seqRNA = RNA.positive, seqPro = Pro.positive,
                            label = "Interact", featureMode = "conc",
                            computePro = "DeNovo", k.Pro = 3, k.RNA = 2,
                            normalize = "none", parallel.cores = 2)

dataNegative <- featureFreq(seqRNA = RNA.negative, seqPro = Pro.negative,
                            label = "Non.Interact", featureMode = "conc",
                            computePro = "DeNovo", k.Pro = 3, k.RNA = 2,
                            normalize = "none", parallel.cores = 2)

dataset <- rbind(dataPositive, dataNegative)

Perf_tune <- randomForest_tune(datasets = list(dataset), label.col = 1,
                               positive.class = "Interact", folds.num = 5,
                               ntree = 150, seed = 123,
                               return.model = TRUE, parallel.cores = 2,
                               importance = TRUE)

# if you have more than one input dataset,
# use "datasets = list(dataset1, dataset2, dataset3)".

