randomForest_tune: Determine mtry for Random Forest Classifier Using K-Fold Cross Validation

View source: R/Modelling.R

randomForest_tune    R Documentation

Determine mtry for Random Forest Classifier Using K-Fold Cross Validation

Description

Determine mtry for Random Forest Classifier Using K-Fold Cross Validation

Usage

randomForest_tune(
  datasets = list(),
  label.col = 1,
  positive.class = NULL,
  folds.num = 10,
  ntree = 3000,
  mtry.ratios = c(0.1, 0.2, 0.4, 0.6, 0.8),
  seed = 1,
  return.model = TRUE,
  parallel.cores = 2,
  ...
)

Arguments

datasets

a list containing one or several input datasets. If several datasets are provided, stratified cross validation will be performed. See Examples.

label.col

an integer. Column number of the label.

positive.class

NULL or a string naming the positive class; it should be one of the classes in the label column. If positive.class = NULL, the first class in the label column will be selected as the positive class.

folds.num

an integer. Number of folds. Default: 10 for 10-fold cross validation.

ntree

an integer. Number of trees to grow. See randomForest. Default: 3000.

mtry.ratios

the ratios of mtry to try when tuning the random forest classifier: mtry = mtry ratio * number of features. Default: c(0.1, 0.2, 0.4, 0.6, 0.8). See the sketch after this argument list.

seed

an integer. Random seed for data splitting.

return.model

logical. If TRUE, the function will return a random forest model built with the optimal mtry. The training set is the combination of all input datasets (see the sketch under Value).

parallel.cores

an integer specifying the number of cores for parallel computation. Default: 2. Set parallel.cores = -1 to run with all available cores. parallel.cores should be -1 or a positive integer (>= 1).

...

other parameters (except ntree and mtry) passed to the randomForest function.
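
As an illustration of the mtry.ratios relationship above (a sketch only, not the package's internal code; the rounding rule and the assumption that the feature count excludes the label column are mine):

feature.num <- 340    # hypothetical number of features (label column excluded)
mtry.candidates <- round(c(0.1, 0.2, 0.4, 0.6, 0.8) * feature.num)
mtry.candidates       # 34  68 136 204 272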

Value

If return.model = TRUE, the function returns a random forest model. If FALSE, the function returns the optimal mtry and the corresponding performance.
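
For intuition, the "combination of all input datasets" used as the training set (see return.model) is presumably the row-bound input list; this is an assumption about the implementation, not documented behavior:

train.data <- do.call(rbind, datasets)    # assumed: all datasets share identical columns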

See Also

randomForest_RFE, randomForest_CV, randomForest

Examples


# The following code only shows how to use this function
# and does not reflect the genuine performance of tools or classifiers.

data(demoPositiveSeq)
data(demoNegativeSeq)

RNA.positive <- demoPositiveSeq$RNA.positive
Pro.positive <- demoPositiveSeq$Pro.positive
RNA.negative <- demoNegativeSeq$RNA.negative
Pro.negative <- demoNegativeSeq$Pro.negative

dataPositive <- featureFreq(seqRNA = RNA.positive, seqPro = Pro.positive,
                            label = "Interact", featureMode = "conc",
                            computePro = "DeNovo", k.Pro = 3, k.RNA = 2,
                            normalize = "none", parallel.cores = 2)

dataNegative <- featureFreq(seqRNA = RNA.negative, seqPro = Pro.negative,
                            label = "Non.Interact", featureMode = "conc",
                            computePro = "DeNovo", k.Pro = 3, k.RNA = 2,
                            normalize = "none", parallel.cores = 2)

dataset <- rbind(dataPositive, dataNegative)

Perf_tune <- randomForest_tune(datasets = list(dataset), label.col = 1,
                               positive.class = "Interact", folds.num = 5,
                               ntree = 150, seed = 123,
                               return.model = TRUE, parallel.cores = 2,
                               importance = TRUE)

# if you have more than one input dataset,
# use "datasets = list(dataset1, dataset2, dataset3)".

