rf.kfold: k-fold cross validation for random forest

View source: R/rf.kfold.R

rf.kfold    R Documentation

k-fold cross validation for random forest

Description

Splits the dataset into k parts and grows k random forests for classification, using each of the k parts in turn to make predictions while the other k-1 parts are used for training.

Usage

rf.kfold(
  tab,
  treat,
  k.fold = 5,
  mtry = NULL,
  n.tree = 500,
  importance_p = FALSE,
  seed = NULL
)

Arguments

tab

An abundance table containing samples in columns and OTUs/ASVs in rows.

treat

A boolean vector giving the class identity of each sample, i.e. the treatment to predict. You must therefore pick one class as the reference for the calculation of precision and sensitivity.

k.fold

The number of folds to use for k-fold cross-validation. The default is 5.

mtry

The mtry parameter to be passed to the ranger function. See ranger documentation for details.

n.tree

The number of trees to grow in each of the k forests. The default is 500.

importance_p

A boolean defining whether a p-value should be computed for variable importance. For now, the importance measure is the Gini index, and the p-value is estimated by permutation with the Altmann method. See the ranger documentation for details.

seed

A number used to set the seed before growing each forest. The default is NULL.

Value

A list object containing:

  • a summary table with the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), the error rate, the sensitivity TP/(TP + FN), and the precision TP/(TP + FP)

  • The confusion matrix

  • k.fold tables containing the Gini index of each variable in each of the k.fold grown forests. This index gives the variable importance for classification.

Examples

# Coming soon!
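# Until official examples are added, the call below is a hypothetical sketch:
# the abundance matrix and treatment vector are simulated for illustration,
# and the names of the returned list elements are assumptions, not taken
# from the package source.

## Simulated abundance table: 50 OTUs (rows) x 30 samples (columns)
set.seed(42)
tab <- matrix(rpois(50 * 30, lambda = 10), nrow = 50,
              dimnames = list(paste0("OTU", 1:50), paste0("S", 1:30)))

## Boolean treatment vector: TRUE is the reference class
treat <- rep(c(TRUE, FALSE), each = 15)

## 5-fold cross-validation with 500 trees per forest
res <- rf.kfold(tab, treat, k.fold = 5, n.tree = 500, seed = 123)

## The returned list holds the summary table (TP, TN, FP, FN, error rate,
## sensitivity, precision), the confusion matrix, and one Gini-importance
## table per fold; inspect it with str(res).
str(res)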


marccamb/optiranger documentation built on June 19, 2024, 9:18 a.m.