rf.blind: Non-random cross validation for random forest

View source: R/rf.blind.R

rf.blind    R Documentation

Non-random cross validation for random forest

Description

Grows multiple random forests with non-random cross validation: each forest is trained on a user-specified part of the dataset, and predictions are made on another part of the dataset.

Usage

rf.blind(
  tab,
  treat,
  train.id,
  mtry = NULL,
  n.tree = 500,
  n.forest = 10,
  importance_p = F,
  seed = NULL
)

Arguments

tab

An abundance or presence/absence table containing samples in columns and OTUs/ASVs in rows.

treat

A boolean vector giving the class of each sample, i.e. the treatment to predict. Because the vector is boolean, you should pick one class as the reference for the calculation of precision and sensitivity.

train.id

A character string to be searched for in the sample names; matching samples are used for training. Can be a regular expression. Can alternatively be a boolean vector saying whether or not each sample is part of the training dataset (TRUE for training samples, FALSE for testing samples), or a character vector containing the training sample names.
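The three accepted forms of train.id can be illustrated with a short base R sketch. The helper resolve_train_id and the sample names below are hypothetical and are not part of the package; the actual function may resolve its argument differently (for instance, a single string is treated here as a pattern rather than as one sample name).

```r
# Hypothetical helper (not part of the package): turn each accepted form
# of train.id into a logical mask over the sample names.
resolve_train_id <- function(train.id, sample.names) {
  if (is.logical(train.id)) {
    # Boolean vector: one value per sample, TRUE = training sample
    stopifnot(length(train.id) == length(sample.names))
    return(train.id)
  }
  if (is.character(train.id) && length(train.id) == 1) {
    # Single string: searched in the sample names as a regular expression
    return(grepl(train.id, sample.names))
  }
  # Character vector: explicit list of training sample names
  sample.names %in% train.id
}

# Made-up sample names for illustration
samples <- c("siteA_1", "siteA_2", "siteB_1", "siteB_2")

mask.regex <- resolve_train_id("^siteA", samples)
mask.bool  <- resolve_train_id(c(TRUE, FALSE, TRUE, FALSE), samples)
mask.names <- resolve_train_id(c("siteA_1", "siteB_2"), samples)
```

In each case the samples where the mask is FALSE would form the testing set.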

mtry

The mtry parameter to be passed to the ranger function. See ranger documentation for details.

n.tree

The number of trees to grow. The default is 500.

n.forest

The number of forests to grow. The default is 10.

importance_p

A boolean defining whether a p-value should be computed for the importance of each variable. For now, the importance is the Gini index, and the p-value is estimated by permutation with the Altmann method. See ranger documentation for details. The default is FALSE.
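For readers unfamiliar with the Altmann method, the following sketch calls ranger directly to obtain Gini importance and permutation p-values. It is only an illustration of the underlying ranger calls, not the internal code of rf.blind; the iris subset, the seed and the number of permutations are arbitrary choices made for the example, and the block is skipped if ranger is not installed.

```r
if (requireNamespace("ranger", quietly = TRUE)) {
  # Two-class toy dataset (assumption: any binary classification data would do)
  dat <- droplevels(iris[iris$Species != "setosa", ])

  # importance = "impurity" requests the Gini index as variable importance
  rf <- ranger::ranger(Species ~ ., data = dat, num.trees = 500,
                       importance = "impurity", seed = 1)

  # Altmann method: the response is permuted and the forest regrown,
  # so the formula and data must be supplied again
  pvals <- ranger::importance_pvalues(rf, method = "altmann",
                                      num.permutations = 20,
                                      formula = Species ~ ., data = dat)
  print(pvals)
}
```

The result has one row per variable, with its importance and the estimated p-value. A small number of permutations keeps the example fast but gives coarse p-values.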

seed

A number to set the seed before growing the forest. Only meaningful if n.forest == 1. The default is NULL.

Value

A list object containing:

  • A summary table with the number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), the error rate, the sensitivity TP/(TP + FN), and the precision TP/(TP + FP)

  • The confusion matrix

  • n.forest tables containing the Gini index of each variable, one table per grown forest. This index gives the importance of the variable for classification.
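The summary statistics listed above can be reproduced by hand from observed and predicted classes. The sketch below uses made-up vectors and assumes that TRUE is the reference (positive) class and that the error rate is the fraction of misclassified samples; neither assumption is confirmed by this page.

```r
# Made-up observed and predicted classes for eight test samples
observed  <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
predicted <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)

# Counts, taking TRUE as the positive class (assumption)
TP <- sum(predicted & observed)
TN <- sum(!predicted & !observed)
FP <- sum(predicted & !observed)
FN <- sum(!predicted & observed)

# Error rate taken as the misclassified fraction (assumption)
error.rate  <- (FP + FN) / (TP + TN + FP + FN)
sensitivity <- TP / (TP + FN)
precision   <- TP / (TP + FP)

# Confusion matrix: observed classes in rows, predicted classes in columns
confusion <- table(observed = observed, predicted = predicted)
```

With these vectors, TP = 3, TN = 3, FP = 1 and FN = 1, so the error rate is 0.25 and both sensitivity and precision are 0.75.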


marccamb/optiranger documentation built on June 19, 2024, 9:18 a.m.