rf.opti.mtry.taxo: Random forest optimisation

View source: R/rf.opti.mtry.taxo.R

Description

Runs random forest classification at several taxonomic levels and for several values of the mtry parameter, and evaluates each combination by k-fold or blind cross-validation.
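To give a sense of the amount of computation involved, the sketch below counts the (taxonomic level, mtry) combinations implied by the default arguments. It assumes, as the Value section suggests, that cross.param forests are grown for every combination; the variable names grid, n.combinations and n.forests are illustrative only.

```r
# Illustrative count of the search grid implied by the default arguments.
tax.lvl <- c("ASV", "genus", "family", "order", "class")
n.mtry <- 5
cross.param <- 5

# One row per (taxonomic level, mtry index) combination
grid <- expand.grid(tax.lvl = tax.lvl, i.mtry = seq_len(n.mtry),
                    stringsAsFactors = FALSE)

n.combinations <- nrow(grid)               # 5 levels x 5 mtry values = 25
n.forests <- n.combinations * cross.param  # 25 x 5 = 125 forests
```

With n.tree = 100, this amounts to 12,500 trees, so reducing tax.lvl or n.mtry is the most direct way to shorten a run.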

Usage

rf.opti.mtry.taxo(
  tab,
  tax.table,
  treat,
  n.mtry = 5,
  mtry = function(x) i.mtry * x/n.mtry * 0.5,
  tax.lvl = c("ASV", "genus", "family", "order", "class"),
  cross.val = "kfold",
  train.id = NA,
  n.tree = 100,
  cross.param = 5,
  seed = 1409,
  RDSfile = NULL
)

Arguments

tab

An abundance table with samples in columns and ASVs/OTUs in rows.

tax.table

A table containing the taxonomy of each ASV/OTU.

treat

A logical vector giving the class of each sample, i.e. the treatment to predict. Because the response is binary, you must pick one class as the reference class for the calculation of precision and sensitivity.
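The role of the reference class can be illustrated with a small hand computation. This is a sketch with made-up observed classes and predictions, taking TRUE as the reference class; it uses the standard definitions of sensitivity, precision and error rate and is not taken from the package source.

```r
# Made-up observed classes (as in treat) and predictions for ten samples;
# TRUE is taken as the reference class here.
truth <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
pred  <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

# Confusion matrix (rows: observed, columns: predicted)
cm <- table(observed = truth, predicted = pred)

tp <- sum(truth & pred)   # reference samples predicted as reference
fn <- sum(truth & !pred)  # reference samples missed
fp <- sum(!truth & pred)  # other samples predicted as reference

sensitivity <- tp / (tp + fn)       # 3 / 4 = 0.75
precision   <- tp / (tp + fp)       # 3 / 5 = 0.6
error.rate  <- mean(truth != pred)  # 3 / 10 = 0.3
```

Taking FALSE as the reference instead changes sensitivity and precision, but not the error rate, which is why the choice of reference class matters.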

n.mtry

The number of mtry values to test. Default is 5.

mtry

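A function of x, the number of variables (i.e. ASVs or OTUs), returning the mtry value to test; it is evaluated for each i.mtry in 1:n.mtry. Default is function(x) i.mtry * x/n.mtry * 0.5, which tests n.mtry evenly spaced values, the largest being half the number of variables.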
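The values tested by the default mtry function can be checked by hand. This sketch assumes that i.mtry is the loop index over 1:n.mtry inside rf.opti.mtry.taxo and uses a hypothetical table with 200 variables.

```r
# Values produced by the default mtry function.
# Assumption: i.mtry is the loop index over 1:n.mtry used inside rf.opti.mtry.taxo.
n.mtry <- 5
x <- 200  # hypothetical number of variables (e.g. ASVs)

# Default from the Usage section; i.mtry is looked up when the function is called
mtry <- function(x) i.mtry * x/n.mtry * 0.5

mtry.values <- numeric(n.mtry)
for (i.mtry in seq_len(n.mtry)) {
  mtry.values[i.mtry] <- mtry(x)
}
mtry.values  # 20 40 60 80 100
```

A custom function can be passed in the same form, for example function(x) i.mtry * sqrt(x), to explore values around the usual square-root heuristic for classification forests.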

tax.lvl

A character vector containing the names of the taxonomic levels at which the ASV table is aggregated. Default is c("ASV", "genus", "family", "order", "class").
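Aggregating an ASV table to a taxonomic level amounts to summing the counts of all ASVs assigned to the same taxon. The sketch below does this with base R's rowsum() on a toy table. It assumes that tax.table has row names matching the ASV names of tab and one column per taxonomic level named as in tax.lvl; this is an illustration, not necessarily how rf.opti.mtry.taxo performs the aggregation.

```r
# Toy abundance table: ASVs in rows, samples in columns
tab <- matrix(c(1, 0, 3,
                2, 5, 0,
                4, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("ASV1", "ASV2", "ASV3"), c("S1", "S2", "S3")))

# Toy taxonomy; assumption: one column per taxonomic level, rows named by ASV
tax.table <- data.frame(genus = c("Bacillus", "Bacillus", "Pseudomonas"),
                        row.names = rownames(tab))

# Sum the counts of ASVs sharing the same genus (rows are sorted by genus)
genus.tab <- rowsum(tab, group = tax.table[rownames(tab), "genus"])
genus.tab
```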

cross.val

The type of cross-validation to perform. Possible values are "kfold" (default) or "blind".

train.id

A string matching the names of the samples to be used for training. Only meaningful for cross.val = "blind".
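For blind cross-validation, train.id splits the samples into a training set and a held-out set. The sketch below assumes the match is a regular-expression search on sample names via grepl(); the sample names are made up, and the exact matching rule used by the function is not stated here.

```r
# Made-up sample names; train.id selects the training samples by pattern
sample.names <- c("siteA_1", "siteA_2", "siteA_3", "siteB_1", "siteB_2")
train.id <- "siteA"

is.train <- grepl(train.id, sample.names)  # assumption: regex match on names
train.samples <- sample.names[is.train]    # "siteA_1" "siteA_2" "siteA_3"
test.samples  <- sample.names[!is.train]   # "siteB_1" "siteB_2"
```

Naming samples by site, batch or year makes it easy to hold out a whole group and test how well the forest generalises to it.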

n.tree

The number of trees to grow in each forest. Default is 100.

cross.param

The parameter needed for cross validation: the number of folds for cross.val = "kfold" or the number of forests to grow for cross.val = "blind". Default is 5.
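For cross.val = "kfold", cross.param sets the number of folds. This sketch shows one common way to assign samples to folds of near-equal size after seeding, with each fold held out once. The sample count is hypothetical, and the fold-assignment scheme is a generic k-fold recipe, not necessarily the one used inside rf.opti.mtry.taxo.

```r
seed <- 1409
cross.param <- 5  # number of folds
n.samples <- 23   # hypothetical number of samples

# Seed before sampling, as documented for seed; NA disables seeding
if (!is.na(seed)) set.seed(seed)

# Assign each sample to one of cross.param folds of near-equal size
folds <- sample(rep(seq_len(cross.param), length.out = n.samples))

# Fold k is held out for testing while the remaining folds are used for training
for (k in seq_len(cross.param)) {
  test.idx  <- which(folds == k)
  train.idx <- which(folds != k)
  # ... grow a forest on train.idx and evaluate it on test.idx ...
}
table(folds)  # fold sizes 5, 5, 5, 4, 4
```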

seed

The seed set before growing each forest, and before sampling the training set when cross.val = "kfold". Set to NA to disable seeding. Default is 1409.

RDSfile

A string containing the name of the RDS file in which to save the results. Default is NULL, in which case results are not saved.
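Results saved to an RDS file can be reloaded in a later session with base R's readRDS(). This sketch uses a small stand-in list in place of real results; its element and column names, and the file name, are hypothetical.

```r
# Stand-in for the list returned by rf.opti.mtry.taxo (hypothetical contents)
res <- list(ASV   = data.frame(mtry = c(20, 40), error.rate = c(0.20, 0.25)),
            genus = data.frame(mtry = c(5, 10),  error.rate = c(0.15, 0.18)))

rds.path <- file.path(tempdir(), "rf_opti_results.rds")  # hypothetical file name
saveRDS(res, rds.path)

# Later, or in another session: reload the saved list
res.loaded <- readRDS(rds.path)
identical(res, res.loaded)  # TRUE
```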

Value

Returns a list of data frames, one per taxonomic level. Each data frame contains the confusion matrix, sensitivity, precision and error rate obtained for each value of the mtry parameter. Mean values and standard deviations are computed over the cross.param forests grown.
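The summary over forests can be pictured as follows: for each mtry value, a metric is recorded for each of the cross.param forests, then averaged and its standard deviation taken. The error rates below are made up, and the column names (mtry, error.mean, error.sd) are hypothetical rather than the actual columns of the returned data frames.

```r
# Made-up error rates from cross.param = 5 forests, for two mtry values
err <- rbind(mtry.20 = c(0.20, 0.25, 0.15, 0.30, 0.10),
             mtry.40 = c(0.10, 0.12, 0.14, 0.16, 0.18))

# Mean and standard deviation across forests, one row per mtry value
summary.df <- data.frame(
  mtry       = c(20, 40),
  error.mean = apply(err, 1, mean),
  error.sd   = apply(err, 1, sd)
)
summary.df
```

Comparing mean error rates relative to their standard deviations helps judge whether a difference between two mtry values is larger than the variation between forests.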


marccamb/optiranger documentation built on June 19, 2024, 9:18 a.m.