rf_standard: Random Forest Wrapper

View source: R/rf_wrapper.R

rf_standard R Documentation

Random Forest Wrapper

Description

Convenience wrapper function that runs a Random Forest pipeline. The pipeline includes 5-fold cross-validation and hyperparameter tuning for mtry and ntree. The number of train/test splits and the train/test ratio are customizable by the user.
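As a rough illustration of what 5-fold cross-validation involves, the base R sketch below assigns rows to folds and loops over them. It is not the package's internal code; the iris dataset and all variable names are stand-ins chosen for the example.

```r
# Illustrative sketch only: not taken from rf_standard itself.
set.seed(1)                 # reproducible fold assignment
df <- iris                  # stand-in dataset with 150 rows
k <- 5                      # number of folds

# Give every row a fold label 1..k, in random order
folds <- sample(rep(seq_len(k), length.out = nrow(df)))

for (i in seq_len(k)) {
  validation <- df[folds == i, ]   # rows held out in this fold
  training <- df[folds != i, ]     # remaining rows used for fitting
  # A model would be fitted on `training` and scored on `validation` here
}

table(folds)                # number of rows per fold
```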

Usage

rf_standard(
  rf.type,
  vpred,
  df,
  dir,
  nsets = 10,
  split.param = c(train.ratio = 0.8),
  mtry = NA,
  ntree = (1:10) * 500,
  top.variables = NA,
  rf.param = c(dataframe.name, predictors.name),
  varimp.param = c(selection_type = NA, metric = NA, xlab = NA),
  density.param = c(scale = 0.8, ncol = 2, tsize = 10, xlab = NA),
  extract.names.df = NA,
  experiment.note = NA
)

Arguments

rf.type

One of: "class" for classification trees or "reg" for regression trees.

vpred

String. Name of response variable.

df

Input dataframe.

dir

Directory path for export. String.

nsets

(Optional) Number of train/validation sets to generate, Default: 10.

split.param

(Optional) Proportion of samples assigned to the training set, between 0 and 1. For example, 0.8 assigns 80% of samples to the training set, giving an 80:20 train:test split, Default: c(train.ratio = 0.8).
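To make the interaction between nsets and train.ratio concrete, here is a minimal base R sketch that draws repeated random splits. How rf_standard actually draws its splits is not documented here, so simple random sampling without stratification is an assumption, and iris is only a stand-in dataset.

```r
# Illustrative sketch only: the sampling scheme is an assumption.
set.seed(42)                      # make the random splits reproducible
df <- iris                        # stand-in dataset with 150 rows
train.ratio <- 0.8                # proportion of rows used for training
nsets <- 10                       # number of independent splits

# One list element per split, each holding the training and test rows
splits <- lapply(seq_len(nsets), function(i) {
  n.train <- floor(train.ratio * nrow(df))
  train.idx <- sample(nrow(df), size = n.train)
  list(train = df[train.idx, ], test = df[-train.idx, ])
})

nrow(splits[[1]]$train)           # 120 rows (80%)
nrow(splits[[1]]$test)            # 30 rows (20%)
```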

mtry

(Optional) Range of mtry values to try for Random Forest hyperparameter tuning. If NA, will use mtry.guide to select optimal mtry based on tree type and number of predictor variables, Default: NA.

ntree

(Optional) Number of trees to try during Random Forest hyperparameter tuning, Default: (1:10) * 500.
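The sketch below shows what the default ntree expression evaluates to and how a tuning grid over mtry and ntree could be laid out. The behaviour of mtry.guide is not described on this page; the two heuristics shown are the defaults used by the randomForest package and are included only as a plausible reference point. The predictor count and the candidate mtry values are hypothetical.

```r
# Illustrative sketch only: mtry.guide may choose values differently.
p <- 12                               # hypothetical number of predictors

ntree <- (1:10) * 500                 # 500, 1000, ..., 5000

# randomForest package defaults, shown here as an assumption
mtry.class <- floor(sqrt(p))          # classification: 3 when p = 12
mtry.reg <- max(floor(p / 3), 1)      # regression: 4 when p = 12

# Every combination of candidate mtry and ntree values
grid <- expand.grid(mtry = c(2, 3, 4), ntree = ntree)
nrow(grid)                            # 30 combinations
```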

top.variables

(Optional) Number of top important predictor variables to plot, Default: NA.

rf.param

Named character vector with two elements: dataframe.name, the name of the dataframe as a string, and predictors.name, a name (any description) for the predictor variables as a string.

varimp.param

(Optional) Parameters to pass to variable importance plot. Default: c(selection_type = NA, metric = NA, xlab = NA).

density.param

(Optional) Parameters to pass to density plot, Default: c(scale = 0.8, ncol = 2, tsize = 10, xlab = NA).

extract.names.df

(Optional) Dataframe. Must be provided if the density.param xlab = "extract". The 1st column is how each predictor variable should appear in the density plots (e.g. IL-6 Concentration). The 2nd column is how the predictor variable appears in the input df (e.g. IL.6.Concentration), Default: NA.
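A lookup dataframe of this shape could be built as in the sketch below. The column names display.name and column.name are hypothetical (the description above specifies only column order), and "TNF-a Concentration" is an invented second predictor. Base R's make.names() is used because it produces the dotted form that data.frame() gives to such column names by default.

```r
# Illustrative sketch only: column names here are hypothetical.
display.names <- c("IL-6 Concentration", "TNF-a Concentration")

extract.names.df <- data.frame(
  display.name = display.names,             # 1st column: label shown in plots
  column.name = make.names(display.names),  # 2nd column: name in the input df
  stringsAsFactors = FALSE
)

extract.names.df$column.name
```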

experiment.note

(Optional) Human-readable note supplied by the user that is written to the output log. May be used to record what is being run and why, Default: NA.

Value

Exports Random Forest results to sub-folders within dir.

Examples

library(dplyr)  # provides %>% and select()

# `blood` is assumed to be a dataframe already loaded in the session
rf_standard(
  rf.type = "class",
  vpred = "Genotype",
  df = blood %>% select(-c(Sex, AnimalID)),  # drop metadata columns
  dir = "/Users/Documents/experiment",
  experiment.note = "Predict mouse genotype from immune populations. No genotype excluded from dataframe. Exclude sex metadata.",
  rf.param = c(dataframe.name = "blood-allgenotypes", predictors.name = "immune")
)

SarahAsbury/BioDataTools documentation built on Feb. 5, 2024, 4:01 p.m.