rf_standard: Random Forest Wrapper
In SarahAsbury/BioDataTools:

View source: R/rf_wrapper.R

Description

Convenience wrapper that runs a Random Forest pipeline. The pipeline includes 5-fold cross-validation and hyperparameter tuning for mtry and ntree. The number of train/test splits and the proportion of samples assigned to training are customizable by the user.
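As a rough illustration of what tuning over mtry and ntree with 5-fold cross-validation involves, the base-R sketch below builds a grid of candidate settings and assigns samples to folds. It is not taken from the package source: the candidate mtry values, the sample count, and the object names `grid` and `fold_id` are assumptions made for the example.

```r
# Hypothetical tuning grid: every mtry/ntree combination to evaluate.
mtry_values  <- c(2, 3, 4)       # assumed candidate values
ntree_values <- (1:10) * 500     # same values as the documented ntree default
grid <- expand.grid(mtry = mtry_values, ntree = ntree_values)

# Assign each sample to one of 5 folds, keeping the folds balanced.
set.seed(42)
n_samples <- 50                  # assumed sample count
fold_id <- sample(rep(1:5, length.out = n_samples))

nrow(grid)       # number of mtry/ntree combinations
table(fold_id)   # samples per fold
```

Each row of `grid` would be scored by holding out one fold at a time and averaging performance across the 5 folds.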

Usage

rf_standard(
  rf.type,
  vpred,
  df,
  dir,
  nsets = 10,
  split.param = c(train.ratio = 0.8),
  mtry = NA,
  ntree = (1:10) * 500,
  top.variables = NA,
  rf.param = c(dataframe.name, predictors.name),
  varimp.param = c(selection_type = NA, metric = NA, xlab = NA),
  density.param = c(scale = 0.8, ncol = 2, tsize = 10, xlab = NA),
  extract.names.df = NA,
  experiment.note = NA
)

Arguments

`rf.type`	One of: `"class"` for classification trees or `"reg"` for regression trees.
`vpred`	String. Name of response variable.
`df`	Input dataframe.
`dir`	Directory path for export. String.
`nsets`	(Optional) Number of train/validation sets to generate. Default: 10.
`split.param`	(Optional) Proportion of samples assigned to the training set, between 0 and 1. For example, 0.8 assigns 80% of samples to the training set for an 80:20 train:test split. Default: c(train.ratio = 0.8).
`mtry`	(Optional) Range of mtry values to try during Random Forest hyperparameter tuning. If NA, mtry.guide is used to select the optimal mtry based on tree type and number of predictor variables. Default: NA.
`ntree`	(Optional) Values of ntree (number of trees) to try during Random Forest hyperparameter tuning. Default: (1:10) * 500.
`top.variables`	(Optional) Number of top important predictor variables to plot, Default: NA.
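`rf.param`	Named character vector with two elements: `dataframe.name`, the name of the dataframe as a string, and `predictors.name`, a name (any description) for the predictor variables as a string.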
`varimp.param`	(Optional) Parameters to pass to variable importance plot. Default: c(selection_type = NA, metric = NA, xlab = NA).
`density.param`	(Optional) Parameters to pass to density plot, Default: c(scale = 0.8, ncol = 2, tsize = 10, xlab = NA).
`extract.names.df`	(Optional) Dataframe. Must be provided if `density.param` xlab = "extract". The first column is how each predictor variable should appear in the density plots (e.g. IL-6 Concentration). The second column is how the predictor variable appears in the input `df` (e.g. IL.6.Concentration). Default: NA.
`experiment.note`	(Optional) Human-readable note written to the output log. May be used to record what is being run and why. Default: NA.
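To make the interaction between `nsets` and `split.param` concrete, here is a minimal base-R sketch of generating 10 random 80:20 splits. It only illustrates the idea and is not the package's implementation; the sample count and the names `splits`, `train_idx`, and `train_ratio` are hypothetical.

```r
set.seed(1)
n_samples   <- 100    # assumed number of rows in the input dataframe
nsets       <- 10     # documented default
train_ratio <- 0.8    # documented default for train.ratio

# One list per set: training row indices, and the remaining held-out rows.
splits <- lapply(seq_len(nsets), function(i) {
  train_idx <- sample(n_samples, size = round(train_ratio * n_samples))
  list(train = train_idx, test = setdiff(seq_len(n_samples), train_idx))
})

length(splits)         # number of sets generated
lengths(splits[[1]])   # training and held-out sizes for the first set
```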

Value

Exports Random Forest results to sub-folders within `dir`.

Examples

library(dplyr)  # provides %>% and select()

rf_standard(
  rf.type = "class",
  vpred = "Genotype",
  df = blood %>% select(-c(Sex, AnimalID)),
  dir = "/Users/Documents/experiment",
  experiment.note = "Predict mouse genotype from immune populations. No genotype excluded from dataframe. Exclude sex metadata.",
  rf.param = c(dataframe.name = "blood-allgenotypes", predictors.name = "immune")
)
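The `extract.names.df` argument expects a two-column dataframe that maps display names to the column names used in `df`. The sketch below shows one way such a table might be built, assuming the input columns were sanitized the way base R does by default (via `make.names`). The column names `display` and `in_df` and the second label are hypothetical; only the IL-6 example comes from the argument description.

```r
# Display labels as they should appear on the density plots.
display_names <- c("IL-6 Concentration", "TNF-alpha Concentration")  # second label is hypothetical

# First column: display name. Second column: name as it appears in the input df.
extract_names <- data.frame(
  display = display_names,
  in_df   = make.names(display_names),  # replaces invalid characters with "."
  stringsAsFactors = FALSE
)

extract_names$in_df
```

A table like this would then be passed as `extract.names.df` when `density.param` has xlab = "extract".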

SarahAsbury/BioDataTools documentation built on Feb. 5, 2024, 4:01 p.m.