model_gen_tidy: Generate a machine learning model using tidy models
In ColinChisholm/pemgeneratr: Predictive Ecosystem Mapping

model_gen_tidy

R Documentation

Generate a machine learning model using tidy models

Description

This function takes all the data needed to produce a machine learning model. Inputs are handed to an R Markdown (Rmd) report script. Outputs include the markdown report, the cross-validation object, and a binary model (RDS) that can then be used to predict on new data.
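As a minimal sketch of how a binary RDS model can be reloaded and used for prediction, the example below uses only base R. The `lm` fit, the `Model.rds` file name, and the `new_data` values are stand-ins chosen for illustration; the actual file name and model class written by `model_gen_tidy` are not stated here.

```r
# Stand-in model: a linear regression on a built-in dataset
# (assumption: the real model object will be of a different class).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Save the fitted model as a binary RDS file (hypothetical file name).
model_path <- file.path(tempdir(), "Model.rds")
saveRDS(fit, model_path)

# Later, possibly in a new session: restore the model and predict on new data.
restored <- readRDS(model_path)
new_data <- data.frame(wt = c(2.5, 3.2), hp = c(110, 150))
preds <- predict(restored, newdata = new_data)
```

The new data must contain the same predictor columns the model was trained on; otherwise `predict()` will fail.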

Usage

model_gen_tidy(
  trDat,
  outDir = ".",
  mname = "Model",
  target = "target",
  target2 = NA,
  tid = NA,
  field_transect = NA,
  slice = NA,
  ds_ratio = NA,
  sm_ratio = NA,
  rseed = NA,
  infiles = NA,
  mmu = NA
)

Arguments

`trDat`	A data frame containing the model training data. The response variable should be one of its columns.
`outDir`	Output directory; an absolute path is highly recommended. Defaults to the project's root directory or the directory where the Rmd script is saved. Additional products generated by the associated 'model_gen_tidy.Rmd' markdown script are also saved here.
`mname`	Name for this model run. Will be used to name outputs.
`target`	The name of the response variable in the trDat data frame.
`target2`	An optional second target variable.
`tid`	Transect ID ... need to clarify how this is different from `field transect`
`field_transect`	A transect ID ... need to clarify how this is different from `tid`
`slice`	Column ID for slices from conditioned Latin hypercube sampling (cLHS).
`ds_ratio`	Covariate/predictor variable balancing: downsample proportion
`sm_ratio`	Covariate/predictor variable balancing: SMOTE proportion
`rseed`	Optional random number seed.
`infiles`	Simply for reporting – to specify what files were used in the creation of trDat.
`mmu`	Map unit (e.g. BC BEC subzone). This may be a column in the input data and will allow for the processing of multiple subzones in one model run.
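
Examples

The following is a hedged sketch of how the arguments above might be assembled for a call. The toy data frame, its column names (`dem`, `slope`), and the output folder are invented for illustration, and the real function may require additional columns in `trDat`. The call itself is left commented out because it needs the pemgeneratr package and real training data.

```r
set.seed(42)  # reproducible toy data

# Hypothetical training data: one response column plus two covariates.
trDat <- data.frame(
  target = factor(sample(c("unit_a", "unit_b"), 40, replace = TRUE)),
  dem    = rnorm(40),
  slope  = runif(40)
)

# Use an absolute output directory, as recommended for outDir.
out_dir <- file.path(normalizePath(tempdir(), winslash = "/"), "model_out")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Collect the arguments; everything not listed keeps its default of NA.
args <- list(
  trDat  = trDat,
  outDir = out_dir,
  mname  = "toy_model",
  target = "target",
  rseed  = 42
)

# The response named in `target` must be a column of trDat.
stopifnot(args$target %in% names(args$trDat))

## Not run:
# do.call(pemgeneratr::model_gen_tidy, args)
```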