In GoreLab/waves: Vis-NIR Spectral Analysis Wrapper

save_model

R Documentation

Save spectral prediction model and model performance statistics

Description

Given a set of pretreatment methods, saves the best spectral prediction model and model statistics to model.save.folder as model.name.Rds and model.name_stats.csv respectively. If only one pretreatment method is supplied, results from that method are stored.

Usage

save_model(
  df,
  write.model = TRUE,
  pretreatment = 1,
  model.save.folder = NULL,
  model.name = "PredictionModel",
  best.model.metric = "RMSE",
  k.folds = 5,
  proportion.train = 0.7,
  tune.length = 50,
  model.method = "pls",
  num.iterations = 10,
  stratified.sampling = TRUE,
  cv.scheme = NULL,
  trial1 = NULL,
  trial2 = NULL,
  trial3 = NULL,
  seed = 1,
  verbose = TRUE,
  save.model = deprecated(),
  wavelengths = deprecated(),
  autoselect.preprocessing = deprecated(),
  preprocessing.method = deprecated()
)

Arguments

`df`	`data.frame` object. First column contains unique identifiers, second contains reference values, followed by spectral columns. Include no other columns to the right of the spectra! Column names of spectra must start with "X", and the reference column must be named "reference".
`write.model`	If `TRUE`, the trained model will be saved in .Rds format to the location specified by `model.save.folder`. If `FALSE`, the best model will be output by the function but will not save to a file. Default is `TRUE`.
`pretreatment`	Number or list of numbers 1:13 corresponding to desired pretreatment method(s):
  1. Raw data (default)
  2. Standard normal variate (SNV)
  3. SNV and first derivative
  4. SNV and second derivative
  5. First derivative
  6. Second derivative
  7. Savitzky–Golay filter (SG)
  8. SNV and SG
  9. Gap-segment derivative (window size = 11)
  10. SG and first derivative (window size = 5)
  11. SG and first derivative (window size = 11)
  12. SG and second derivative (window size = 5)
  13. SG and second derivative (window size = 11)
`model.save.folder`	Path to folder where model will be saved. If not provided, will save to working directory.
`model.name`	Name that model will be saved as in `model.save.folder`. Default is "PredictionModel".
`best.model.metric`	Metric used to decide which model is best. Must be either "RMSE" or "Rsquared". Default is "RMSE".
`k.folds`	Number indicating the number of folds for k-fold cross-validation during model training. Default is 5.
`proportion.train`	Fraction of samples to include in the training set. Default is 0.7.
`tune.length`	Number delineating the search space for tuning the PLSR hyperparameter `ncomp`. Must be set to 5 when using the random forest algorithm (`model.method == "rf"`). Default is 50.
`model.method`	Model type to use for training. Valid options include:
  - "pls": Partial least squares regression (default)
  - "rf": Random forest
  - "svmLinear": Support vector machine with linear kernel
  - "svmRadial": Support vector machine with radial kernel
`num.iterations`	Number of training iterations to perform. Default is 10.
`stratified.sampling`	If `TRUE`, training and test sets will be selected using stratified random sampling. This term is only used if `test.data == NULL`. Default is `TRUE`.
`cv.scheme`	A cross-validation (CV) scheme from Jarquín et al., 2017. Options for `cv.scheme` include:
  - "CV1": untested lines in tested environments
  - "CV2": tested lines in tested environments
  - "CV0": tested lines in untested environments
  - "CV00": untested lines in untested environments
`trial1`	`data.frame` object that is for use only when `cv.scheme` is provided. Contains the trial to be tested in subsequent model training functions. The first column contains unique identifiers, second contains genotypes, third contains reference values, followed by spectral columns. Include no other columns to right of spectra! Column names of spectra must start with "X", reference column must be named "reference", and genotype column must be named "genotype".
`trial2`	`data.frame` object that is for use only when `cv.scheme` is provided. This data.frame contains a trial that has overlapping genotypes with `trial1` but that were grown in a different site/year (different environment). Formatting must be consistent with `trial1`.
`trial3`	`data.frame` object that is for use only when `cv.scheme` is provided. This data.frame contains a trial that may or may not contain genotypes that overlap with `trial1`. Formatting must be consistent with `trial1`.
`seed`	Integer to be used internally as input for `set.seed()`. Only used if `stratified.sampling = TRUE`. In all other cases, seed is set to the current iteration number. Default is 1.
`verbose`	If `TRUE`, the number of rows removed through filtering will be printed to the console. Default is `TRUE`.
`save.model`	DEPRECATED. `save.model = FALSE` is no longer supported; this function will always return a saved model.
`wavelengths`	DEPRECATED. `wavelengths` is no longer supported; this information is now inferred from `df` column names.
`autoselect.preprocessing`	DEPRECATED. `autoselect.preprocessing = FALSE` is no longer supported. If multiple pretreatment methods are supplied, the best will be automatically selected as the model to be saved.
`preprocessing.method`	DEPRECATED. `preprocessing.method` has been renamed to `pretreatment`.
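
The column layout required for `df` can be checked before calling `save_model()`. The following is a minimal base-R sketch, not part of waves: `check_spectra_df()` is a hypothetical helper, and it tests only the naming rules stated above (second column named "reference", followed exclusively by columns whose names start with "X"). The toy values are made up for illustration.

```r
# Hypothetical helper (not part of waves): checks the column layout that the
# `df` argument description asks for. Returns TRUE or stops with a message.
check_spectra_df <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 3)
  col.names <- colnames(df)

  # Second column: reference values, named "reference"
  if (col.names[2] != "reference") {
    stop("Second column must be named 'reference'.")
  }
  # Remaining columns: spectra only, names starting with "X"
  spectra.names <- col.names[-(1:2)]
  if (!all(startsWith(spectra.names, "X"))) {
    stop("All columns after 'reference' must be spectra named 'X...'.")
  }
  TRUE
}

# Toy data: 4 samples, 3 wavelengths (illustrative values only)
toy.df <- data.frame(
  unique.id = paste0("sample", 1:4),
  reference = c(30.1, 28.4, 35.2, 31.7),
  X350 = c(0.41, 0.39, 0.45, 0.42),
  X351 = c(0.42, 0.40, 0.46, 0.43),
  X352 = c(0.43, 0.41, 0.47, 0.44)
)
check.result <- check_spectra_df(toy.df)
```

The same idea could be extended to `trial1`, `trial2`, and `trial3`, which additionally require a "genotype" column in the second position, with "reference" third.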

Details

Wrapper that uses pretreat_spectra, format_cv, and train_spectra functions.

Value

A list containing the trained model object (`best.model`) and a `data.frame` of model statistics (`best.model.stats`), as accessed in the Examples. If `write.model` is `TRUE`, both objects are also saved to `model.save.folder`. To use the optimally trained model for predictions, use the tuned parameters from `$bestTune`.
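
The Description states that saved outputs are written to `model.save.folder` as `model.name.Rds` and `model.name_stats.csv`. Assuming those file names, a later session might read them back roughly as follows. `read_saved_model()` is a hypothetical helper, not part of waves, and the stand-in files (including their column names and values) exist only so the sketch runs without training a model.

```r
# Hypothetical helper (not part of waves): reads back the two files that the
# Description says are written when write.model = TRUE.
read_saved_model <- function(model.save.folder = getwd(),
                             model.name = "PredictionModel") {
  model.path <- file.path(model.save.folder, paste0(model.name, ".Rds"))
  stats.path <- file.path(model.save.folder, paste0(model.name, "_stats.csv"))
  stopifnot(file.exists(model.path), file.exists(stats.path))
  list(
    model = readRDS(model.path),
    stats = read.csv(stats.path)
  )
}

# Stand-in files so the sketch runs on its own; a real call to save_model()
# with write.model = TRUE would create these instead.
demo.folder <- tempfile("waves_demo_")
dir.create(demo.folder)
saveRDS(list(note = "stand-in for a trained model"),
        file.path(demo.folder, "PredictionModel.Rds"))
write.csv(data.frame(RMSE = 1.5, Rsquared = 0.8),
          file.path(demo.folder, "PredictionModel_stats.csv"),
          row.names = FALSE)

saved <- read_saved_model(demo.folder)
saved$stats
```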

Author(s)

Jenna Hershberger jmh579@cornell.edu

Examples


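library(magrittr)
library(waves) # provides save_model() and the ikeogu.2017 example data

# Format one study to the layout save_model() expects:
# unique ID, reference values, then spectral columns starting with "X"
test.model <- ikeogu.2017 %>%
  dplyr::filter(study.name == "C16Mcal") %>%
  dplyr::rename(reference = DMC.oven,
                unique.id = sample.id) %>%
  dplyr::select(unique.id, reference, dplyr::starts_with("X")) %>%
  na.omit() %>%
  save_model(
    df = .,
    write.model = FALSE,   # return the results without writing files
    pretreatment = 1:13,   # compare all 13 pretreatment methods
    model.name = "my_prediction_model",
    tune.length = 3,       # small values keep the example quick
    num.iterations = 3
  )

# Inspect the selected model and its performance statistics
summary(test.model$best.model)
test.model$best.model.stats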

GoreLab/waves documentation built on April 15, 2024, 3:28 p.m.