test_spectra | R Documentation |
Wrapper that trains models based on spectral data to predict reference values and reports model performance statistics.
test_spectra(
train.data,
num.iterations,
test.data = NULL,
pretreatment = 1,
k.folds = 5,
proportion.train = 0.7,
tune.length = 50,
model.method = "pls",
best.model.metric = "RMSE",
stratified.sampling = TRUE,
cv.scheme = NULL,
trial1 = NULL,
trial2 = NULL,
trial3 = NULL,
split.test = FALSE,
seed = 1,
verbose = TRUE,
wavelengths = deprecated(),
preprocessing = deprecated(),
output.summary = deprecated(),
rf.variable.importance = deprecated()
)
train.data |
|
num.iterations |
Number of training iterations to perform |
test.data |
|
pretreatment |
Number or list of numbers 1:13 corresponding to desired pretreatment method(s):
|
k.folds |
Number indicating the number of folds for k-fold cross-validation during model training. Default is 5. |
proportion.train |
Fraction of samples to include in the training set. Default is 0.7. |
tune.length |
Number delineating search space for tuning of the PLSR
hyperparameter |
model.method |
Model type to use for training. Valid options include:
|
best.model.metric |
Metric used to decide which model is best. Must be either "RMSE" or "Rsquared" |
stratified.sampling |
If |
cv.scheme |
A cross validation (CV) scheme from Jarquín et al., 2017.
Options for
|
trial1 |
|
trial2 |
|
trial3 |
|
split.test |
Boolean that allows for a fixed training set and a split
test set. Example: train a model on data from two breeding programs and a
stratified subset (70%) of a third, then test on the remaining samples
(30%) of the third. If |
seed |
Integer to be used internally as input for |
verbose |
If |
wavelengths |
DEPRECATED |
preprocessing |
DEPRECATED please use
|
output.summary |
DEPRECATED |
rf.variable.importance |
DEPRECATED
|
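The proportion.train, k.folds, and seed arguments together describe a reproducible train/test split followed by k-fold cross-validation. The following is a minimal base R sketch of that idea, not the package's own code: the sample count is hypothetical, the split here is purely random, and the sampling that test_spectra actually performs (including its stratified option) may differ.

```r
# Illustrative only: a plain random split plus fold assignment in base R.
set.seed(1)                # assumption: mirrors the documented default seed = 1
n.samples <- 20            # hypothetical number of samples
proportion.train <- 0.7    # documented default
k.folds <- 5               # documented default

# Randomly pick 70% of the row indices for training; the rest form the test set
train.idx <- sample(seq_len(n.samples), size = round(proportion.train * n.samples))
test.idx <- setdiff(seq_len(n.samples), train.idx)

# Assign each training row to one of k folds of near-equal size
fold.id <- sample(rep(seq_len(k.folds), length.out = length(train.idx)))

table(fold.id)
```

With these values, 14 rows go to training and 6 to testing, and each fold holds 2 or 3 training rows.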
Calls the pretreat_spectra, format_cv, and train_spectra functions.
list of 5 objects:
'model.list' is a list of trained model objects, one for each pretreatment method specified by the pretreatment argument. Each model is trained with all rows of df.
'summary.model.performance' is a data.frame containing summary statistics across all model training iterations and pretreatments. See below for a description of the summary statistics provided.
'model.performance' is a data.frame containing performance statistics for each iteration of model training separately (see below).
'predictions' is a data.frame containing both reference and predicted values for each test set entry in each iteration of model training.
'importance' is a data.frame containing variable importance results for each wavelength at each iteration of model training. If model.method is not "pls" or "rf", this list item is NULL.
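As a rough sketch of how the returned list might be inspected, the snippet below builds a stand-in list with the five element names described above. The contents are made up for illustration, and the RMSEP column name is an assumption taken from the statistics listed on this page; a real call to test_spectra would fill these elements with actual results.

```r
# Stand-in for the object returned by test_spectra(); all values are made up
results <- list(
  model.list = list(),
  summary.model.performance = data.frame(),
  model.performance = data.frame(RMSEP = c(1.2, 1.1, 1.3)),  # assumed column name
  predictions = data.frame(),
  importance = NULL  # NULL when model.method is neither "pls" nor "rf"
)

# The five documented elements
names(results)

# Average a per-iteration statistic across training iterations
mean.rmsep <- mean(results$model.performance$RMSEP)

# Guard against the documented NULL case before using variable importance
if (is.null(results$importance)) {
  message("No variable importance available for this model.method")
}
```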
The 'summary.model.performance' and 'model.performance' data.frames include the following summary statistics:
Tuned parameters depending on the model algorithm:
Best.n.comp, the best number of components
Best.ntree, the best number of trees in an RF model
Best.mtry, the best number of variables to include at every decision point in an RF model
RMSECV, the root mean squared error of cross-validation
R2cv, the coefficient of multiple determination of cross-validation for PLSR models
RMSEP, the root mean squared error of prediction
R2p, the squared Pearson’s correlation between predicted and observed test set values
RPD, the ratio of standard deviation of observed test set values to RMSEP
RPIQ, the ratio of performance to interquartile difference
CCC, the concordance correlation coefficient
Bias, the average difference between the predicted and observed values
SEP, the standard error of prediction
R2sp, the squared Spearman’s rank correlation between predicted and observed test set values
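The test set statistics above can be sketched in base R as follows. This is an illustration under common chemometric conventions, not the package's implementation: the helper name prediction_stats is hypothetical, Bias is assumed to be predicted minus observed, RPIQ is assumed to be the interquartile range of the observed values divided by RMSEP, SEP is assumed to be the bias-corrected error with an n - 1 denominator, and CCC follows Lin's formula with 1/n variances.

```r
# Hypothetical helper: test set statistics from observed and predicted values
prediction_stats <- function(observed, predicted) {
  residual <- predicted - observed
  n <- length(observed)
  bias <- mean(residual)                            # assumed sign: predicted - observed
  rmsep <- sqrt(mean(residual^2))
  sep <- sqrt(sum((residual - bias)^2) / (n - 1))   # bias-corrected error

  # Lin's concordance correlation coefficient with population (1/n) moments
  cov.pop <- mean((observed - mean(observed)) * (predicted - mean(predicted)))
  var.obs <- mean((observed - mean(observed))^2)
  var.pred <- mean((predicted - mean(predicted))^2)
  ccc <- 2 * cov.pop / (var.obs + var.pred + (mean(observed) - mean(predicted))^2)

  c(RMSEP = rmsep,
    R2p = cor(observed, predicted, method = "pearson")^2,
    RPD = sd(observed) / rmsep,
    RPIQ = IQR(observed) / rmsep,
    CCC = ccc,
    Bias = bias,
    SEP = sep,
    R2sp = cor(observed, predicted, method = "spearman")^2)
}

# Made-up values for illustration
observed <- c(1, 2, 3, 4, 5)
predicted <- c(1.1, 1.9, 3.2, 3.8, 5.1)
example.stats <- prediction_stats(observed, predicted)
round(example.stats, 3)
```

Because the predicted values preserve the rank order of the observed values in this toy example, R2sp is 1 even though RMSEP is nonzero.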
Jenna Hershberger jmh579@cornell.edu
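library(magrittr)  # provides the %>% pipe
library(waves)     # provides test_spectra() and the ikeogu.2017 example data
ikeogu.2017 %>%
  # Rename columns to the names used in this example
  dplyr::rename(reference = DMC.oven,
                unique.id = sample.id) %>%
  # Keep the ID, the reference value, and the spectral columns (prefixed "X")
  dplyr::select(unique.id, reference, dplyr::starts_with("X")) %>%
  na.omit() %>%
  # Small tune.length and num.iterations keep the example fast
  test_spectra(
    train.data = .,
    tune.length = 3,
    num.iterations = 3,
    pretreatment = 1
  )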