View source: R/train_spectra.R
train_spectra | R Documentation |
Trains spectral prediction models using one of several algorithms and sampling procedures.
train_spectra(
df,
num.iterations,
test.data = NULL,
k.folds = 5,
proportion.train = 0.7,
tune.length = 50,
model.method = "pls",
best.model.metric = "RMSE",
stratified.sampling = TRUE,
cv.scheme = NULL,
trial1 = NULL,
trial2 = NULL,
trial3 = NULL,
split.test = FALSE,
seed = 1,
verbose = TRUE,
save.model = deprecated(),
rf.variable.importance = deprecated(),
output.summary = deprecated(),
return.model = deprecated()
)
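df |
data.frame object. The first column contains unique identifiers, the second contains reference values, and the remaining columns contain spectra. Spectral column names must start with "X", and the reference column must be named "reference". |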
num.iterations |
Number of training iterations to perform |
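test.data |
data.frame with the same format as df. Use it when a specific test set is desired. If NULL (the default), test sets are drawn from df according to proportion.train. |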
k.folds |
Number of folds for k-fold cross-validation during model training. Default is 5. |
proportion.train |
Fraction of samples to include in the training set. Default is 0.7. |
tune.length |
Number delineating the search space for tuning the PLSR hyperparameter (number of components). Default is 50. |
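model.method |
Model type to use for training. Valid options include:
"pls": partial least squares regression (default)
"rf": random forest
"svmLinear": support vector machine with a linear kernel
"svmRadial": support vector machine with a radial kernel |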
best.model.metric |
Metric used to decide which model is best. Must be either "RMSE" or "Rsquared" |
stratified.sampling |
If TRUE (the default), training and test sets are selected using stratified random sampling. |
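cv.scheme |
A cross-validation (CV) scheme from Jarquín et al., 2017. Options for cv.scheme include:
"CV1": untested lines in tested environments
"CV2": tested lines in tested environments
"CV0": tested lines in untested environments
"CV00": untested lines in untested environments |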
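trial1 |
data.frame for use only when cv.scheme is provided. Contains the trial to be tested. |
trial2 |
data.frame for use only when cv.scheme is provided. Formatting must be consistent with trial1. |
trial3 |
data.frame for use only when cv.scheme is provided. Formatting must be consistent with trial1. |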
split.test |
Logical. Allows a fixed training set combined with a split test set. For example, train a model on data from two breeding programs plus a stratified subset (70%) of a third, then test on the remaining samples (30%) of the third. Default is FALSE. |
seed |
Integer to be used internally as input for set.seed(). Default is 1. |
verbose |
If TRUE (the default), status messages are printed to the console. |
save.model |
DEPRECATED |
rf.variable.importance |
DEPRECATED |
output.summary |
DEPRECATED |
return.model |
DEPRECATED |
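As a rough illustration of the expected df layout and of the kind of split controlled by proportion.train, stratified.sampling, and k.folds, the base R sketch below simulates a small data set and splits it. The data are made up, and stratified_split() is a hypothetical helper: it bins reference values into quantile groups and samples within each group, which is one common form of stratified random sampling and not necessarily the procedure train_spectra uses internally.

```r
# Simulated data (made up for illustration) in the layout described for df:
# identifiers, then reference values, then spectral columns named "X...".
set.seed(1)
n <- 40
df <- data.frame(
  unique.id = paste0("sample", seq_len(n)),
  reference = rnorm(n, mean = 30, sd = 5)
)
spectra <- matrix(runif(n * 5), nrow = n,
                  dimnames = list(NULL, paste0("X", 350:354)))
df <- cbind(df, spectra)

# Hypothetical helper (not part of waves): stratified random split.
# Reference values are binned into quantile groups, and a fraction
# proportion.train of the rows in each group is sampled for training.
stratified_split <- function(df, proportion.train = 0.7, n.bins = 4, seed = 1) {
  set.seed(seed)
  probs <- seq(0, 1, length.out = n.bins + 1)
  breaks <- unique(quantile(df$reference, probs = probs))
  bins <- cut(df$reference, breaks = breaks, include.lowest = TRUE)
  train.idx <- unlist(lapply(split(seq_len(nrow(df)), bins), function(idx) {
    idx[sample.int(length(idx), size = round(proportion.train * length(idx)))]
  }))
  list(train = df[sort(train.idx), ], test = df[-train.idx, ])
}

sets <- stratified_split(df, proportion.train = 0.7)
c(train = nrow(sets$train), test = nrow(sets$test))

# Random assignment of training rows to k.folds = 5 cross-validation folds
k.folds <- 5
folds <- sample(rep_len(seq_len(k.folds), nrow(sets$train)))
table(folds)
```

Binning by quantiles keeps the distribution of reference values similar in the training and test sets, which is the usual motivation for stratifying.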
Returns a list with the following elements:
model
A model object trained with all rows of df.
summary.model.performance
A data.frame of model performance statistics in summary format (two rows: one with the mean and one with the standard deviation across all training iterations).
full.model.performance
A data.frame of model performance statistics in long format (number of rows = num.iterations).
predictions
A data.frame containing the predicted values for each test set entry at each iteration of model training.
importance
A data.frame containing the variable importance of each wavelength. Only available for model.method options "rf" and "pls".
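The relationship between full.model.performance and summary.model.performance can be sketched in base R, assuming a long-format table with one row per iteration. The values and the column names Iteration and SummaryType are hypothetical and may differ from what train_spectra actually returns.

```r
# Hypothetical per-iteration statistics in long format
# (one row per training iteration). The values are made up.
full.performance <- data.frame(
  Iteration = 1:3,
  RMSEP = c(2.1, 2.4, 2.2),
  R2p   = c(0.81, 0.77, 0.80)
)

# Collapse to summary format: one row of means and one row of
# standard deviations across iterations (Iteration column dropped).
stats <- full.performance[, setdiff(names(full.performance), "Iteration")]
summary.performance <- data.frame(
  SummaryType = c("mean", "std.dev"),
  rbind(colMeans(stats), apply(stats, 2, sd))
)
summary.performance
```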
Included summary statistics:
Tuned parameters (depending on the model algorithm):
Best.n.comp, the best number of components in a PLSR model
Best.ntree, the best number of trees in an RF model
Best.mtry, the best number of variables randomly sampled as split candidates at each node of an RF model
RMSECV, the root mean squared error of cross-validation
R2cv, the coefficient of multiple determination from cross-validation (PLSR models)
RMSEP, the root mean squared error of prediction
R2p, the squared Pearson’s correlation between predicted and observed test set values
RPD, the ratio of standard deviation of observed test set values to RMSEP
RPIQ, the ratio of performance to interquartile distance (interquartile range of observed test set values divided by RMSEP)
CCC, the concordance correlation coefficient
Bias, the average difference between the predicted and observed values
SEP, the standard error of prediction
R2sp, the squared Spearman’s rank correlation between predicted and observed test set values
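The test-set statistics listed above can be computed from observed and predicted values roughly as follows. spectral_metrics() is a hypothetical helper, not a waves function. The CCC uses Lin's formula with population (1/n) moments, and SEP is taken as the bias-corrected standard deviation of the residuals, a common definition that waves may or may not follow exactly. The observed and predicted values are made up.

```r
# Hypothetical helper (not part of waves): test-set statistics
# computed from observed and predicted reference values.
spectral_metrics <- function(observed, predicted) {
  residuals <- predicted - observed
  rmsep <- sqrt(mean(residuals^2))
  bias <- mean(residuals)  # average of predicted minus observed
  # Lin's concordance correlation coefficient with 1/n (population) moments
  mo <- mean(observed)
  mp <- mean(predicted)
  s.op <- mean((observed - mo) * (predicted - mp))
  s.oo <- mean((observed - mo)^2)
  s.pp <- mean((predicted - mp)^2)
  c(
    RMSEP = rmsep,
    R2p   = cor(predicted, observed)^2,
    RPD   = sd(observed) / rmsep,
    RPIQ  = IQR(observed) / rmsep,
    CCC   = 2 * s.op / (s.oo + s.pp + (mp - mo)^2),
    Bias  = bias,
    # SEP taken here as the bias-corrected standard deviation of the residuals
    SEP   = sqrt(sum((residuals - bias)^2) / (length(residuals) - 1)),
    R2sp  = cor(predicted, observed, method = "spearman")^2
  )
}

observed  <- c(12.1, 15.3, 9.8, 20.4, 17.6, 11.2)
predicted <- c(12.8, 14.9, 10.5, 19.1, 18.2, 12.0)
round(spectral_metrics(observed, predicted), 3)
```

Reporting both RPD and RPIQ is useful because RPIQ, based on the interquartile range, is less sensitive to skewed reference distributions than the standard-deviation-based RPD.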
Author: Jenna Hershberger <jmh579@cornell.edu>
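# Requires the waves package (provides ikeogu.2017 and train_spectra())
library(waves)
library(magrittr)

ikeogu.2017 %>%
  # Keep one study and rename columns to the names train_spectra() expects
  dplyr::filter(study.name == "C16Mcal") %>%
  dplyr::rename(reference = DMC.oven,
                unique.id = sample.id) %>%
  # Identifier first, then reference values, then spectral columns
  dplyr::select(unique.id, reference, dplyr::starts_with("X")) %>%
  na.omit() %>%
  train_spectra(
    df = .,
    tune.length = 3,
    num.iterations = 3,
    best.model.metric = "RMSE",
    stratified.sampling = TRUE
  ) %>%
  summary()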