train_frm | R Documentation |
Trains a new model from molecule SMILES to predict retention times (RT) using the specified method.
train_frm(
df = read_rp_xlsx(),
method = "lasso",
verbose = 1,
nfolds = 5,
nw = 1,
degree_polynomial = 1,
interaction_terms = FALSE,
rm_near_zero_var = TRUE,
rm_na = TRUE,
rm_ns = FALSE,
seed = NULL
)
df |
A dataframe with columns "NAME", "RT", "SMILES" and optionally a set of chemical descriptors. If no chemical descriptors are provided, they are calculated using the function |
method |
A string representing the prediction algorithm. Either "lasso", "ridge" or "gbtree". |
verbose |
A logical value indicating whether to print progress messages. |
nfolds |
An integer representing the number of folds for cross validation. |
nw |
An integer representing the number of workers for parallel processing. |
degree_polynomial |
An integer representing the degree of the polynomial. Polynomials up to the specified degree are included in the model. |
interaction_terms |
A logical value indicating whether to include interaction terms in the model. |
rm_near_zero_var |
A logical value indicating whether to remove near zero variance predictors. Setting this to TRUE can cause the CV results to be overoptimistic, as the variance filtering is done on the whole dataset, i.e. information from the test folds is used for feature selection. |
rm_na |
A logical value indicating whether to remove NA values. Setting this to TRUE can cause the CV results to be overoptimistic, as the predictor filtering is done on the whole dataset, i.e. information from the test folds is used for feature selection. |
rm_ns |
A logical value indicating whether to remove chemical descriptors that were considered as not suitable for linear regression based on previous analysis of an independent dataset. See |
seed |
An integer value to set the seed for random number generation to allow for reproducible results. |
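To make the degree_polynomial and interaction_terms arguments more concrete, the following base R sketch shows what polynomial and pairwise interaction terms generally look like for a small descriptor matrix. It is only an illustration under assumptions: the matrix X and the column names d1 and d2 are made up, and the code is not taken from the FastRet implementation, which may construct or name these terms differently.

```r
# Hypothetical descriptor matrix; the column names are made up for illustration.
X <- cbind(d1 = c(1, 2, 3), d2 = c(4, 5, 6))

# Polynomial terms up to the given degree: append X^2, ..., X^degree.
degree <- 2
poly_terms <- lapply(seq_len(degree)[-1], function(p) {
  Xp <- X^p
  colnames(Xp) <- paste0(colnames(X), "^", p)
  Xp
})
X_poly <- do.call(cbind, c(list(X), poly_terms))

# Pairwise interaction terms: products of each pair of distinct columns.
pairs <- combn(colnames(X), 2)
X_int <- apply(pairs, 2, function(p) X[, p[1]] * X[, p[2]])
colnames(X_int) <- apply(pairs, 2, paste, collapse = ":")

# Combined design matrix with original, polynomial and interaction columns.
X_full <- cbind(X_poly, X_int)
print(colnames(X_full))
```

With degree set to 1 no polynomial columns are added, which corresponds to the default degree_polynomial = 1. Note that the number of interaction columns grows quadratically with the number of descriptors, so enabling them can increase runtime considerably.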
Setting rm_near_zero_var and/or rm_na to TRUE can cause the CV results to be overoptimistic, as the predictor filtering is done on the whole dataset, i.e. information from the test folds is used for feature selection.
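The effect described above can be illustrated with a small base R sketch that compares filtering predictors on the whole dataset with filtering inside each cross-validation iteration. Everything here is an assumption made for illustration: the data are simulated, keep_cols is a hypothetical helper with a deliberately simple variance threshold, and none of it reflects the filter that FastRet actually applies.

```r
set.seed(1)
n <- 20
X <- cbind(
  d1 = rnorm(n),
  d2 = c(rep(0, n - 1), 1),  # near-constant column
  d3 = c(NA, rnorm(n - 1))   # column with one missing value
)
folds <- rep(1:2, length.out = n)  # alternating assignment to two folds

# Hypothetical, simplified filter: keep columns that contain no NA
# and whose variance exceeds a small threshold.
keep_cols <- function(M, min_var = 1e-8) {
  ok <- apply(M, 2, function(col) !anyNA(col) && var(col) > min_var)
  names(ok)[ok]
}

# Filtering on the whole dataset: rows from every fold influence the selection.
kept_global <- keep_cols(X)

# Filtering inside each CV iteration: only the training rows are inspected.
kept_per_fold <- lapply(1:2, function(k) {
  keep_cols(X[folds != k, , drop = FALSE])
})

print(kept_global)
print(kept_per_fold)
```

In this toy setup the set of retained predictors depends on which rows are inspected, so a selection made once on all rows already reflects information from rows that later serve as test folds.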
A trained FastRet model.
system.time(m <- train_frm(RP[1:80, ], method = "lasso", nfolds = 2, nw = 1, verbose = 0))
# For the sake of a short runtime, only the first 80 rows of the RP dataset
# are used in this example. In practice, you should always use the entire
# training dataset for model training.