The function wraps several functions from the caret package to train and optimize a binary predictive peak QC model on the provided training data. Twenty percent of the training dataset is randomly selected as a validation set and held out of the training process to estimate the performance of the models on unseen data. The features are mean-centered and scaled by dividing by the standard deviation before being used for training. Repeated 10-fold cross-validation (3 repeats) is applied to the remainder of the training set to minimize over-fitting. The model offering the highest accuracy is selected and returned by the function.
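The workflow described above can be sketched directly with caret. This is a minimal illustration under stated assumptions, not the package's actual implementation: the toy data frame, its column names (`metric.a`, `metric.b`), the seed values, and the choice of the `knn` method are all hypothetical and were picked only so the example runs without extra model packages.

```r
library(caret)

# Toy data: two numeric features and a binary "ok"/"flag" label (all hypothetical).
set.seed(1)
n <- 200
toy <- data.frame(
  metric.a = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 2)),
  metric.b = c(rnorm(n / 2, mean = 5, sd = 3), rnorm(n / 2, mean = 9, sd = 3)),
  Status   = factor(rep(c("ok", "flag"), each = n / 2), levels = c("ok", "flag"))
)
features <- c("metric.a", "metric.b")

# 1. Hold out 20% of the rows as a validation set (stratified by label).
set.seed(100)
in.train  <- createDataPartition(toy$Status, p = 0.8, list = FALSE)
train.set <- toy[in.train, ]
valid.set <- toy[-in.train, ]

# 2. Repeated 10-fold cross-validation with 3 repeats on the remaining 80%.
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# 3. Center and scale the features, then pick the setting with the best accuracy.
set.seed(200)
fit <- train(
  x          = train.set[, features],
  y          = train.set$Status,
  method     = "knn",
  preProcess = c("center", "scale"),
  metric     = "Accuracy",
  trControl  = ctrl,
  tuneGrid   = data.frame(k = c(3, 5, 7))
)

# 4. Estimate performance on the held-out validation rows.
valid.pred  <- predict(fit, newdata = valid.set[, features])
performance <- confusionMatrix(valid.pred, valid.set$Status)
print(performance$table)
```

The two `set.seed` calls mirror the idea of fixing one seed for the training/validation split and another for the cross-validation folds; how TrainQCModel applies its own seeds internally is not shown here.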
data.merged |
A dataframe that contains peak identifiers (File, FileName, PeptideModifiedSequence, FragmentIon, IsotopeLabelType, PrecursorCharge and ProductCharge), the calculated QC metrics, and the Status assigned by the expert analyst to each transition pair. data.merged is the output of the MakeDataSet function (output$data.merged). |
response.var |
This variable indicates the name of the column that stores the "ok" and "flag" labels for the transition pairs in the training data. |
description.columns |
If the input dataframe contains columns that hold description variables (such as Notes), list them here. Description and identifier columns are removed from the data before the model is trained. |
method |
The machine learning algorithm used to train the classifier. The algorithm can be chosen from the list of models available in caret: https://topepo.github.io/caret/available-models.html. The following have been tested: RRF, regLogistic, svmLinear3, svmPoly, kknn. Before using TrainQCModel with any of these methods, you must first install the corresponding machine learning package using the install.packages command. |
tuneGrid |
Use this parameter if you want to specify the tuneGrid for the caret train method. Otherwise, set tuneGrid to NULL. See the caret package help for more details: https://topepo.github.io/caret/model-training-and-tuning.html. |
random.seed |
To fix the random seeds used for splitting the dataset into training and validation sets and for splitting the data for cross-validation, provide a vector of length 2, e.g. random.seed = c(1000,2000). This is particularly useful if you want to compare multiple models on the same data split. |
export.model |
A logical parameter indicating whether the model should be saved. If export.model = TRUE, the model is saved in model.path. |
model.path |
Path to the directory where the model will be saved if export.model = TRUE. |
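For the tuneGrid argument, the grid's column names have to match the tuning parameters that caret defines for the chosen method. The sketch below shows one way to look those names up with caret's `modelLookup` and build a matching grid with `expand.grid`. The choice of `svmPoly` and the candidate values are illustrative assumptions, not recommended settings.

```r
library(caret)

# List the tuning parameters caret exposes for a given method.
# modelLookup() reads caret's model registry; the model's own package
# is only needed later, when the model is actually trained.
svm.params <- modelLookup("svmPoly")
print(svm.params[, c("parameter", "label")])

# Build a grid whose column names match those parameters exactly.
# The candidate values here are arbitrary examples.
svm.grid <- expand.grid(
  degree = c(1, 2),
  scale  = c(0.01, 0.1),
  C      = c(0.5, 1)
)

# Catch misspelled parameter names before passing the grid to a training call.
stopifnot(setequal(names(svm.grid), as.character(svm.params$parameter)))

# Number of candidate settings that would be evaluated: 2 * 2 * 2
nrow(svm.grid)
```

A grid built this way could then be supplied as tuneGrid together with method = "svmPoly".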
A list with the following objects: model: Trained model to flag peaks with poor chromatography or interference. performance.testing: Confusion matrix from applying the model to the unseen validation data (20% of the training dataset). model.file.path: If export.model = TRUE and the model is saved, the path and file name of the model are stored in this field.
rrf.grid <- expand.grid(mtry = c(2,10),
                        coefReg = c(0.5,1),
                        coefImp = c(0))
model.rrf <- TrainQCModel(data.set.CSF$data.merged,
                          response.var = c("Status"),
                          description.columns = c("Notes"),
                          method = "RRF",
                          tuneGrid = rrf.grid,
                          random.seed = c(100,200))
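The random.seed argument is described as useful for comparing several models on the same data split. The sketch below illustrates that idea with caret alone, as an assumption-laden stand-in rather than a description of TrainQCModel's internals: the toy data, the column names, and the two methods (`knn` and `glm`) are hypothetical, and the shared folds are fixed explicitly with `createMultiFolds` so that both models are resampled on identical partitions.

```r
library(caret)

# Toy two-class data (hypothetical feature names and values).
set.seed(1)
n <- 200
toy <- data.frame(
  metric.a = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 2)),
  metric.b = c(rnorm(n / 2, mean = 5, sd = 3), rnorm(n / 2, mean = 9, sd = 3)),
  Status   = factor(rep(c("ok", "flag"), each = n / 2), levels = c("ok", "flag"))
)

# Fix the resampling folds once so every model sees the same partitions:
# 10 folds repeated 3 times gives 30 sets of training indices.
set.seed(200)
folds <- createMultiFolds(toy$Status, k = 10, times = 3)
ctrl  <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                      index = folds)

# Train two different classifiers on the identical folds.
fit.knn <- train(Status ~ ., data = toy, method = "knn",
                 preProcess = c("center", "scale"),
                 trControl = ctrl, tuneGrid = data.frame(k = 5))
fit.glm <- train(Status ~ ., data = toy, method = "glm",
                 preProcess = c("center", "scale"),
                 trControl = ctrl)

# Collect the per-fold accuracy of both models for a like-for-like comparison.
cmp <- resamples(list(knn = fit.knn, glm = fit.glm))
summary(cmp)
```

Because both fits share the same resampling indices, differences in the summary reflect the models rather than the luck of the split.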