eic.pred: Internal function: calculate the score for each EIC based on...

eic.pred R Documentation

Internal function: calculate the score for each EIC based on prediction of match status.

Description

This function uses predictive models to evaluate the data features and gives a score to every EIC. The scores serve as the basis for EIC selection.

Usage

eic.pred(eic.rec, known.mz, mass.matched = NA, to.use = 10, do.plot = FALSE, match.tol.ppm = 5, do.grp.reduce = TRUE, remove.bottom = 5, max.fpr = 0.3, min.tpr = 0.8)

Arguments

eic.rec

The matrix of data features from every EIC. Each row is an EIC. Each column is a data feature value.

known.mz

The m/z values of the known metabolic features.

mass.matched

An indicator vector. "1" means the corresponding EIC has an m/z matched to known features. The default is NA, in which case the matching is done inside this function.

to.use

The maximum number of data features to use in the predictive models.

do.plot

Whether to generate diagnostic plots.

match.tol.ppm

The tolerance for the m/z match, in ppm.

do.grp.reduce

Whether to first reduce the data features by collapsing each group of similar features into one.

remove.bottom

The number of worst-performing data features to remove before model building. The removal is based on single-predictor ROC analysis.

max.fpr

The threshold for selecting unmatched EICs. Each EIC is assigned an FPR value based on the final prediction model, and those with an FPR smaller than this threshold are selected. If a vector is provided, only the first value is used for selection. All FPR values are still returned, so other functions can make selections based on other threshold values.

min.tpr

The threshold for selecting matched EICs. Each EIC is assigned a TPR value based on the final prediction model, and those with a TPR larger than this threshold are selected. If a vector is provided, only the first value is used for selection. All TPR values are still returned, so other functions can make selections based on other threshold values.
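
As a rough illustration of what the match.tol.ppm argument controls, the sketch below flags observed m/z values that fall within a ppm tolerance of any known m/z. The helper match_ppm and the example values are hypothetical and are not part of apLCMS; the matching done inside eic.pred may differ in detail (for example, in which value the relative deviation is measured against).

```r
# Hypothetical helper, not part of apLCMS: flag each observed m/z that lies
# within tol.ppm of at least one known m/z.
match_ppm <- function(obs.mz, known.mz, tol.ppm = 5) {
  vapply(obs.mz, function(mz) {
    # Relative deviation in parts per million, measured against the known value
    # (an assumption of this sketch)
    any(abs(mz - known.mz) / known.mz * 1e6 <= tol.ppm)
  }, logical(1))
}

# Illustrative values only
obs.mz   <- c(180.0634, 180.0700, 255.2330)
known.mz <- c(180.0634, 255.2324)

# Indicator vector in the same spirit as mass.matched: 1 = matched, 0 = unmatched
mass.matched <- as.integer(match_ppm(obs.mz, known.mz, tol.ppm = 5))
```

Here the second observed value is roughly 37 ppm away from its nearest known m/z, so it stays unmatched at a 5 ppm tolerance.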

Details

The function first subsamples the EICs to balance the matched and unmatched classes. It then randomly splits the data into training and testing sets. Combinations of feature ranking methods and predictive models are fitted, and their performance is gauged on the testing set. The overall best model is selected, and each EIC receives a score based on this model.
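
The workflow described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated, logistic regression (glm) stands in for the predictive models, three fixed formulas stand in for the feature ranking and model combinations, and AUC on the testing set is assumed as the performance measure. None of the object names below come from apLCMS.

```r
set.seed(1)

# Simulated stand-in for eic.rec: 300 EICs, 3 data features (hypothetical data)
n <- 300
matched <- rbinom(n, 1, 0.2)
feat <- data.frame(
  f1 = rnorm(n, mean = 2 * matched),    # informative feature
  f2 = rnorm(n, mean = 0.5 * matched),  # weakly informative feature
  f3 = rnorm(n)                         # pure noise
)
dat <- cbind(feat, matched = matched)

# 1. Subsample the larger class so matched and unmatched are balanced
idx1 <- which(matched == 1)
idx0 <- which(matched == 0)
k <- min(length(idx1), length(idx0))
keep <- c(sample(idx1, k), sample(idx0, k))

# 2. Randomly split the balanced subset into training and testing sets
train <- sample(keep, length(keep) %/% 2)
test <- setdiff(keep, train)

# AUC computed from ranks (Mann-Whitney identity)
auc <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# 3. Fit candidate models on the training set, compare them on the testing set
forms <- list(
  all     = matched ~ f1 + f2 + f3,
  f1.only = matched ~ f1,
  noise   = matched ~ f3
)
test.auc <- sapply(forms, function(f) {
  fit <- glm(f, data = dat[train, ], family = binomial)
  auc(predict(fit, newdata = dat[test, ]), dat$matched[test])
})

# 4. Score every EIC with the best-performing candidate
best <- names(which.max(test.auc))
final <- glm(forms[[best]], data = dat[train, ], family = binomial)
score <- predict(final, newdata = dat, type = "response")
```

Balancing before the split keeps the rarer class from being swamped during training, and holding out a testing set means the candidates are compared on EICs they were not fitted to.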

Although a single scoring system is used for all EICs, matched EICs are treated differently from unmatched ones, because we have higher confidence that they are real metabolites. Matched EICs are selected using the "min.tpr" threshold, to ensure that the majority of them enter the next step. Unmatched EICs are selected using the "max.fpr" threshold.
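
The unmatched side of this selection can be sketched as below. The scores and match indicators are made-up values, and the definition of a per-EIC FPR (the fraction of unmatched EICs scoring at least as high as the EIC in question) is an assumption of this sketch rather than something confirmed by the package source. The matched side is left out because the orientation of the per-EIC TPR is not spelled out here.

```r
# Hypothetical scores and match indicator (illustrative values only)
score   <- c(0.95, 0.90, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10)
matched <- c(1,    0,    1,    0,    1,    0,    0,    0)
max.fpr <- c(0.3, 0.5)  # a vector: only the first value drives the selection

# Assumed definition: the FPR of an EIC is the fraction of unmatched EICs
# whose score is at least as high as that EIC's score.
unmatched.score <- score[matched == 0]
fpr <- vapply(score, function(s) mean(unmatched.score >= s), numeric(1))

# Unmatched EICs whose FPR falls below the first threshold are selected
chosen.unmatched <- as.integer(matched == 0 & fpr < max.fpr[1])

# Because every FPR value is kept, a different threshold can be applied later
chosen.alt <- as.integer(matched == 0 & fpr < max.fpr[2])
```

With these values, only the top-scoring unmatched EIC passes the 0.3 threshold, while relaxing the threshold to 0.5 admits one more.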

Value

A list with the following items is returned.

chosen

An indicator vector. "1" means the EIC is selected; "0" means unselected. When multiple min.tpr and/or max.fpr are provided, this vector corresponds to the combination of the first min.tpr and max.fpr.

fpr

The vector of FPR values. Each value is the FPR at the cutoff of the specific EIC.

tpr

The vector of TPR values. Each value is the TPR at the cutoff of the specific EIC.

matched

An indicator vector. "1" means matched to known features. "0" means unmatched.

pred.performance

Prediction performance of all models tested.

feature.rank.method

The method used for ranking features.

model

The prediction model used.

feature importance

The importance score of all data features generated by the feature ranking method.

used.features

The names of the features used in the final model.

final.auc

The AUC of the selected model.

Author(s)

Tianwei Yu <tianwei.yu@emory.edu>

References

Bioinformatics. 30(20): 2941-2948.

See Also

semi.sup.learn, eic.qual, eic.disect


yufree/apLCMS documentation built on May 19, 2024, 1:22 p.m.