ProfileCleanUp: Reduce redundancy of the profile
In acinostroza/TargetSearch: A package for the analysis of GC-MS metabolite profiling data

Description

This function reduces/removes redundancy in a profile.

Usage

ProfileCleanUp(Profile, timeSplit=500, r_thres=0.95, minPairObs=5,
    prioritization=c('mass','score'), corMass=1, score=0,
    show=c('unidentified','knowns','full'))

Arguments

`Profile`	A `tsProfile` object. See `Profile`.
`timeSplit`	A retention index (RI) window. Metabolites that fall inside the same window are correlated with each other.
`r_thres`	A correlation threshold.
`minPairObs`	Minimum number of pair observations. Correlations between two variables are computed using all complete pairs of observations in those variables. If the number of observations is too small, you may get high correlation values just by chance, so this parameter is used to avoid that. Cannot be set lower than 5.
`prioritization`	Selects whether the metabolite suggestion should be based on the number of correlation masses (`mass`) or the score (`score`).
`corMass`	Metabolites with a number of correlation masses lower than `corMass` will be marked as 'Unidentified RI'.
`score`	Metabolites with a score lower than `score` will be marked as unidentified.
`show`	A character vector. If `unidentified`, all non-redundant metabolites will be returned; if `knowns`, only returns those metabolites with correlation masses and score greater than the given values; and if `full`, it shows all redundant metabolites, which may be useful to retrieve the data from misidentified metabolites.
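
The role of `minPairObs` and `r_thres` can be illustrated with a minimal base R sketch. This is not TargetSearch's internal code: `pairCor` is a hypothetical helper and the intensity vectors are made-up values, used only to show how counting complete pairs guards against correlations that rest on too few observations.

```r
# Hypothetical helper (not part of TargetSearch): correlate two vectors
# using only complete pairs, and return NA when too few pairs are available.
pairCor <- function(x, y, minPairObs = 5) {
  complete <- !is.na(x) & !is.na(y)   # both values observed
  if (sum(complete) < minPairObs) {
    return(NA_real_)                  # not enough evidence to correlate
  }
  cor(x[complete], y[complete])
}

# Made-up intensities; NA means the peak was not found in that sample.
x <- c(10, 12, NA, 15, 18, NA, 22, 25)
y <- c(20, 25, 30, NA, 37, 41, 45, 50)

nPairs <- sum(!is.na(x) & !is.na(y))  # 5 complete pairs
r      <- pairCor(x, y, minPairObs = 5)

# Treat the pair as potentially redundant only if r reaches the cutoff.
r_thres     <- 0.95
isRedundant <- !is.na(r) && r >= r_thres
```

With a stricter `minPairObs` (for example 6), the same call returns `NA`, so the pair would not be compared against `r_thres` at all.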

Details

Metabolites that fall inside the same timeSplit window are correlated to determine whether they are potentially the same metabolite, using r_thres as the cutoff. If so, the best candidate is chosen according to the value of prioritization: if 'mass', metabolites are suggested based on the number of correlating masses; if 'score', the score is used. Metabolites that do not have at least corMass correlating masses and a score of at least score are marked as 'unidentified' and are not suggested, unless all the metabolites in the group are unidentified.

For example, suppose that three metabolites A (CM=3, S=900), B (CM=6, S=700), C (CM=5, S=800) correlate within the same time group, where CM is the number of correlating masses and S is the score.

If prioritization='mass', corMass=3, score=650, then the suggested order is B, C, A.
If prioritization='mass', corMass=3, score=750, then the suggested order is C, A, B.
If prioritization='mass', corMass=3, score=850, then the suggested order is A, B, C.
If prioritization='score', corMass=3, score=650, then the suggested order is A, C, B.
If prioritization='score', corMass=4, score=650, then the suggested order is C, B, A.
If prioritization='score', corMass=4, score=850, then the suggested order is C, A, B.

Note that by choosing prioritization='mass', score=0, and corMass=1 you will get the former behavior (TargetSearch <= 1.6).
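
The ranking rule described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: `suggestOrder` is a hypothetical helper that puts candidates passing both thresholds first and sorts each group by the prioritized criterion. It reproduces the first five cases listed above; the last case, in which no candidate passes both thresholds, depends on tie-breaking that the text does not spell out and is not modeled here.

```r
# Hypothetical helper (not part of TargetSearch): rank correlating candidates.
# cand is a data frame with columns name, CM (correlating masses), S (score).
suggestOrder <- function(cand, prioritization = c("mass", "score"),
                         corMass = 1, score = 0) {
  prioritization <- match.arg(prioritization)
  # A candidate counts as identified only if it passes both thresholds.
  identified <- cand$CM >= corMass & cand$S >= score
  # Sorting key: number of correlating masses or score.
  key <- if (prioritization == "mass") cand$CM else cand$S
  # Identified candidates first, then decreasing key within each group.
  cand$name[order(!identified, -key)]
}

# The three metabolites from the example above.
cand <- data.frame(name = c("A", "B", "C"),
                   CM   = c(3, 6, 5),
                   S    = c(900, 700, 800),
                   stringsAsFactors = FALSE)

suggestOrder(cand, "mass",  corMass = 3, score = 650)  # B, C, A
suggestOrder(cand, "mass",  corMass = 3, score = 750)  # C, A, B
suggestOrder(cand, "mass",  corMass = 3, score = 850)  # A, B, C
suggestOrder(cand, "score", corMass = 3, score = 650)  # A, C, B
suggestOrder(cand, "score", corMass = 4, score = 650)  # C, B, A
```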

Value

A tsProfile object with a non-redundant profile of the masses that were searched and correlated, and intensity and RI matrices of the correlating masses.

`slot "Info"`	A data frame with a profile of all masses that correlate and the metabolites that correlate in a `timeSplit` window.
`slot "profInt"`	A matrix with the averaged intensities of the correlating masses.
`slot "profRI"`	A matrix with the averaged RI of the correlating masses.
`slot "Intensity"`	A list containing peak-intensity matrices, one matrix per metabolite.
`slot "RI"`	A list containing RI matrices, one matrix per metabolite.

Author(s)

Alvaro Cuadros-Inostroza, Matthew Hannah, Henning Redestig

Examples

# load the package and its example data
library(TargetSearch)
library(TargetSearchData)
data(TSExample)

# update the RI file path
RIpath(sampleDescription) <- tsd_data_path()
# import the reference library
refLibrary        <- ImportLibrary(tsd_file_path('library.txt'))
# update median RI
refLibrary        <- medianRILib(sampleDescription, refLibrary)
# get the sample RI
corRI             <- sampleRI(sampleDescription, refLibrary, r_thres = 0.95)
# obtain the peak Intensities of all the masses in the library
peakData          <- peakFind(sampleDescription, refLibrary, corRI)
# build the metabolite profile from the peak data
metabProfile      <- Profile(sampleDescription, refLibrary, peakData, r_thres = 0.95)

# here we use the metabProfile previously calculated and return a "cleaned" profile.
metabProfile.clean <- ProfileCleanUp(metabProfile, timeSplit = 500,
                      r_thres = 0.95) 

# Different cutoffs could be specified
metabProfile.clean <- ProfileCleanUp(metabProfile, timeSplit = 1000,
                      r_thres = 0.9)

acinostroza/TargetSearch documentation built on July 5, 2025, 1:19 a.m.