findTargetFeatures: Find and integrate target features in each ROI

View source: R/findTargetFeatures.R


Description

For each ROI, fit a curve and integrate the largest feature in the box. Each entry in ROIsDataPoints must match the corresponding row in ROI. The curve shape used for fitting can be changed with curveModel, while fitting parameters can be changed with params (a list with one entry per ROI window).

rtMin and rtMax are established at 0.5% of the apex intensity, searching outward from the apex (the search window is the ROI width); if rtMin or rtMax is not found after 8 iterations, NA is returned and the peak fit is rejected. peakArea is calculated from rtMin to rtMax. peakAreaRaw is also calculated from rtMin to rtMax, but using the raw data points instead of the modelled line shape.

mz is the intensity-weighted average m/z of the data points falling within the rtMin to rtMax range; mzMin and mzMax are the minimum and maximum m/z in this range. If rtMin or rtMax falls outside of the ROI (extracted scans), mzMin and mzMax are returned as the input ROI limits and mz is approximated from the available data points (if no scan of the ROI falls between rtMin and rtMax, mz is NA and the peak is rejected).

If either of the following two ratios exceeds maxApexResidualRatio, the fit is rejected: 1) the fit residual at the apex (predicted apex intensity vs. measured apex intensity: the fit overshoots the apex), 2) the predicted apex intensity vs. the maximum measured peak intensity (the fit misses the real apex of the peak).
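The rtMin/rtMax search and the peakArea integration described above can be sketched in base R. This is a minimal illustration under stated assumptions, not peakPantheR's code: the helper name findEdgesSketch and the toy Gaussian curve are hypothetical, the search window is assumed to widen by one window width per iteration (up to 8 iterations), and the curve is assumed to be integrated with the trapezoidal rule over `sampling` points.

```r
# Hypothetical helper (not part of peakPantheR). Assumptions: the search window
# widens by one window width per iteration, and the area uses the trapezoidal rule.
findEdgesSketch <- function(curveFun, rtApex, windowWidth,
                            sampling = 250, threshold = 0.005, maxIter = 8) {
  cutoff <- threshold * curveFun(rtApex)   # 0.5% of the apex intensity

  # Walk outward from the apex (direction = -1 for rtMin, +1 for rtMax)
  findEdge <- function(direction) {
    for (i in seq_len(maxIter)) {
      rtGrid <- seq(rtApex, rtApex + direction * i * windowWidth,
                    length.out = sampling)
      below <- which(curveFun(rtGrid) <= cutoff)
      if (length(below) > 0) return(rtGrid[below[1]])  # first point under cutoff
    }
    NA_real_  # not found after maxIter iterations: the fit would be rejected
  }

  rtMin <- findEdge(-1)
  rtMax <- findEdge(+1)
  if (is.na(rtMin) || is.na(rtMax)) {
    return(list(rtMin = rtMin, rtMax = rtMax, peakArea = NA_real_))
  }

  # Integrate the modelled curve between rtMin and rtMax
  rt  <- seq(rtMin, rtMax, length.out = sampling)
  int <- curveFun(rt)
  peakArea <- sum(diff(rt) * (head(int, -1) + tail(int, -1)) / 2)
  list(rtMin = rtMin, rtMax = rtMax, peakArea = peakArea)
}

# Toy Gaussian peak: apex 1000 at rt = 3345 s, standard deviation 10 s
gaussPeak <- function(rt) 1000 * exp(-0.5 * ((rt - 3345) / 10)^2)
edges <- findEdgesSketch(gaussPeak, rtApex = 3345, windowWidth = 80)
```

Calling it with windowWidth = 1 returns NA edges, because 8 iterations of 1 s never reach the 0.5% level (about 32.6 s from the apex for this curve), which mirrors the rejection rule above.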

Usage

findTargetFeatures(
    ROIsDataPoints,
    ROI,
    curveModel = "skewedGaussian",
    params = "guess",
    sampling = 250,
    maxApexResidualRatio = 0.2,
    verbose = FALSE,
    ...
)

Arguments

ROIsDataPoints

(list) A list (one entry per ROI window) of data.frames, each with one data point per row and retention time ('rt'), mass ('mz') and intensity ('int') as columns. Entry i must match row i of ROI.

ROI

(data.frame) A data.frame of compounds to target as rows. Columns: rtMin (float in seconds), rtMax (float in seconds), mzMin (float), mzMax (float)

curveModel

(str) Name of the curve model to fit (currently 'skewedGaussian' or 'emgGaussian')

params

(list or str) Either 'guess' for automated parametrisation, or a list (one entry per ROI window) in which each entry is either 'guess' or a list of curve fit parameters

sampling

(int) Number of points used when subsampling the fitted curve (for the rt, rtMin, rtMax and integral calculations)

maxApexResidualRatio

(float) Maximum allowed fit residual at the peak apex, as a ratio of the fitted maximum intensity (e.g. 0.2 for a maximum residual of 20% of the apex intensity)

verbose

(bool) If TRUE, report the time taken and the number of features found

...

Additional arguments passed to fitCurve to alter peak fitting (params)
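The row-by-row pairing between ROIsDataPoints and ROI can be illustrated with toy inputs in base R. The ROI limits below are taken from the Details example; the data points are invented for illustration (a Gaussian-shaped trace sampled every 2 s at the centre m/z of each window), whereas real inputs would come from raw data extraction.

```r
# ROI: one target per row (limits taken from the Details example)
ROI <- data.frame(rtMin = c(3310, 3280), rtMax = c(3390, 3440),
                  mzMin = c(522.194778, 496.195038),
                  mzMax = c(522.205222, 496.204962))

# ROIsDataPoints: one data.frame per ROI row, with columns rt, mz and int.
# Toy signal (assumption): one point every 2 s at the centre m/z of the window,
# with a Gaussian-shaped intensity profile centred in the retention time window.
ROIsDataPoints <- lapply(seq_len(nrow(ROI)), function(i) {
  rt <- seq(ROI$rtMin[i], ROI$rtMax[i], by = 2)
  data.frame(rt  = rt,
             mz  = (ROI$mzMin[i] + ROI$mzMax[i]) / 2,
             int = 1e5 * exp(-0.5 * ((rt - mean(range(rt))) / 10)^2))
})

# Entry i of ROIsDataPoints must describe the signal inside row i of ROI
stopifnot(length(ROIsDataPoints) == nrow(ROI))
for (i in seq_len(nrow(ROI))) {
  pts <- ROIsDataPoints[[i]]
  stopifnot(all(pts$rt >= ROI$rtMin[i] & pts$rt <= ROI$rtMax[i]),
            all(pts$mz >= ROI$mzMin[i] & pts$mz <= ROI$mzMax[i]))
}
```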

Details

Examples cannot be computed as the function is not exported:

    ## findTargetFeatures() is internal; outside the package namespace it can be
    ## reached as peakPantheR:::findTargetFeatures()

    ## Load data
    library(faahKO)
    library(MSnbase)
    netcdfFilePath <- system.file('cdf/KO/ko15.CDF', package = 'faahKO')
    raw_data <- MSnbase::readMSData(netcdfFilePath, centroided = TRUE, mode = 'onDisk')

    ## targetFeatTable
    targetFeatTable <- data.frame(matrix(vector(), 2, 8,
                          dimnames = list(c(), c('cpdID', 'cpdName', 'rtMin', 'rt', 'rtMax',
                                                 'mzMin', 'mz', 'mzMax'))),
                          stringsAsFactors = FALSE)
    targetFeatTable[1,] <- c('ID-1', 'Cpd 1', 3310., 3344.888, 3390., 522.194778, 522.2, 522.205222)
    targetFeatTable[2,] <- c('ID-2', 'Cpd 2', 3280., 3385.577, 3440., 496.195038, 496.2, 496.204962)
    targetFeatTable[,3:8] <- vapply(targetFeatTable[,3:8], as.numeric, FUN.VALUE = numeric(2))

    ROIsPt <- extractSignalRawData(raw_data,
                                   rt = targetFeatTable[, c('rtMin', 'rtMax')],
                                   mz = targetFeatTable[, c('mzMin', 'mzMax')],
                                   verbose = TRUE)
    # Reading data from 2 windows

    foundPeaks <- findTargetFeatures(ROIsPt, targetFeatTable, verbose = TRUE)
    # Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for
    #  mzMin/mzMax calculation,
    #  approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #1
    # Found 2/2 features in 0.07 secs

    foundPeaks
    # $peakTable
    #   found    rtMin       rt    rtMax    mzMin    mz    mzMax peakArea
    # 1  TRUE 3309.759 3346.828 3385.410 522.1948 522.2 522.2052 26133727
    # 2  TRUE 3345.377 3386.529 3428.279 496.2000 496.2 496.2000 35472141
    #   peakAreaRaw maxIntMeasured maxIntPredicted
    # 1    26071378         889280        901015.8
    # 2    36498367        1128960       1113576.7
    #
    # $curveFit
    # $curveFit[[1]]
    # $amplitude
    # [1] 162404.8
    #
    # $center
    # [1] 3341.888
    #
    # $sigma
    # [1] 0.07878613
    #
    # $gamma
    # [1] 0.00183361
    #
    # $fitStatus
    # [1] 2
    #
    # $curveModel
    # [1] 'skewedGaussian'
    #
    # attr(,'class')
    # [1] 'peakPantheR_curveFit'
    #
    # $curveFit[[2]]
    # $amplitude
    # [1] 199249.1
    #
    # $center
    # [1] 3382.577
    #
    # $sigma
    # [1] 0.07490442
    #
    # $gamma
    # [1] 0.00114719
    #
    # $fitStatus
    # [1] 2
    #
    # $curveModel
    # [1] 'skewedGaussian'
    #
    # attr(,'class')
    # [1] 'peakPantheR_curveFit'

Value

A list:
$peakTable (data.frame) with targeted features as rows and peak measures as columns (see Details below),
$curveFit (list) a list containing a peakPantheR_curveFit object, or NA, for each ROI.
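Consuming the returned list only needs ordinary list and data.frame indexing. The sketch below builds a hypothetical mock of the result in base R, using a subset of the peakTable values printed in the Details example; the curveFit entries are NA placeholders here, whereas in real output each entry is a peakPantheR_curveFit object or NA.

```r
# Mock of the returned list (assumption): peakTable holds a subset of the
# columns printed in the Details example; curveFit entries are placeholders.
foundPeaks <- list(
  peakTable = data.frame(found    = c(TRUE, TRUE),
                         rtMin    = c(3309.759, 3345.377),
                         rt       = c(3346.828, 3386.529),
                         rtMax    = c(3385.410, 3428.279),
                         peakArea = c(26133727, 35472141)),
  curveFit  = list(NA, NA))

# One peakTable row and one curveFit entry per ROI
stopifnot(nrow(foundPeaks$peakTable) == length(foundPeaks$curveFit))

# Keep the targets whose peak was found and derive the peak width in seconds
tab  <- foundPeaks$peakTable
hits <- tab[tab$found, ]
hits$width <- hits$rtMax - hits$rtMin
```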

Details:

The returned data.frame is structured as follows:

found: whether the peak was found
rt: retention time of the peak apex (sec)
rtMin: retention time of the peak leading edge (sec), determined at 0.5% of apex intensity
rtMax: retention time of the peak trailing edge (sec), determined at 0.5% of apex intensity
mz: intensity-weighted mean of the peak m/z across scans
mzMin: minimum peak m/z (between rtMin and rtMax)
mzMax: maximum peak m/z (between rtMin and rtMax)
peakArea: integrated peak area (from the fitted curve)
peakAreaRaw: integrated peak area from the raw data points
maxIntMeasured: maximum peak intensity in the raw data
maxIntPredicted: maximum peak intensity based on the curve fit (at the apex)
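The raw-data columns (mz, mzMin, mzMax, peakAreaRaw, maxIntMeasured) and the apex checks controlled by maxApexResidualRatio can be sketched in base R as follows. The helper names rawMeasuresSketch and apexChecksSketch are hypothetical, the trapezoidal rule is an assumed integration method, and both apex ratios are assumed to be expressed relative to the predicted apex intensity (as the maxApexResidualRatio argument description suggests); peakPantheR's exact formulas may differ.

```r
# Hypothetical helper (not part of peakPantheR): raw-data measures of one ROI
# restricted to [rtMin, rtMax]. The trapezoidal rule is an assumption.
rawMeasuresSketch <- function(dataPoints, rtMin, rtMax) {
  inPeak <- dataPoints[dataPoints$rt >= rtMin & dataPoints$rt <= rtMax, ]
  if (nrow(inPeak) == 0) {
    # no scan between rtMin and rtMax: mz is NA and the peak would be rejected
    return(list(mz = NA_real_, mzMin = NA_real_, mzMax = NA_real_,
                peakAreaRaw = NA_real_, maxIntMeasured = NA_real_))
  }
  inPeak <- inPeak[order(inPeak$rt), ]
  w <- inPeak$int
  list(mz             = sum(inPeak$mz * w) / sum(w),  # intensity-weighted mean m/z
       mzMin          = min(inPeak$mz),
       mzMax          = max(inPeak$mz),
       peakAreaRaw    = sum(diff(inPeak$rt) * (head(w, -1) + tail(w, -1)) / 2),
       maxIntMeasured = max(w))
}

# Hypothetical helper (not part of peakPantheR): the two apex checks, both as
# ratios of the predicted apex intensity (assumed, symmetric via abs()).
apexChecksSketch <- function(predictedApex, measuredAtApex, maxMeasured,
                             maxApexResidualRatio = 0.2) {
  ratioResidual <- abs(predictedApex - measuredAtApex) / predictedApex  # overshoot
  ratioMaxInt   <- abs(predictedApex - maxMeasured) / predictedApex     # missed apex
  list(ratioResidual = ratioResidual, ratioMaxInt = ratioMaxInt,
       rejected = ratioResidual > maxApexResidualRatio ||
                  ratioMaxInt   > maxApexResidualRatio)
}

# Toy data points (invented values)
pts <- data.frame(rt  = c(10, 11, 12, 13, 14),
                  mz  = c(500.1, 500.2, 500.2, 500.3, 500.9),
                  int = c(10, 50, 100, 50, 10))
res <- rawMeasuresSketch(pts, rtMin = 10, rtMax = 13)

# Apex checks with the ROI 1 intensities from the Details example
# (measured intensity at the apex assumed equal to maxIntMeasured)
chk <- apexChecksSketch(predictedApex = 901015.8, measuredAtApex = 889280,
                        maxMeasured = 889280)
```

Here res$mz is about 500.219 and res$peakAreaRaw is 180 (the point at rt = 14 falls outside the range), and both apex ratios are about 0.013, well below the default 0.2, so the fit is kept.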

phenomecentre/peakPantheR documentation built on Feb. 29, 2024, 9:07 p.m.