do_findChromPeaks_matchedFilter: Core API function for matchedFilter peak detection
In xcms: LC-MS and GC-MS Data Analysis


View source: R/do_findChromPeaks-functions.R

This function identifies peaks in the chromatographic time domain as described in [Smith 2006]. The intensity values are binned by cutting the LC/MS data into slices (bins) that are binSize m/z units wide. Within each bin the maximal intensity is selected. Peak detection is then performed for each bin by extending it, based on the steps parameter, into a slice comprising bins current_bin - steps + 1 to current_bin + steps - 1. Each of these slices is then filtered with matched filtration using a second-derivative Gaussian as the model peak shape. After filtration, peaks are detected using a signal-to-noise ratio cut-off. For more details and illustrations see [Smith 2006].
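The filtering step can be illustrated with a minimal base R sketch. It is not the xcms implementation: the helper `second_deriv_gauss`, the synthetic chromatogram `eic`, the 1 second scan spacing and the kernel width of 4 sigma are all assumptions made for illustration. It only shows why a (negated) second-derivative Gaussian responds strongly to a peak of matching width while ignoring a constant baseline.

```r
## Illustrative sketch only; NOT the xcms implementation.
## Negated second derivative of a Gaussian (up to a constant factor),
## used here as the model peak shape. Hypothetical helper.
second_deriv_gauss <- function(x, sigma) {
    (1 - x^2 / sigma^2) * exp(-x^2 / (2 * sigma^2))
}

sigma <- 30 / 2.3548                 # sigma derived from a FWHM of 30
rt <- seq(0, 600, by = 1)            # hypothetical scan times, 1 s apart

## Synthetic chromatogram: one Gaussian peak at 300 s on a flat baseline
eic <- 50 + 1000 * exp(-(rt - 300)^2 / (2 * sigma^2))

## Sample the kernel over +/- 4 sigma and force it to sum to zero so that
## a constant baseline contributes nothing to the filtered signal
half_width <- ceiling(4 * sigma)
kern <- second_deriv_gauss(seq(-half_width, half_width, by = 1), sigma)
kern <- kern - mean(kern)

## The kernel is symmetric, so convolution and correlation are identical.
## Values near both ends are NA because the kernel does not fit there.
filtered <- as.numeric(stats::filter(eic, kern, sides = 2))

## The filtered signal is maximal at the peak apex
apex <- rt[which.max(filtered)]
```

In a real workflow the signal-to-noise cut-off (`snthresh`) would then be applied to the filtered signal; how the noise is estimated is described in [Smith 2006] and is not reproduced here.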

do_findChromPeaks_matchedFilter(
  mz,
  int,
  scantime,
  valsPerSpect,
  binSize = 0.1,
  impute = "none",
  baseValue,
  distance,
  fwhm = 30,
  sigma = fwhm/2.3548,
  max = 5,
  snthresh = 10,
  steps = 2,
  mzdiff = 0.8 - binSize * steps,
  index = FALSE,
  sleep = 0
)

`mz`	Numeric vector with the individual m/z values from all scans/ spectra of one file/sample.
`int`	Numeric vector with the individual intensity values from all scans/spectra of one file/sample.
`scantime`	Numeric vector of length equal to the number of spectra/scans of the data representing the retention time of each scan.
`valsPerSpect`	Numeric vector with the number of values for each spectrum.
`binSize`	`numeric(1)` specifying the width of the bins/slices in m/z dimension.
`impute`	Character string specifying the method to be used for missing value imputation. Allowed values are `"none"` (no linear interpolation), `"lin"` (linear interpolation), `"linbase"` (linear interpolation within a certain bin-neighborhood) and `"intlin"`. See `imputeLinInterpol` for more details.
`baseValue`	The base value to which empty elements should be set. This is only considered for `impute = "linbase"` and corresponds to the `profBinLinBase`'s `baselevel` argument.
`distance`	For `impute = "linbase"`: number of non-empty neighboring elements of an empty element that should be considered for linear interpolation. See details section for more information.
`fwhm`	`numeric(1)` specifying the full width at half maximum of the matched filtration Gaussian model peak. Only used to calculate the actual sigma, see below.
`sigma`	`numeric(1)` specifying the standard deviation (width) of the matched filtration model peak.
`max`	`numeric(1)` representing the maximum number of peaks that are expected/will be identified per slice.
`snthresh`	`numeric(1)` defining the signal to noise ratio cutoff.
`steps`	`numeric(1)` defining the number of bins to be merged before filtration (i.e. the number of neighboring bins that will be joined to the slice in which filtration and peak detection will be performed).
`mzdiff`	`numeric(1)` representing the minimum difference in m/z dimension required for peaks with overlapping retention times; can be negative to allow overlap. During peak post-processing, peaks defined to be overlapping are reduced to the one peak with the largest signal.
`index`	`logical(1)` specifying whether indices should be returned instead of values for m/z and retention times.
`sleep`	`numeric(1)` defining the number of seconds to wait between iterations. Defaults to `sleep = 0`. If `> 0` a plot is generated visualizing the identified chromatographic peak. Note: this argument is for backward compatibility only and will be removed in the future.
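Two of the defaults in the usage above are derived from other arguments. The short sketch below simply evaluates those default expressions with the default values of `fwhm`, `binSize` and `steps`; the variable `gauss_factor` is introduced here only to show where the constant 2.3548 comes from.

```r
## Default values taken from the usage section
fwhm <- 30
binSize <- 0.1
steps <- 2

## sigma = fwhm / 2.3548: for a Gaussian, FWHM = 2 * sqrt(2 * log(2)) * sigma
gauss_factor <- 2 * sqrt(2 * log(2))   # approximately 2.3548
sigma <- fwhm / 2.3548                 # approximately 12.74

## mzdiff = 0.8 - binSize * steps
mzdiff <- 0.8 - binSize * steps        # 0.6 with the defaults
```

Changing `binSize` or `steps` without setting `mzdiff` explicitly therefore also changes the minimum m/z difference required between peaks with overlapping retention times.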

The intensities are binned by the provided m/z values within each spectrum (scan). Binning is performed such that the bins are centered around the m/z values (i.e. the first bin includes all m/z values between min(mz) - binSize/2 and min(mz) + binSize/2).

For more details on binning and missing value imputation see binYonX and imputeLinInterpol methods.
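The centered binning can be sketched in base R for a single spectrum. This is a simplified illustration and does not call `binYonX`; the toy values and the variable names (`mz_vals`, `int_vals`, `bin_idx`, `binned`) are assumptions. Empty bins are left as `NA` here purely to mark where missing value imputation would act.

```r
## Toy spectrum (hypothetical values)
mz_vals  <- c(100.00, 100.04, 100.11, 100.29, 100.31)
int_vals <- c(10, 40, 25, 5, 60)
binSize  <- 0.1

## The first bin is centered on min(mz): it starts at min(mz) - binSize / 2
lower  <- min(mz_vals) - binSize / 2
n_bins <- floor((max(mz_vals) - lower) / binSize) + 1

## Bin index of each m/z value and the center of each bin
bin_idx     <- floor((mz_vals - lower) / binSize) + 1
bin_centers <- min(mz_vals) + (seq_len(n_bins) - 1) * binSize

## Keep the maximal intensity per bin; bins without any value stay NA
binned <- rep(NA_real_, n_bins)
maxes  <- tapply(int_vals, bin_idx, max)
binned[as.integer(names(maxes))] <- as.vector(maxes)
```

With these toy values the third bin (centered on m/z 100.2) receives no data point, which is the situation the `impute` argument is meant to handle.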

A matrix, each row representing an identified chromatographic peak, with columns:

mz: Intensity weighted mean of m/z values of the peak across scans.
mzmin: Minimum m/z of the peak.
mzmax: Maximum m/z of the peak.
rt: Retention time of the peak's midpoint.
rtmin: Minimum retention time of the peak.
rtmax: Maximum retention time of the peak.
into: Integrated (original) intensity of the peak.
intf: Integrated intensity of the filtered peak.
maxo: Maximum intensity of the peak.
maxf: Maximum intensity of the filtered peak.
i: Rank of peak in merged EIC (<= max).
sn: Signal to noise ratio of the peak.
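Because the result is a plain numeric matrix with named columns, it can be subset with standard matrix indexing. The sketch below uses a small hand-made matrix with the column names listed above; the numbers are invented for illustration and are not output of the function.

```r
## Hypothetical stand-in for a result matrix (values are made up)
cols <- c("mz", "mzmin", "mzmax", "rt", "rtmin", "rtmax",
          "into", "intf", "maxo", "maxf", "i", "sn")
peaks <- matrix(
    c(300.2, 300.1, 300.3, 2750, 2730, 2770, 120000, 110000, 9000, 8500, 1, 25,
      410.1, 410.0, 410.2, 2900, 2885, 2915,  30000,  28000, 2500, 2300, 1, 12),
    nrow = 2, byrow = TRUE, dimnames = list(NULL, cols))

## Keep peaks above a stricter signal-to-noise ratio;
## drop = FALSE keeps the matrix shape even if a single row remains
strong <- peaks[peaks[, "sn"] >= 20, , drop = FALSE]

## Retention time width of each peak
rt_width <- peaks[, "rtmax"] - peaks[, "rtmin"]
```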

This function exposes core peak detection functionality of the matchedFilter method. While this function can be called directly, users will generally call the corresponding method for the data object instead (e.g. the findPeaks.matchedFilter method).

Colin A Smith, Johannes Rainer

Colin A. Smith, Elizabeth J. Want, Grace O'Maille, Ruben Abagyan and Gary Siuzdak. "XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification" Anal. Chem. 2006, 78:779-787.

binYonX for a binning function, imputeLinInterpol for the interpolation of missing values. matchedFilter for the standard user interface method.

Other core peak detection functions: do_findChromPeaks_centWaveWithPredIsoROIs(), do_findChromPeaks_centWave(), do_findChromPeaks_massifquant(), do_findPeaks_MSW()

## Load xcms and the test data
library(xcms)
data(faahko_sub)
## Update the path to the files for the local system
dirname(faahko_sub) <- system.file("cdf/KO", package = "faahKO")

## Subset to one file and restrict to a certain retention time range
data <- filterRt(filterFile(faahko_sub, 1), c(2500, 3000))

## Get m/z and intensity values
mzs <- mz(data)
ints <- intensity(data)

## Define the values per spectrum:
valsPerSpect <- lengths(mzs)

res <- do_findChromPeaks_matchedFilter(mz = unlist(mzs), int = unlist(ints),
    scantime = rtime(data), valsPerSpect = valsPerSpect)
head(res)