preprocess_spectra: Preprocessing spectra for q2e estimation

View source: R/preprocess_data.R

preprocess_spectra R Documentation

Preprocessing spectra for q2e estimation

Description

Performs smoothing, baseline removal and peak detection on MALDI samples. The isotopic peaks for a list of peptides are then extracted from the detected peaks.

Usage

preprocess_spectra(
  indir,
  metadata,
  make_plots = FALSE,
  peptides_user = NULL,
  smooth_wma_hws = 4,
  smooth_sg_hws = 6,
  iterations = 50,
  halfWindowSize = 20,
  snr = 2,
  k = 0L,
  threshold = 0.33,
  local_bg = FALSE,
  mass_range = 100,
  bg_cutoff = 0.5,
  l_cutoff = 1e-08,
  tolerance = 0.4,
  ppm = 50,
  n_isopeaks = 5,
  min_isopeaks = 4,
  ncores = NULL,
  chunk_size = 40
)

Arguments

indir

Folder containing spectra in mzML format.

metadata

Data frame of spectra metadata containing at least a file column. Ideally, the metadata has been cleaned beforehand with MALDIzooMS::clean_metadata.

smooth_wma_hws

Half-window size for WeightedMovingAverage smoothing method

smooth_sg_hws

Half-window size for SavitzkyGolay smoothing method

iterations

Iterations parameter for baseline detection.

halfWindowSize

Half-window size parameter for local maximum detection.

snr

Signal-to-noise threshold above which peaks are considered

k

k parameter for MsCoreUtils::refineCentroids()

threshold

threshold parameter for MsCoreUtils::refineCentroids()

local_bg

Whether to further clean the peak lists by modelling the local background noise. See MALDIzooMS::peaks_local_bg. Ideally, this should be used with an snr threshold of 0. mass_range, bg_cutoff and l_cutoff are only applied if local_bg is TRUE.

mass_range

Mass window on both sides of a peak to be considered for background modelling.

bg_cutoff

The peaks within the mass range with intensity below the bg_cutoff quantile are considered for background modelling. bg_cutoff=1 keeps all peaks and bg_cutoff=0.5 would only keep the bottom half.

l_cutoff

Likelihood threshold or p-value. Peaks with a probability of being modelled as background noise higher than this are filtered out.

tolerance

Mass tolerance in Da allowed when matching the theoretical m/z values (mono_masses and their subsequent isotopic peaks) to the detected peaks. See MsCoreUtils::closest.

ppm

Parts-per-million (mass-dependent) tolerance added to tolerance. See MsCoreUtils::closest.

n_isopeaks

Number of isotopic peaks to pick. The default, 5, is also the maximum permitted.

min_isopeaks

If fewer than min_isopeaks consecutive isotopic peaks (about 1 Da apart) are detected, the whole isotopic envelope is discarded. Default is 4.

ncores

Number of cores used by the Spectra::MsBackendMzR backend in Spectra::peaksData

mono_masses

Array with the monoisotopic masses of the peptides.
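
The way tolerance and ppm combine can be illustrated with a small base R sketch. It is not the package's code and does not call MsCoreUtils::closest. The helper match_peaks and the m/z values are hypothetical, and the sketch assumes the ppm term is computed relative to the theoretical m/z.

```r
# Hypothetical helper (not part of MALDIpqi): for each theoretical m/z,
# return the index of the nearest detected peak, or NA if that peak lies
# further away than tolerance + ppm of the theoretical value.
match_peaks <- function(theoretical, detected, tolerance = 0.4, ppm = 50) {
  vapply(theoretical, function(mz) {
    # Assumption: the ppm term is relative to the theoretical m/z.
    window <- tolerance + mz * ppm / 1e6
    d <- abs(detected - mz)
    i <- which.min(d)
    if (length(i) == 1L && d[i] <= window) i else NA_integer_
  }, integer(1))
}

theoretical <- c(1105.58, 1106.58, 1107.58)           # made-up isotopic series
detected    <- c(1105.60, 1106.90, 1107.10, 1300.00)  # made-up peak list

idx <- match_peaks(theoretical, detected)
idx
```

With these made-up values the acceptance window is roughly 0.455 Da, so the first two theoretical masses are matched and the third, whose nearest peak is 0.48 Da away, is returned as NA.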

Details

The default peptides are the ones from Nair et al. (2022). The paper contains the details on the preprocessing procedure.
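
As a rough illustration of how mass_range and bg_cutoff select the peaks used for background modelling when local_bg is TRUE, here is a base R sketch. It is not the MALDIzooMS::peaks_local_bg implementation. The helper background_peaks and the peak values are hypothetical, and the sketch assumes that the peak under evaluation is excluded from its own background and that peaks at or below the quantile are kept.

```r
# Hypothetical helper (not part of MALDIpqi or MALDIzooMS): select the peaks
# around `centre` that would be treated as background.
background_peaks <- function(mz, intensity, centre,
                             mass_range = 100, bg_cutoff = 0.5) {
  # Peaks within mass_range Da on either side of the peak of interest.
  # Assumption: the peak of interest itself is left out.
  in_window <- abs(mz - centre) <= mass_range & mz != centre
  # Intensity value at the bg_cutoff quantile of the peaks in the window.
  cutoff <- quantile(intensity[in_window], probs = bg_cutoff, names = FALSE)
  # Assumption: "below the quantile" is taken as less than or equal to it,
  # so that bg_cutoff = 1 keeps every peak in the window.
  keep <- in_window & intensity <= cutoff
  data.frame(mz = mz[keep], intensity = intensity[keep])
}

mz        <- c(1000, 1050, 1105, 1150, 1190, 1400)  # made-up peak list
intensity <- c(10,   40,   500,  20,   30,   5)

bg_half <- background_peaks(mz, intensity, centre = 1105)
bg_all  <- background_peaks(mz, intensity, centre = 1105, bg_cutoff = 1)
```

Here three peaks fall inside the 100 Da window. With bg_cutoff = 0.5 only the two weakest remain, and with bg_cutoff = 1 all three are kept.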

Value

A list of data frames, one per sample. Each data frame has 3 columns (m/z, intensity and signal-to-noise ratio) with one row for each of the n_isopeaks of each peptide. Missing peaks are NAs.
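
Because missing peaks are returned as NA, the min_isopeaks rule can be pictured as a run-length check on each isotopic envelope. The base R sketch below is only an illustration. The helper keep_envelope and the m/z values are hypothetical, and whether the package requires the run to start at the monoisotopic peak is not stated here, so the sketch accepts any run of consecutive detected peaks.

```r
# Hypothetical helper (not part of MALDIpqi): TRUE if the envelope contains
# at least min_isopeaks consecutive detected (non-NA) isotopic peaks.
keep_envelope <- function(mz, min_isopeaks = 4) {
  runs <- rle(!is.na(mz))
  any(runs$values & runs$lengths >= min_isopeaks)
}

# Made-up envelopes of n_isopeaks = 5 m/z values; NA marks a missing peak.
complete <- c(1105.6, 1106.6, 1107.6, 1108.6, 1109.6)
gappy    <- c(1105.6, 1106.6, NA,     1108.6, 1109.6)

keep_envelope(complete)  # five consecutive peaks: envelope kept
keep_envelope(gappy)     # longest run is two peaks: envelope discarded
```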

References

Nair, B. et al. (2022) ‘Parchment Glutamine Index (PQI): A novel method to estimate glutamine deamidation levels in parchment collagen obtained from low-quality MALDI-TOF data’, bioRxiv. doi:10.1101/2022.03.13.483627.


ismaRP/MALDIpqi documentation built on Dec. 28, 2024, 1:08 p.m.