preprocess_spectra: Preprocessing spectra for q2e estimation

View source: R/preprocess_data.R


Preprocessing spectra for q2e estimation

Description

Performs smoothing, baseline removal and peak detection on MALDI samples. The isotopic peaks of a list of peptides are then extracted from the detected peaks.

Usage

preprocess_spectra(
  indir = NULL,
  metadata = NULL,
  mzml_files = NULL,
  spectrum_name_file = FALSE,
  sps_mzr = NULL,
  make_plots = FALSE,
  peptides_user = NULL,
  smooth_wma_hws = 4,
  smooth_sg_hws = 6,
  iterations = 50,
  halfWindowSize = 20,
  snr = 2,
  k = 0L,
  threshold = 0.33,
  local_bg = FALSE,
  mass_range = 100,
  bg_cutoff = 0.5,
  l_cutoff = 1e-08,
  tolerance = 0.4,
  ppm = 50,
  n_isopeaks = 5,
  min_isopeaks = 4,
  norm_func = NULL,
  q2e = NULL,
  ncores = NULL,
  chunk_size = 40,
  verbose = FALSE
)

Arguments

indir

Folder containing spectra in mzML format.

metadata

Data frame with the spectra metadata, containing at least a file column. Ideally, the metadata has been cleaned beforehand with MALDIzooMS::clean_metadata.

mzml_files

Paths to mzML files

sps_mzr

A Spectra object (from the Spectra package).

smooth_wma_hws

Half-window size for WeightedMovingAverage smoothing method

smooth_sg_hws

Half-window size for SavitzkyGolay smoothing method
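Both half-window sizes define a window of 2 * hws + 1 points centred on each data point. The sketch below illustrates weighted moving-average smoothing in base R. The triangular weights are an assumption for illustration and may differ from those used by the package's smoothing backend, and wma_smooth is a hypothetical name.

```r
# Weighted moving average with half-window size `hws` (hypothetical helper).
# Assumption: triangular weights, hws + 1 at the centre decreasing to 1 at the
# window edges; the real smoothing backend may weight points differently.
wma_smooth <- function(y, hws = 4L) {
  offsets <- -hws:hws
  w <- (hws + 1) - abs(offsets)
  n <- length(y)
  out <- numeric(n)
  for (i in seq_len(n)) {
    idx <- i + offsets
    keep <- idx >= 1 & idx <= n          # truncate the window at the spectrum edges
    # dividing by the weights actually used keeps a constant signal constant
    out[i] <- sum(w[keep] * y[idx[keep]]) / sum(w[keep])
  }
  out
}

wma_smooth(c(0, 0, 10, 0, 0), hws = 1L)  # returns c(0, 2.5, 5, 2.5, 0)
```

A larger half-window averages over more points, flattening noise but also broadening and lowering narrow peaks.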

iterations

Iterations parameter for baseline detection.
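An iterations parameter is typical of iterative baseline estimators such as SNIP (Statistics-sensitive Non-linear Iterative Peak-clipping). Assuming a SNIP-like method (this page does not name the algorithm), the sketch below shows how each iteration widens the clipping window by one point, so peaks narrower than the final window are clipped down to the baseline. snip_baseline is a hypothetical name.

```r
# Simplified SNIP-like baseline estimate (hypothetical helper, assumption).
# At step p, each point is replaced by the mean of its neighbours p points
# away whenever that mean is lower, clipping peaks narrower than the window.
snip_baseline <- function(y, iterations = 50L) {
  n <- length(y)
  b <- as.numeric(y)
  for (p in seq_len(iterations)) {
    if (2 * p >= n) break                           # window wider than the spectrum
    i <- (p + 1):(n - p)
    b[i] <- pmin(b[i], (b[i - p] + b[i + p]) / 2)   # RHS uses the previous pass
  }
  b
}

y <- c(rep(0, 10), 40, rep(0, 10)) + 1:21  # linear baseline plus one peak
corrected <- y - snip_baseline(y, iterations = 5L)
```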

halfWindowSize

Half-window size parameter for local maximum detection.

snr

Signal-to-noise ratio threshold above which peaks are detected.
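halfWindowSize and snr work together during peak detection: a point is reported as a peak if it is the maximum within halfWindowSize points on either side and its intensity exceeds snr times the noise level. The sketch assumes the noise is estimated as the median absolute deviation (MAD) of the spectrum, a common choice that this page does not confirm; detect_peaks_sketch is a hypothetical name.

```r
# Local-maximum peak detection with a signal-to-noise filter (hypothetical helper).
# Assumption: noise = MAD of the whole (baseline-corrected) spectrum.
detect_peaks_sketch <- function(y, hws = 20L, snr = 2) {
  n <- length(y)
  noise <- stats::mad(y)
  is_local_max <- vapply(seq_len(n), function(i) {
    win <- max(1L, i - hws):min(n, i + hws)   # window of up to 2 * hws + 1 points
    y[i] == max(y[win])
  }, logical(1))
  which(is_local_max & y > snr * noise)       # indices of the retained peaks
}

y <- c(1, 3, 2, 4, 2, 3, 1, 30, 1, 3, 2, 4, 2, 3, 1)
detect_peaks_sketch(y, hws = 2L, snr = 2)  # small maxima at 4 and 12 pass, plus 8
detect_peaks_sketch(y, hws = 2L, snr = 3)  # only the large peak at index 8
```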

k

k parameter for MsCoreUtils::refineCentroids()

threshold

threshold parameter for MsCoreUtils::refineCentroids()
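The role of k and threshold can be illustrated with a simplified base-R version of centroid refinement: the m/z of a detected peak is replaced by the intensity-weighted mean m/z of the peak and up to k neighbours on each side whose intensity is at least threshold times the peak intensity. This is a sketch of the idea, not the MsCoreUtils implementation, and refine_centroids_sketch is a hypothetical name. With the default k = 0L only the peak itself is used, so its m/z is left unchanged.

```r
# Simplified centroid refinement (hypothetical helper, not MsCoreUtils code).
refine_centroids_sketch <- function(mz, int, peaks, k = 2L, threshold = 0.33) {
  n <- length(mz)
  vapply(peaks, function(p) {
    win <- max(1L, p - k):min(n, p + k)              # peak plus k neighbours per side
    keep <- win[int[win] >= threshold * int[p]]      # drop low-intensity neighbours
    sum(mz[keep] * int[keep]) / sum(int[keep])       # intensity-weighted mean m/z
  }, numeric(1))
}

mz  <- c(100.0, 100.1, 100.2, 100.3, 100.4)
int <- c(1, 8, 10, 4, 1)
refine_centroids_sketch(mz, int, peaks = 3L, k = 0L)  # 100.2, unchanged
refine_centroids_sketch(mz, int, peaks = 3L, k = 2L)  # (100.1*8 + 100.2*10 + 100.3*4) / 22
```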

local_bg

Whether to further clean the peak lists by modelling the local background noise. See MALDIzooMS::peaks_local_bg. This should ideally be used with snr = 0. mass_range, bg_cutoff and l_cutoff are only applied if local_bg is TRUE.

mass_range

Mass window on each side of a peak considered for background modelling.

bg_cutoff

Peaks within the mass range whose intensity is below the bg_cutoff quantile are used for background modelling. bg_cutoff = 1 keeps all peaks, while bg_cutoff = 0.5 keeps only the bottom half.

l_cutoff

Likelihood threshold (p-value). Peaks whose probability of being explained by the background noise model is higher than this value are filtered out.
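The interplay of mass_range, bg_cutoff and l_cutoff can be sketched as follows. This is a hypothetical illustration, not MALDIzooMS::peaks_local_bg: it assumes the background intensities around each peak follow a normal distribution and takes the upper-tail probability of the peak's intensity under that model as its probability of being background. local_bg_filter is a hypothetical name.

```r
# Hypothetical local-background filter for one peak list.
# Assumption: normal model of local background intensities; the package's
# actual background model may differ.
local_bg_filter <- function(mz, int, mass_range = 100, bg_cutoff = 0.5,
                            l_cutoff = 1e-8) {
  keep <- vapply(seq_along(mz), function(i) {
    near <- abs(mz - mz[i]) <= mass_range                  # local mass window
    q <- stats::quantile(int[near], bg_cutoff)
    bg <- int[near & int <= q]                             # low-intensity peaks only
    if (length(bg) < 3 || stats::sd(bg) == 0) return(TRUE) # too few to model: keep
    p_bg <- stats::pnorm(int[i], mean(bg), stats::sd(bg), lower.tail = FALSE)
    p_bg <= l_cutoff                                       # drop likely background
  }, logical(1))
  data.frame(mz = mz[keep], intensity = int[keep])
}

mz  <- seq(1000, 1100, by = 5)
int <- rep(c(10, 12, 11, 9), length.out = 21)
int[11] <- 500                       # one genuine peak at m/z 1050
local_bg_filter(mz, int)             # only the peak at m/z 1050 survives
```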

tolerance

Absolute mass tolerance (in Da) for matching the monoisotopic masses (mono_masses) and their subsequent isotopic peaks to the detected peaks. See MsCoreUtils::closest.

ppm

Relative tolerance in parts per million, added to tolerance. See MsCoreUtils::closest.
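Following the documented behaviour of MsCoreUtils::closest, ppm adds a relative tolerance of x * ppm / 1e6 to the absolute tolerance, where x is the query m/z. The base-R sketch below applies that rule to match the theoretical isotopic series of one peptide to detected peak m/z values. The isotope spacing of 1.00335 Da (the 13C-12C mass difference) and the name match_isopeaks are assumptions.

```r
# Match theoretical isotopic m/z values to detected peaks (hypothetical helper).
# A match is accepted if |observed - theoretical| <= tolerance + theoretical * ppm / 1e6.
match_isopeaks <- function(mono_mass, peaks_mz, n_isopeaks = 5L,
                           tolerance = 0.4, ppm = 50) {
  theo <- mono_mass + (seq_len(n_isopeaks) - 1) * 1.00335  # assumed isotope spacing
  vapply(theo, function(t) {
    d <- abs(peaks_mz - t)
    best <- which.min(d)                                    # closest detected peak
    if (length(best) && d[best] <= tolerance + t * ppm / 1e6) best else NA_integer_
  }, integer(1))
}

peaks_mz <- c(1105.6, 1106.5, 1107.7, 1109.0, 1109.6)
match_isopeaks(1105.58, peaks_mz)           # all five matched
match_isopeaks(1105.58, peaks_mz, ppm = 0)  # 4th is 0.40995 Da off, so it is NA
```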

n_isopeaks

Number of isotopic peaks to pick. The default, 5, is also the maximum permitted.

min_isopeaks

If fewer than min_isopeaks consecutive isotopic peaks (spaced about 1 Da apart) are detected, the whole isotopic envelope is discarded. Default is 4.
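A minimal sketch of this rule, assuming "consecutive" means the longest run of non-missing intensities in the envelope (the package may instead count from the monoisotopic peak). A discarded envelope is set to NA, matching the NA convention for missing peaks; filter_envelope is a hypothetical name.

```r
# Discard an isotopic envelope with too few consecutive detected peaks
# (hypothetical helper; the definition of "consecutive" is an assumption).
filter_envelope <- function(intensities, min_isopeaks = 4L) {
  runs <- rle(!is.na(intensities))                       # runs of detected/missing peaks
  longest <- max(c(0L, runs$lengths[runs$values]))       # longest detected run
  if (longest < min_isopeaks) intensities[] <- NA        # drop the whole envelope
  intensities
}

filter_envelope(c(10, 8, 5, 3, NA))   # run of 4: kept
filter_envelope(c(10, 8, NA, 3, 2))   # longest run is 2: all NA
```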

norm_func

Function to normalize the isotopic distribution
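Two candidate normalisation functions are sketched below. It is an assumption that norm_func receives the numeric vector of isotopic-peak intensities of one envelope (possibly containing NAs) and should return a vector of the same length; norm_sum and norm_max are hypothetical names.

```r
# Hypothetical functions that could be passed as norm_func.
norm_sum <- function(x) x / sum(x, na.rm = TRUE)  # relative abundances summing to 1
norm_max <- function(x) x / max(x, na.rm = TRUE)  # most intense peak scaled to 1

norm_sum(c(2, 6, 2, NA, NA))  # 0.2 0.6 0.2 NA NA
norm_max(c(2, 8, 4))          # 0.25 1.00 0.50
```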

q2e

If provided, the theoretical isotopic distribution of the peptides at this extent of deamidation is added to the output.

ncores

Number of cores used by the Spectra::MsBackendMzR backend in Spectra::peaksData

spectrum_name_file

If mzml_files are provided, whether to use the file names as spectra names. Otherwise, the spectra IDs are assumed to be in the headers of the mzML files.

mono_masses

Array with the monoisotopic masses of the peptides.

Details

Provide the input data either with metadata and indir, or with paths in mzml_files. Alternatively, a Spectra object can be passed directly in sps_mzr. If more than one of these options is given, sps_mzr takes precedence, followed by mzml_files.
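The input precedence can be summarised as a small decision function. pick_input is a hypothetical helper, and the error raised when no option is complete is an assumption about how missing input is handled.

```r
# Hypothetical helper mirroring the documented input precedence:
# sps_mzr first, then mzml_files, then metadata together with indir.
pick_input <- function(indir = NULL, metadata = NULL, mzml_files = NULL,
                       sps_mzr = NULL) {
  if (!is.null(sps_mzr)) return("sps_mzr")
  if (!is.null(mzml_files)) return("mzml_files")
  if (!is.null(metadata) && !is.null(indir)) return("metadata + indir")
  stop("Provide sps_mzr, mzml_files, or both metadata and indir.")  # assumed behaviour
}

pick_input(metadata = data.frame(file = "a.mzML"), indir = "spectra",
           mzml_files = "b.mzML")  # "mzml_files"
```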

The default peptides are those from Nair et al. (2022), which also describes the preprocessing procedure in detail.

Value

A list of data frames, one per sample. Each data frame has 3 columns (m/z, intensity and signal-to-noise ratio) and one row for each of the n_isopeaks isotopic peaks of each peptide. Missing peaks are NA.
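As an illustration of this structure, the snippet builds a toy result for one sample and one peptide with n_isopeaks = 5 and counts the missing peaks per sample. All values are made up, and the column names mz, intensity and snr are assumptions.

```r
# Toy result: one sample, one peptide, 5 isotopic peaks (made-up values,
# assumed column names); the last two isotopic peaks were not detected.
sample_df <- data.frame(
  mz        = c(1105.58, 1106.58, 1107.59, NA, NA),
  intensity = c(1200, 950, 430, NA, NA),
  snr       = c(25.1, 19.8, 8.9, NA, NA)
)
res <- list(sample_A = sample_df)

# Number of missing isotopic peaks in each sample
n_missing <- vapply(res, function(d) sum(is.na(d$intensity)), integer(1))
n_missing  # sample_A: 2
```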

References

Nair, B. et al. (2022) ‘Parchment Glutamine Index (PQI): A novel method to estimate glutamine deamidation levels in parchment collagen obtained from low-quality MALDI-TOF data’, bioRxiv. doi:10.1101/2022.03.13.483627.


ismaRP/MALDIpqi documentation built on Feb. 14, 2025, 8:28 a.m.