alignTargetedRuns: Outputs intensities for each analyte from aligned Targeted-MS...


View source: R/align_dia_runs.R

Description

This function expects osw and mzml directories at dataPath. It first reads the osw files and fetches chromatogram indices for each analyte. It then aligns each analyte's XICs to its reference XICs. The best peak (the one with the lowest m-score) around the aligned retention time is picked for quantification.
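The peak-picking rule described above can be illustrated with a small base-R sketch. The data frame, its values, the `pickBestPeak` helper and the retention-time tolerance `tol` are all hypothetical and made up for illustration; they are not part of DIAlignR's API, and the package's own selection logic may differ.

```r
# Hypothetical candidate features for one analyte in one run.
features <- data.frame(
  RT      = c(1010.2, 1052.7, 1060.1, 1190.4),
  m_score = c(0.20,   0.03,   0.008,  0.001)
)

# Hypothetical helper: among features within `tol` seconds of the aligned
# retention time, return the one with the lowest m-score (NULL if none).
pickBestPeak <- function(features, alignedRT, tol) {
  near <- features[abs(features$RT - alignedRT) <= tol, , drop = FALSE]
  if (nrow(near) == 0) return(NULL)
  near[which.min(near$m_score), , drop = FALSE]
}

best <- pickBestPeak(features, alignedRT = 1055, tol = 15)
```

In this toy data the feature at RT 1190.4 has the lowest m-score overall, but it is ignored because it lies far from the aligned retention time; the feature at RT 1060.1 is selected instead.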

Usage

alignTargetedRuns(
  dataPath,
  outFile = "DIAlignR.csv",
  oswMerged = TRUE,
  runs = NULL,
  runType = "DIA_Proteomics",
  maxFdrQuery = 0.05,
  XICfilter = "sgolay",
  polyOrd = 4,
  kernelLen = 9,
  globalAlignment = "loess",
  globalAlignmentFdr = 0.01,
  globalAlignmentSpan = 0.1,
  RSEdistFactor = 3.5,
  normalization = "mean",
  simMeasure = "dotProductMasked",
  alignType = "hybrid",
  goFactor = 0.125,
  geFactor = 40,
  cosAngleThresh = 0.3,
  OverlapAlignment = TRUE,
  dotProdThresh = 0.96,
  gapQuantile = 0.5,
  hardConstrain = FALSE,
  samples4gradient = 100,
  analyteFDR = 0.01,
  unalignedFDR = 0.01,
  alignedFDR = 0.05,
  baselineType = "base_to_base",
  integrationType = "intensity_sum",
  fitEMG = FALSE,
  recalIntensity = FALSE,
  fillMissing = TRUE,
  smoothPeakArea = FALSE
)

Arguments

dataPath

(string) path to the directory that contains the mzml and osw directories.

outFile

(string) name of the output file.

oswMerged

(logical) TRUE for experiment-wide FDR and FALSE for run-specific FDR by pyprophet.

runs

(vector of strings) names of mzml files without extension.

runType

(string) must be one of the strings "DIA_Proteomics", "DIA_Metabolomics".

maxFdrQuery

(numeric) a numeric value between 0 and 1. Only features from the osw file with SCORE_MS2.QVALUE less than this value are retained.

XICfilter

(string) must be one of sgolay, boxcar, gaussian, loess or none.

polyOrd

(integer) order of the polynomial to be fit in the kernel.

kernelLen

(integer) number of data-points to consider in the kernel.

globalAlignment

(string) must be either "loess" or "linear".

globalAlignmentFdr

(numeric) a numeric value between 0 and 1. Only features with an m-score lower than this value participate in the LOESS fit.

globalAlignmentSpan

(numeric) span value for the LOESS fit. For targeted proteomics 0.1 could be used.

RSEdistFactor

(numeric) defines how much distance, in units of RSE (residual standard error of the global fit), remains a noBeef zone.

normalization

(character) must be either "mean" or "l2".

simMeasure

(string) must be selected from dotProduct, cosineAngle, cosine2Angle, dotProductMasked, euclideanDist, covariance and correlation.

alignType

(string) available alignment methods are "global", "local" and "hybrid".

goFactor

(numeric) penalty for introducing first gap in alignment. This value is multiplied by base gap-penalty.

geFactor

(numeric) penalty for introducing subsequent gaps in alignment. This value is multiplied by base gap-penalty.

cosAngleThresh

(numeric) in simMeasure = dotProductMasked mode, angular similarity should be higher than cosAngleThresh; otherwise similarity is forced to zero.

OverlapAlignment

(logical) an input for alignment with free end-gaps. FALSE: global alignment, TRUE: overlap alignment.

dotProdThresh

(numeric) in simMeasure = dotProductMasked mode, values in the similarity matrix higher than the dotProdThresh quantile are checked for angular similarity.

gapQuantile

(numeric) must be between 0 and 1. This is used to calculate base gap-penalty from similarity distribution.

hardConstrain

(logical) if FALSE, indices farther than the noBeef distance are filled with their distance from the linear fit line.

samples4gradient

(numeric) modulates penalization of masked indices.

analyteFDR

(numeric) defines the upper limit of FDR on a precursor to be considered for multipeptide.

unalignedFDR

(numeric) must be between 0 and maxFdrQuery. Features below unalignedFDR are considered for quantification even without the RT alignment.

alignedFDR

(numeric) must be between unalignedFDR and 1. Features below alignedFDR are considered for quantification after the alignment.

baselineType

(string) method to estimate the background of a peak contained in XICs. Must be one of "base_to_base", "vertical_division_min", "vertical_division_max".

integrationType

(string) method to compute the area of a peak contained in XICs. Must be one of "intensity_sum", "trapezoid", "simpson".

fitEMG

(logical) enable/disable exponentially modified gaussian peak model fitting.

recalIntensity

(logical) recalculate intensity for all analytes.

fillMissing

(logical) calculate intensity for analytes for which features are not found.

smoothPeakArea

(logical) FALSE: raw chromatograms will be used for quantification. TRUE: smoothed chromatograms will be used for quantification.
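The three FDR cutoffs documented above are constrained relative to one another: unalignedFDR must lie between 0 and maxFdrQuery, and alignedFDR must lie between unalignedFDR and 1. A minimal sketch of a pre-flight check is given here; `checkFdrThresholds` is a hypothetical helper written for illustration, not a DIAlignR function, and the package may or may not perform the same check internally.

```r
# Hypothetical helper (not part of DIAlignR): check the documented ordering
# 0 <= unalignedFDR <= maxFdrQuery and unalignedFDR <= alignedFDR <= 1.
checkFdrThresholds <- function(maxFdrQuery, unalignedFDR, alignedFDR) {
  stopifnot(
    maxFdrQuery >= 0, maxFdrQuery <= 1,
    unalignedFDR >= 0, unalignedFDR <= maxFdrQuery,
    alignedFDR >= unalignedFDR, alignedFDR <= 1
  )
  invisible(TRUE)
}

# The defaults from the Usage section satisfy the constraints.
checkFdrThresholds(maxFdrQuery = 0.05, unalignedFDR = 0.01, alignedFDR = 0.05)

# An unalignedFDR above maxFdrQuery violates them and raises an error,
# which is caught here and turned into a marker value.
bad <- tryCatch(
  checkFdrThresholds(maxFdrQuery = 0.05, unalignedFDR = 0.10, alignedFDR = 0.20),
  error = function(e) "invalid"
)
```

Running such a check before a long alignment job can catch an inconsistent combination of cutoffs early.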

Value

An output table with the following columns: precursor, run, intensity, RT, leftWidth, rightWidth, peak_group_rank, m_score, alignment_rank, peptide_id, sequence, charge, group_label.
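The output table is in long format, with one row per precursor and run. For downstream analysis it is often convenient to pivot it into a precursor-by-run intensity matrix. The sketch below uses a toy data frame with made-up values and only three of the documented columns; the comma-separated file format mentioned in the comment is an assumption based on the default outFile name.

```r
# Toy stand-in for the output table; values are made up and only three of the
# documented columns are included.
out <- data.frame(
  precursor = c(1, 1, 2, 2),
  run       = c("run0", "run1", "run0", "run1"),
  intensity = c(150.5, 160.0, NA, 42.0)
)

# With a real result one would first load the file, for example
# out <- read.csv("DIAlignR.csv"), assuming it is comma-separated.

# One row per precursor, one column per run; missing intensities stay NA.
wide <- tapply(out$intensity, list(out$precursor, out$run), function(x) x[1])
```

Here `wide` is a numeric matrix whose row names are precursor identifiers and whose column names are run names.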

Author(s)

Shubham Gupta, shubh.gupta@mail.utoronto.ca

ORCID: 0000-0003-3500-8152

License: (c) Author (2019) + GPL-3 Date: 2019-12-14

References

Gupta S, Ahadi S, Zhou W, Röst H. "DIAlignR Provides Precise Retention Time Alignment Across Distant Runs in DIA and Targeted Proteomics." Mol Cell Proteomics. 2019 Apr;18(4):806-817. doi: https://doi.org/10.1074/mcp.TIR118.001132 Epub 2019 Jan 31.

See Also

getRunNames, getFeatures, setAlignmentRank, getMultipeptide

Examples

library(DIAlignR)
# Use the example osw and mzml data shipped with the package.
dataPath <- system.file("extdata", package = "DIAlignR")
alignTargetedRuns(dataPath, outFile = "testDIAlignR.csv", oswMerged = TRUE)

DIAlignR documentation built on Nov. 8, 2020, 8:22 p.m.