getAlignObjs: AlignObj for analytes between a pair of runs
In DIAlignR: Dynamic Programming Based Alignment of MS2 Chromatograms


This function expects osw and mzml directories at dataPath. It first reads the osw files and fetches chromatogram indices for each requested analyte. It then aligns the XICs of each analyte to its reference XICs. An AlignObj is returned, which contains the aligned indices and the cumulative score along the alignment path.
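The directory requirement above can be checked before calling the function. The following is a minimal sketch in base R; `checkDataPath` is a hypothetical helper (not part of DIAlignR), and it assumes the sub-directories are literally named `osw` and `mzml`.

```r
# Hypothetical helper (not part of DIAlignR): report whether dataPath holds
# the two sub-directories that getAlignObjs() is described as expecting.
# Assumption: the directories are literally named "osw" and "mzml".
checkDataPath <- function(dataPath = ".") {
  expected <- c("osw", "mzml")
  present <- dir.exists(file.path(dataPath, expected))
  names(present) <- expected
  present
}

# Demonstrate on a throwaway directory that has the expected layout.
demoPath <- file.path(tempdir(), "dialignr_demo")
dir.create(file.path(demoPath, "osw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demoPath, "mzml"), recursive = TRUE, showWarnings = FALSE)
layoutOk <- checkDataPath(demoPath)

# A path without these sub-directories reports FALSE for both entries.
layoutMissing <- checkDataPath(file.path(tempdir(), "dialignr_no_such_dir"))
```

A call such as `stopifnot(all(checkDataPath(dataPath)))` placed before `getAlignObjs()` would then fail early with a clear message if either directory is absent.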

getAlignObjs(
  analytes,
  runs,
  dataPath = ".",
  refRun = NULL,
  oswMerged = TRUE,
  runType = "DIA_Proteomics",
  maxFdrQuery = 0.05,
  analyteFDR = 0.01,
  XICfilter = "sgolay",
  polyOrd = 4,
  kernelLen = 9,
  globalAlignment = "loess",
  globalAlignmentFdr = 0.01,
  globalAlignmentSpan = 0.1,
  RSEdistFactor = 3.5,
  normalization = "mean",
  simMeasure = "dotProductMasked",
  alignType = "hybrid",
  goFactor = 0.125,
  geFactor = 40,
  cosAngleThresh = 0.3,
  OverlapAlignment = TRUE,
  dotProdThresh = 0.96,
  gapQuantile = 0.5,
  hardConstrain = FALSE,
  samples4gradient = 100,
  objType = "light"
)

`analytes`	(vector of integers) transition_group_ids for which features are to be extracted.
`runs`	(vector of strings) names of mzML files without extension.
`dataPath`	(char) path to the mzml and osw directories.
`refRun`	(string) reference for alignment. If no run is provided, m-score is used to select reference run.
`oswMerged`	(logical) TRUE for experiment-wide FDR and FALSE for run-specific FDR by pyprophet.
`runType`	(char) This must be one of the strings "DIA_Proteomics", "DIA_Metabolomics".
`maxFdrQuery`	(numeric) A numeric value between 0 and 1. It is used to filter features from the osw file, keeping those with SCORE_MS2.QVALUE less than this value.
`analyteFDR`	(numeric) only analytes that have an m-score less than this value will be included in the output.
`XICfilter`	(string) must be either sgolay, boxcar, gaussian, loess or none.
`polyOrd`	(integer) order of the polynomial to be fit in the kernel.
`kernelLen`	(integer) number of data-points to consider in the kernel.
`globalAlignment`	(string) must be either "loess" or "linear".
`globalAlignmentFdr`	(numeric) a numeric value between 0 and 1. Features should have m-score lower than this value for participation in LOESS fit.
`globalAlignmentSpan`	(numeric) span value for LOESS fit. For targeted proteomics 0.1 could be used.
`RSEdistFactor`	(numeric) defines how much distance in the unit of rse remains a noBeef zone.
`normalization`	(character) must be selected from "mean", "l2".
`simMeasure`	(string) must be selected from dotProduct, cosineAngle, cosine2Angle, dotProductMasked, euclideanDist, covariance and correlation.
`alignType`	available alignment methods are "global", "local" and "hybrid".
`goFactor`	(numeric) penalty for introducing the first gap in the alignment. This value is multiplied by the base gap-penalty.
`geFactor`	(numeric) penalty for introducing subsequent gaps in the alignment. This value is multiplied by the base gap-penalty.
`cosAngleThresh`	(numeric) in simMeasure = dotProductMasked mode, angular similarity should be higher than cosAngleThresh; otherwise similarity is forced to zero.
`OverlapAlignment`	(logical) an input for alignment with free end-gaps. FALSE: global alignment, TRUE: overlap alignment.
`dotProdThresh`	(numeric) in simMeasure = dotProductMasked mode, values in the similarity matrix higher than the dotProdThresh quantile are checked for angular similarity.
`gapQuantile`	(numeric) must be between 0 and 1. This is used to calculate base gap-penalty from similarity distribution.
`hardConstrain`	(logical) if FALSE, indices farther than the noBeef distance are filled with their distance from the linear fit line.
`samples4gradient`	(numeric) modulates penalization of masked indices.
`objType`	(char) Must be selected from light, medium and heavy.
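
The relationship between `gapQuantile`, `goFactor` and `geFactor` described above can be illustrated with a small base-R sketch. It is only an illustration under stated assumptions: the similarity matrix is random toy data, and the variable names `baseGapPenalty`, `gapOpen` and `gapExtend` are hypothetical, not DIAlignR identifiers. The package's internal calculation may differ in detail.

```r
# Toy similarity matrix standing in for the similarity between a reference
# XIC and an experiment XIC (random values, for illustration only).
set.seed(1)
simMatrix <- matrix(runif(25), nrow = 5)

# Default values from the usage section.
gapQuantile <- 0.5
goFactor <- 0.125
geFactor <- 40

# Base gap-penalty: a quantile of the similarity distribution.
baseGapPenalty <- unname(quantile(simMatrix, probs = gapQuantile))

# Both factors scale the same base penalty.
gapOpen <- goFactor * baseGapPenalty    # penalty for the first gap
gapExtend <- geFactor * baseGapPenalty  # penalty for each subsequent gap
```

With the default factors, extending a gap costs 320 times as much as opening one (40 / 0.125), whatever the base penalty turns out to be.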

A list of fileInfo and AlignObjs. Each AlignObj is an S4 object. The three most important slots are:

`indexA_aligned`	(integer) aligned indices of reference run.
`indexB_aligned`	(integer) aligned indices of experiment run.
`score`	(numeric) cumulative score of alignment.
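
Because each AlignObj is an S4 object, its slots are read with `@`. The sketch below uses a hypothetical stand-in class, `MockAlignObj`, with the three slots listed above, so that it runs without DIAlignR installed. The slot values are made up, and treating 0 as the gap marker in the aligned indices is an assumption for this illustration.

```r
library(methods)

# Hypothetical stand-in class; the real AlignObj class is defined by DIAlignR.
setClass("MockAlignObj", representation(
  indexA_aligned = "integer",
  indexB_aligned = "integer",
  score = "numeric"
))

# Made-up alignment path. Assumption: 0 marks a gap in either run.
obj <- new("MockAlignObj",
  indexA_aligned = c(1L, 2L, 3L, 0L, 4L),
  indexB_aligned = c(1L, 0L, 2L, 3L, 4L),
  score = c(0.9, 0.8, 1.7, 1.6, 2.5)
)

# Keep only the positions where both runs have a real index (no gap).
matched <- obj@indexA_aligned > 0L & obj@indexB_aligned > 0L
indexMap <- data.frame(
  reference = obj@indexA_aligned[matched],
  experiment = obj@indexB_aligned[matched]
)

# The score slot is cumulative, so its last element is the total path score.
finalScore <- tail(obj@score, 1)
```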

Shubham Gupta, shubh.gupta@mail.utoronto.ca

ORCID: 0000-0003-3500-8152

License: (c) Author (2019) + GPL-3 Date: 2019-12-14

Gupta S, Ahadi S, Zhou W, Röst H. "DIAlignR Provides Precise Retention Time Alignment Across Distant Runs in DIA and Targeted Proteomics." Mol Cell Proteomics. 2019 Apr;18(4):806-817. doi: https://doi.org/10.1074/mcp.TIR118.001132 Epub 2019 Jan 31.

plotAlignedAnalytes, getRunNames, getFeatures, getXICs4AlignObj, getAlignObj

# Example data shipped with the package
dataPath <- system.file("extdata", package = "DIAlignR")
# Run names are the mzML file names without extension
runs <- c("hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt",
 "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt",
 "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt")
# transition_group_ids of the analytes to align
analytes <- c(32L, 898L, 2474L)
AlignObjOutput <- getAlignObjs(analytes, runs, dataPath = dataPath)
plotAlignedAnalytes(AlignObjOutput)