MS Proteomics Data

Description Usage Arguments Details Value Author(s) References See Also Examples

import of mass spectrometry proteomics data analysis software reports.

## Default S3 method:
import(ms_filenames, ms_filetype, concentration_filename=NA,
averageruns=FALSE, sumruns=FALSE, mprophet_cutoff=0.01,
openswath_superimpose_identifications=FALSE, openswath_replace_run_id=FALSE,
openswath_filtertop=FALSE, openswath_removedecoys=TRUE,
peptideprophet_cutoff=0.95, abacus_column="ADJNSAF", pepxml2csv_runsplit="~",
...)

`ms_filenames`	the paths and filenames of files to import in a character or array class.
`ms_filetype`	one of `"openswath"`, `"mprophet"`, `"openmslfq"`, `"skyline"`, `"abacus"` or `"pepxml2csv"` filetypes. Multiple files of the same type can be supplied in a vector.
`concentration_filename`	the filename of a csv with concentrations (in any unit). Needs to have the columns `"peptide_id"` (or `"protein_id"`) and `"concentration"`.
`averageruns`	whether different MS runs should be averaged.
`sumruns`	whether different MS runs should be summed.
`mprophet_cutoff`	(openswath and mprophet only:) the FDR cutoff (m_score) for OpenSWATH and mProphet reports.
`openswath_superimpose_identifications`	(openswath only:) enables propagation of identification among several runs if feature alignment or requantification was conducted.
`openswath_replace_run_id`	whether the run_id of the MS data should be replaced by the filename.
`openswath_filtertop`	(openswath only:) whether only the top peakgroup should be considered.
`openswath_removedecoys`	(openswath only:) whether decoys should be removed.
`peptideprophet_cutoff`	(abacus and pepxml2csv only:) the PeptideProphet probability cutoff for Abacus reports.
`abacus_column`	(abacus only:) target score: one of `"NUMSPECSTOT"`,`"TOTNSAF"`,`"NUMSPECSUNIQ"`, `"UNIQNSAF"`,`"NUMSPECSADJ"` or `"ADJNSAF"`.
`pepxml2csv_runsplit`	(pepxml2csv only:) the separator of the run_id and spectrum_id column.
`...`	future extensions.

The import function provides unified access to the results of various standard proteomic quantification applications like OpenSWATH (Roest et al., 2014), mProphet (Reiter et al., 2011), OpenMS (Sturm et al., 2008; Weisser et al., 2013),Skyline (MacLean et al., 2010) and Abacus (Fermin et al., 2011). This enables generic application of all further steps using the same data structure and enables extension to support other data formats. If multiple runs, i.e. replicates, are supplied, the averaged or summed values can be used to summarize the experimental data. In addition to the input from the analysis software, an input table with the anchor peptides or proteins and sample specific absolute abundance, or an estimate of the total protein concentration in the sample is required. The endpoint of this step is a unified input data structure.

A standard aLFQ import data frame, either on transition, peptide (precursor) or protein level.

George Rosenberger gr2578@cumc.columbia.edu

Roest H. L. et al. A tool for the automated, targeted analysis of data-independent acquisition (DIA) MS-data: OpenSWATH. Nat Biotech, in press.

Reiter, L. et al. mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nat Meth 8, 430-435 (2011).

Sturm, M. et al. OpenMS - An open-source software framework for mass spectrometry. BMC Bioinformatics 9, 163 (2008).

Weisser, H. et al. An automated pipeline for high-throughput label-free quantitative proteomics. J. Proteome Res. 130208071745007 (2013). doi:10.1021/pr300992u

MacLean, B. et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26, 966-968 (2010).

Fermin, D., Basrur, V., Yocum, A. K. & Nesvizhskii, A. I. Abacus: A computational tool for extracting and pre-processing spectral count data for label-free quantitative proteomic analysis. PROTEOMICS 11, 1340-1345 (2011).

ProteinInference, AbsoluteQuantification, ALF, APEX, apexFeatures, proteotypic

import(ms_filenames = system.file("extdata","example_openswath.txt",package="aLFQ"),
ms_filetype = "openswath", concentration_filename=NA,
averageruns=FALSE, sumruns=FALSE, mprophet_cutoff=0.01,
openswath_superimpose_identifications=FALSE, openswath_replace_run_id=FALSE,
openswath_filtertop=FALSE, openswath_removedecoys=TRUE)

import(ms_filenames = system.file("extdata","example_mprophet.txt",package="aLFQ"),
ms_filetype = "mprophet",
concentration_filename = system.file("extdata","example_concentration_peptide.csv",
package="aLFQ"), averageruns=FALSE, sumruns=FALSE, mprophet_cutoff=0.01)

import(ms_filenames = system.file("extdata","example_openmslfq.csv",package="aLFQ"),
ms_filetype = "openmslfq", concentration_filename=NA, averageruns=FALSE, sumruns=FALSE)

import(ms_filenames = system.file("extdata","example_skyline.csv",package="aLFQ"),
ms_filetype = "skyline",
concentration_filename =
system.file("extdata","example_concentration_protein.csv",package="aLFQ"),
averageruns=FALSE, sumruns=FALSE)

import(ms_filenames = system.file("extdata","example_abacus_protein.txt",package="aLFQ"),
ms_filetype = "abacus", concentration_filename =
system.file("extdata","example_concentration_protein.csv",
package="aLFQ"), averageruns=FALSE, sumruns=FALSE,
peptideprophet_cutoff=0.95, abacus_column="ADJNSAF")