import_dataset_fragpipe_ionquant: Import a label-free proteomics dataset from FragPipe;...

View source: R/parse_fragpipe.R

import_dataset_fragpipe_ionquant    R Documentation

Import a label-free proteomics dataset from FragPipe; combines quantitative data from the MSstats.csv file with PSM data from psm.tsv files

Description

This function requires the following FragPipe output data:

  • 'combined_protein.tsv' file, located in the FragPipe output folder

  • 'MSstats.csv' file, located in the FragPipe output folder

  • 'psm.tsv' files, located in subdirectories of the FragPipe output folder (these are mined to obtain peptide PSM confidence and retention times)

Intensity values for each precursor (modified sequence and charge, columns "PeptideSequence" and "PrecursorCharge") in each sample ("Run" column) are extracted from the MSstats.csv file. Next, the psm.tsv files are parsed to obtain retention times and PSM confidence values for each precursor*sample.
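The precursor-level extraction described above can be illustrated with a minimal sketch. This is not the package's actual implementation: the toy table stands in for a few MSstats.csv rows, the "Intensity" column name and the precursor_id helper column are assumptions made for illustration, and only "PeptideSequence", "PrecursorCharge" and "Run" are taken from the description.

```r
# Toy stand-in for a few rows of MSstats.csv.
# Assumption: intensities live in a column named "Intensity".
msstats <- data.frame(
  PeptideSequence = c("PEPTIDEK", "PEPTIDEK", "SAMPLER"),
  PrecursorCharge = c(2L, 3L, 2L),
  Run             = c("sample1", "sample1", "sample2"),
  Intensity       = c(1.5e6, 4.0e5, NA),
  stringsAsFactors = FALSE
)

# A precursor is a modified sequence plus a charge state
# (precursor_id is a hypothetical helper column, not a package field).
msstats$precursor_id <- paste(msstats$PeptideSequence,
                              msstats$PrecursorCharge, sep = "_")

# Keep only precursor*sample rows that carry a usable intensity value.
quant <- msstats[is.finite(msstats$Intensity) & msstats$Intensity > 0, ]
print(quant[, c("Run", "precursor_id", "Intensity")])
```

Each remaining row is one precursor*sample data point; the row without an intensity is dropped.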

Unfortunately, retention times at the peak apex aren't readily available for FragPipe IonQuant results across FragPipe versions, so for now this function obtains peptide retention times from PSM matches. While this yields less accurate RT values and misses RT values for MBR hits (i.e. precursors for which there is no PSM), the data required for this approach are available for all FragPipe versions since at least 2020.
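As a rough illustration of deriving per-precursor retention times and confidence values from PSMs, consider the sketch below. The toy table and its run, precursor_id and rt columns are hypothetical stand-ins for parsed psm.tsv content, and taking the most confident PSM per precursor*sample is one plausible choice, not necessarily what this function does. The confidence score follows the '1 - PeptideProphet.Probability' rule documented for the confidence_threshold argument.

```r
# Toy stand-in for PSMs parsed from psm.tsv files.
# Assumption: run, precursor_id and rt are hypothetical column names.
psm <- data.frame(
  run          = c("sample1", "sample1", "sample1", "sample2"),
  precursor_id = c("PEPTIDEK_2", "PEPTIDEK_2", "SAMPLER_2", "SAMPLER_2"),
  rt           = c(1200.5, 1210.0, 2400.0, 2395.0),
  PeptideProphet.Probability = c(0.999, 0.95, 0.80, 0.995),
  stringsAsFactors = FALSE
)

# Confidence score: lower is better.
psm$confidence <- 1 - psm$PeptideProphet.Probability

# Sort so the most confident PSM comes first within each precursor*sample,
# then keep that first row per group.
psm  <- psm[order(psm$run, psm$precursor_id, psm$confidence), ]
best <- psm[!duplicated(psm[, c("run", "precursor_id")]), ]

# Flag precursors whose score passes the threshold (0.01 is the default).
confidence_threshold <- 0.01
best$detected <- best$confidence <= confidence_threshold
print(best[, c("run", "precursor_id", "rt", "detected")])
```

A precursor that was quantified through MBR would have no row in such a table, which is why its retention time is missing under this approach.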

Finally, the combined_protein.tsv file is used to obtain the ambiguous protein IDs for each protein group (column "Indistinguishable Proteins").

Usage

import_dataset_fragpipe_ionquant(
  path,
  acquisition_mode,
  confidence_threshold = 0.01,
  collapse_peptide_by = "sequence_modified"
)

Arguments

path

the full file path to the FragPipe output directory

acquisition_mode

the type of experiment, given as a string. Valid options: "dda" or "dia"

confidence_threshold

confidence score threshold at which a peptide is considered 'identified', should be a numeric value between 0 and 1 (peptides with a '1 - PeptideProphet.Probability' value in psm.tsv that is <= this threshold are classified as 'detected')

collapse_peptide_by

if multiple data points are available for a peptide in a sample, at what level should these be combined? options: "sequence_modified" (recommended default), "sequence_plain", ""
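
A call with the arguments above might look like the following sketch. The directory path and the names fragpipe_dir and dataset are hypothetical placeholders; the call is guarded so the snippet only attempts the import when the msdap package is installed and the directory exists.

```r
# Hypothetical location of a FragPipe output folder; replace with your own.
fragpipe_dir <- "C:/data/fragpipe_output"

if (requireNamespace("msdap", quietly = TRUE) && dir.exists(fragpipe_dir)) {
  dataset <- msdap::import_dataset_fragpipe_ionquant(
    path = fragpipe_dir,
    acquisition_mode = "dda",                  # or "dia"
    confidence_threshold = 0.01,               # default
    collapse_peptide_by = "sequence_modified"  # recommended default
  )
} else {
  message("msdap is not installed or the example directory does not exist")
}
```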


ftwkoopmans/msdap documentation built on March 5, 2025, 12:15 a.m.