parse_features: Parse Proteome Discoverer output

Description

This function parses the .txt output (peptide groups or PSMs) from Proteome Discoverer, supplied as a data.frame, and then filters out features based on the criteria listed below.

The function performs the following steps:

  1. Remove features without a master protein

  2. (Optional) Remove features without a unique master protein (i.e. keep only features where Number.of.Protein.Groups == 1)

  3. (Optional) Remove features matching a cRAP protein

  4. (Optional) Remove features matching any protein associated with a cRAP protein (see below)

  5. Remove features without quantification values (only if TMT or silac is TRUE and level = "peptide")
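
The unconditional and uniqueness filters above (steps 1, 2 and 5) can be sketched in base R on a toy table. This is only an illustration of the idea, not the camprotR implementation: the toy data, the Sequence column, the Abundance.* column names, and the rule "drop a feature when every abundance column is NA" are all assumptions made for the example.

```r
# Toy stand-in for a Proteome Discoverer table.
# The Sequence and Abundance.* columns are hypothetical.
toy <- data.frame(
  Sequence = c("pepA", "pepB", "pepC", "pepD"),
  Master.Protein.Accessions = c("P1", "", "P2; P3", "P4"),
  Number.of.Protein.Groups = c(1, 0, 2, 1),
  Abundance.126 = c(10.5, 3.2, 8.1, NA),
  Abundance.127 = c(11.0, 2.9, 7.7, NA),
  stringsAsFactors = FALSE
)

# Step 1: drop features with no master protein (NA or empty string)
has_master <- !is.na(toy$Master.Protein.Accessions) &
  toy$Master.Protein.Accessions != ""
step1 <- toy[has_master, ]

# Step 2: keep only features assigned to exactly one protein group
step2 <- step1[step1$Number.of.Protein.Groups == 1, ]

# Step 5 (assumed rule): drop features where every abundance column is NA
abundance_cols <- grep("^Abundance", colnames(step2), value = TRUE)
all_missing <- rowSums(is.na(step2[, abundance_cols, drop = FALSE])) ==
  length(abundance_cols)
filtered <- step2[!all_missing, ]

filtered$Sequence
```

In this toy table only pepA survives: pepB has no master protein, pepC maps to two protein groups, and pepD has no quantification values.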

Usage

parse_features(
  data,
  master_protein_col = "Master.Protein.Accessions",
  protein_col = "Protein.Accessions",
  unique_master = TRUE,
  silac = FALSE,
  TMT = FALSE,
  level = "peptide",
  filter_crap = TRUE,
  crap_proteins = NULL,
  filter_associated_crap = TRUE
)

Arguments

data

data.frame generated from txt file output from Proteome Discoverer.

master_protein_col

string. Name of column containing master proteins.

protein_col

string. Name of column containing all protein matches.

unique_master

logical. Filter out features without a unique master protein.

silac

logical. Is the experiment a SILAC experiment?

TMT

logical. Is the experiment a TMT experiment?

level

string. Type of input file, must be either "peptide" or "PSM".

filter_crap

logical. Filter out features which match a cRAP protein.

crap_proteins

character vector. Contains the cRAP accessions, for example c("P02768"), the accession for serum albumin.

filter_associated_crap

logical. Filter out features which match a cRAP associated protein.
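
As a minimal sketch of how a crap_proteins vector might be built, the snippet below pulls accessions out of UniProt-style FASTA headers using only base R. The two header strings are illustrative stand-ins written for this example (they are not read from any cRAP FASTA file), and the pattern assumes the accession sits between two pipe characters.

```r
# Illustrative UniProt-style headers (assumed format: db|accession|entry name)
headers <- c(
  "sp|P02768|ALBU_HUMAN Albumin",
  "sp|P04264|K2C1_HUMAN Keratin, type II cytoskeletal 1"
)

# Extract the text between pipe characters from each header
crap_proteins <- unlist(
  regmatches(
    headers,
    gregexpr("(?<=\\|).*?(?=\\|)", headers, perl = TRUE)
  )
)

crap_proteins
# [1] "P02768" "P04264"
```

If the headers of a given cRAP FASTA follow a different layout, the pattern would need adjusting before the result is passed to parse_features().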

Details

Associated cRAP proteins are proteins which share at least one feature with a cRAP protein. It has been observed that the cRAP database does not contain all possible cRAP proteins, e.g. some features can be assigned to a keratin which is not in the provided cRAP database.

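In the example below, using filter_associated_crap = TRUE will filter out f2 and f3 in addition to f1, regardless of the value in the Master.Protein.Accessions column. This is because f1 matches cRAP alongside protein1 and protein2, so any other feature matching protein1 or protein2 is treated as cRAP associated.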

feature  Protein.Accessions        Master.Protein.Accessions
f1       protein1, protein2, cRAP  protein1
f2       protein1, protein3        protein3
f3       protein2                  protein2
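
The table above can be worked through in base R to show how associated cRAP proteins might be identified. This is a sketch under stated assumptions rather than the package's own code: the ", " separator, the variable names, and the extra feature f4 (added so that one row survives) are hypothetical.

```r
# The three features from the table, plus a hypothetical f4 with no cRAP link
features <- data.frame(
  feature = c("f1", "f2", "f3", "f4"),
  Protein.Accessions = c(
    "protein1, protein2, cRAP", "protein1, protein3", "protein2", "protein4"
  ),
  Master.Protein.Accessions = c("protein1", "protein3", "protein2", "protein4"),
  stringsAsFactors = FALSE
)
crap_proteins <- "cRAP"

# Split each feature's protein matches into a character vector
protein_lists <- strsplit(features$Protein.Accessions, ",\\s*")

# Features that match a cRAP protein directly
is_crap <- vapply(
  protein_lists, function(p) any(p %in% crap_proteins), logical(1)
)

# Associated proteins: non-cRAP proteins that share a feature with a cRAP protein
associated <- setdiff(unique(unlist(protein_lists[is_crap])), crap_proteins)

# Features that match any associated protein
is_associated <- vapply(
  protein_lists, function(p) any(p %in% associated), logical(1)
)

kept <- features[!(is_crap | is_associated), ]
kept$feature
```

Here associated contains protein1 and protein2, so f1, f2 and f3 are all removed and only the hypothetical f4 remains. The master protein column plays no part in the decision.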

Value

Returns a data.frame with the filtered Proteome Discoverer output.

Examples

## Not run: 

#### PSMs.txt example ####
# load PD PSMs.txt output
psm <- read.delim("data-raw/PSMs.txt")

# load the cRAP FASTA used for the PD search
crap_fasta <- Biostrings::fasta.index(
  "2021-06_CCP_cRAP.fasta", seqtype = "AA"
)

# extract the UniProt accessions from the cRAP FASTA headers
crap_accessions <- unlist(
  regmatches(
    crap_fasta$desc,
    gregexpr("(?<=\\|).*?(?=\\|)", crap_fasta$desc, perl = TRUE)
  )
)

# parse PSMs from e.g. a TMT experiment
psm2 <- parse_features(
  data = psm,
  master_protein_col = "Master.Protein.Accessions",
  protein_col = "Protein.Accessions",
  unique_master = TRUE,
  TMT = TRUE,
  level = "PSM",
  filter_crap = TRUE,
  crap_proteins = crap_accessions,
  filter_associated_crap = TRUE
)

#### peptideGroups.txt example ####
# load PD peptideGroups.txt output
pep_group <- read.delim("data-raw/peptideGroups.txt")

# load the cRAP FASTA used for the PD search
crap_fasta <- Biostrings::fasta.index(
  "2021-06_CCP_cRAP.fasta", seqtype = "AA"
)

# extract the UniProt accessions from the cRAP FASTA headers
crap_accessions <- unlist(
  regmatches(
    crap_fasta$desc,
    gregexpr("(?<=\\|).*?(?=\\|)", crap_fasta$desc, perl = TRUE)
  )
)

# parse peptide groups from e.g. a SILAC experiment
pep_group2 <- parse_features(
  data = pep_group,
  master_protein_col = "Master.Protein.Accessions",
  protein_col = "Protein.Accessions",
  unique_master = TRUE,
  silac = TRUE,
  level = "peptide",
  filter_crap = TRUE,
  crap_proteins = crap_accessions,
  filter_associated_crap = TRUE
)


## End(Not run)

CambridgeCentreForProteomics/camprotR documentation built on Jan. 27, 2023, 8:36 p.m.