parse_features: Parse Proteome Discoverer output
In CambridgeCentreForProteomics/camprotR: Processing, analysing and visualising CCP proteomics data

parse_features

R Documentation

Parse Proteome Discoverer output

Description

This function parses the output .txt files (peptide groups or PSMs) from Proteome Discoverer and then filters out features based on various criteria.

The function performs the following steps:

1. Remove features without a master protein
2. (Optional) Remove features without a unique master protein (i.e. keep only features where Number.of.Protein.Groups == 1)
3. (Optional) Remove features matching a cRAP protein
4. (Optional) Remove features matching any protein associated with a cRAP protein (see Details)
5. Remove features without quantification values (only if TMT or silac is TRUE and level = "peptide")
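
The first three steps can be sketched on a toy table as follows. This is a minimal base R illustration, not the package's actual implementation: the data frame `toy`, its values, and the `"; "` separator between accessions are assumptions made for the example.

```r
# Toy stand-in for a Proteome Discoverer table (all values are made up).
toy <- data.frame(
  feature = c("f1", "f2", "f3", "f4"),
  Master.Protein.Accessions = c("P1", "", "P2; P3", "P02768"),
  Protein.Accessions = c("P1", "P9", "P2; P3", "P02768"),
  Number.of.Protein.Groups = c(1, 1, 2, 1),
  stringsAsFactors = FALSE
)
crap <- c("P02768")

# Step 1: drop features with no master protein (NA or empty string).
has_master <- !is.na(toy$Master.Protein.Accessions) &
  toy$Master.Protein.Accessions != ""
step1 <- toy[has_master, ]

# Step 2 (optional): keep features assigned to exactly one protein group.
step2 <- step1[step1$Number.of.Protein.Groups == 1, ]

# Step 3 (optional): drop features where any matched accession is a cRAP
# protein. Assumption: accessions within a cell are separated by "; ".
accession_list <- strsplit(step2$Protein.Accessions, "; ", fixed = TRUE)
hits_crap <- vapply(accession_list, function(x) any(x %in% crap), logical(1))
step3 <- step2[!hits_crap, ]

print(step3$feature)
```

In this toy table f2 is dropped at step 1 (no master protein), f3 at step 2 (two protein groups) and f4 at step 3 (matches the cRAP accession), leaving only f1.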

Usage

parse_features(
  data,
  master_protein_col = "Master.Protein.Accessions",
  protein_col = "Protein.Accessions",
  unique_master = TRUE,
  silac = FALSE,
  TMT = FALSE,
  level = "peptide",
  filter_crap = TRUE,
  crap_proteins = NULL,
  filter_associated_crap = TRUE
)

Arguments

`data`	`data.frame` generated from txt file output from Proteome Discoverer.
`master_protein_col`	`string`. Name of column containing master proteins.
`protein_col`	`string`. Name of column containing all protein matches.
`unique_master`	`logical`. Filter out features without a unique master protein.
`silac`	`logical`. Is the experiment a SILAC experiment?
`TMT`	`logical`. Is the experiment a TMT experiment?
`level`	`string`. Type of input file, must be either `"peptide"` or `"PSM"`.
`filter_crap`	`logical`. Filter out features which match a cRAP protein.
`crap_proteins`	`character vector`. Contains the cRAP accessions, for example `c("P02768")`, which is serum albumin.
`filter_associated_crap`	`logical`. Filter out features which match a cRAP associated protein.

Details

Associated cRAP proteins are proteins that share at least one feature with a cRAP protein. It has been observed that the cRAP database does not contain all possible cRAP proteins. For example, some features can be assigned to a keratin that is not in the provided cRAP database.

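Using filter_associated_crap = TRUE will filter out f2 and f3 in addition to f1 in the example below, regardless of the value in the Master.Protein.Accessions column. Feature f1 matches a cRAP protein, so protein1 and protein2 become associated cRAP proteins, and f2 and f3 each match one of them.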

feature  Protein.Accessions        Master.Protein.Accessions
f1       protein1, protein2, cRAP  protein1
f2       protein1, protein3        protein3
f3       protein2                  protein2
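
The associated cRAP logic can be reproduced on the table above with a few lines of base R. This is a hedged sketch of the idea only, not the code used inside parse_features: the object names are hypothetical, and the `"; "` separator between accessions is an assumption about how the column is formatted.

```r
# The example table from above, rebuilt as a data frame.
toy <- data.frame(
  feature = c("f1", "f2", "f3"),
  Protein.Accessions = c(
    "protein1; protein2; cRAP",
    "protein1; protein3",
    "protein2"
  ),
  Master.Protein.Accessions = c("protein1", "protein3", "protein2"),
  stringsAsFactors = FALSE
)
crap <- "cRAP"

# Split each cell into its individual accessions (assumed "; " separator).
accessions <- strsplit(toy$Protein.Accessions, "; ", fixed = TRUE)

# Features that match a cRAP accession directly (f1 here).
direct_hit <- vapply(accessions, function(x) any(x %in% crap), logical(1))

# Associated cRAP proteins: any non-cRAP protein sharing a feature with cRAP.
associated <- setdiff(unlist(accessions[direct_hit]), crap)

# Features that match at least one associated cRAP protein.
associated_hit <- vapply(
  accessions, function(x) any(x %in% associated), logical(1)
)

# Keep only features with neither a direct nor an associated match.
kept <- toy[!(direct_hit | associated_hit), ]

print(associated)
print(nrow(kept))
```

Here `associated` contains protein1 and protein2, so all three features are removed, even though the master protein of f2 (protein3) is not itself associated with cRAP.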

Value

Returns a data.frame with the filtered Proteome Discoverer output.

Examples

## Not run: 

#### PSMs.txt example ####
# load PD PSMs.txt output
psm <- read.delim("data-raw/PSMs.txt")

# load the cRAP FASTA used for the PD search
crap_fasta <- Biostrings::fasta.index(
  "2021-06_CCP_cRAP.fasta", seqtype = "AA"
)

# extract the UniProt accessions from the cRAP FASTA headers
crap_accessions <- unlist(regmatches(
  crap_fasta$desc,
  gregexpr("(?<=\\|).*?(?=\\|)", crap_fasta$desc, perl = TRUE)
))

# parse PSMs from, e.g., a TMT experiment
psm2 <- parse_features(
  data = psm,
  master_protein_col = "Master.Protein.Accessions",
  protein_col = "Protein.Accessions",
  unique_master = TRUE,
  TMT = TRUE,
  level = "PSM",
  filter_crap = TRUE,
  crap_proteins = crap_accessions,
  filter_associated_crap = TRUE
)

#### peptideGroups.txt example ####
# load PD peptideGroups.txt output
pep_group <- read.delim("data-raw/peptideGroups.txt")

# load the cRAP FASTA used for the PD search
crap_fasta <- Biostrings::fasta.index(
  "2021-06_CCP_cRAP.fasta", seqtype = "AA"
)

# extract the UniProt accessions from the cRAP FASTA headers
crap_accessions <- unlist(regmatches(
  crap_fasta$desc,
  gregexpr("(?<=\\|).*?(?=\\|)", crap_fasta$desc, perl = TRUE)
))

# parse peptide groups from, e.g., a SILAC experiment
pep_group2 <- parse_features(
  data = pep_group,
  master_protein_col = "Master.Protein.Accessions",
  protein_col = "Protein.Accessions",
  unique_master = TRUE,
  silac = TRUE,
  level = "peptide",
  filter_crap = TRUE,
  crap_proteins = crap_accessions,
  filter_associated_crap = TRUE
)


## End(Not run)
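
The regular expression used in the examples to pull accessions out of the FASTA headers can be checked without a FASTA file. The snippet below is a small sketch that applies the same pattern to two illustrative UniProt-style header strings. These headers are assumptions written for the example and are not read from the CCP cRAP FASTA.

```r
# Illustrative UniProt-style headers (assumed format: db|accession|entry name).
headers <- c(
  "sp|P02768|ALBU_HUMAN Serum albumin",
  "sp|P00761|TRYP_PIG Trypsin"
)

# Same pattern as in the examples: the shortest run of characters that is
# preceded by "|" and followed by "|", i.e. the text between the two pipes.
matches <- regmatches(
  headers,
  gregexpr("(?<=\\|).*?(?=\\|)", headers, perl = TRUE)
)
accessions <- unlist(matches)

print(accessions)
```

The resulting character vector can be passed as `crap_proteins`. Headers without two pipe characters yield no match, so it is worth checking that the length of the result equals the number of FASTA entries.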

CambridgeCentreForProteomics/camprotR documentation built on July 7, 2024, 2:13 a.m.
