View source: R/parse_features.R
parse_features | R Documentation |
This function parses the output .txt files (peptide groups or PSMs) from Proteome Discoverer and then filters out features based on various criteria.
The function performs the following steps:
Remove features without a master protein
(Optional) Remove features without a unique master protein (i.e. Number.of.Protein.Groups == 1)
(Optional) Remove features matching a cRAP protein
(Optional) Remove features matching any protein associated with a cRAP protein (see below)
Remove features without quantification values (only if TMT or silac is TRUE and level = "peptide")
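The steps above can be sketched in base R on a toy table. This is an illustrative approximation only, not the package's implementation: the `Abundance` column, the accession values, and the assumption that accessions are separated by `;` or `,` are all made up for the example. The associated-cRAP step is left out here.

```r
# Toy stand-in for a Proteome Discoverer export (all values are invented).
features <- data.frame(
  feature = c("f1", "f2", "f3", "f4", "f5"),
  Master.Protein.Accessions = c("P1", "", "P2; P3", "CRAP1", "P4"),
  Protein.Accessions = c("P1", "", "P2; P3", "CRAP1", "P4"),
  Number.of.Protein.Groups = c(1, 0, 2, 1, 1),
  Abundance = c(10.5, 3.2, 8.1, 7.7, NA), # hypothetical quantification column
  stringsAsFactors = FALSE
)
crap <- c("CRAP1") # hypothetical cRAP accession

# Remove features without a master protein
has_master <- !is.na(features$Master.Protein.Accessions) &
  features$Master.Protein.Accessions != ""
step1 <- features[has_master, ]

# Remove features without a unique master protein
step2 <- step1[step1$Number.of.Protein.Groups == 1, ]

# Remove features matching a cRAP protein
# (assumes accessions are separated by ";" or ",")
ids_per_feature <- strsplit(step2$Protein.Accessions, "[;,]\\s*")
has_crap <- vapply(
  ids_per_feature,
  function(ids) any(ids %in% crap),
  logical(1)
)
step3 <- step2[!has_crap, ]

# Remove features without quantification values
step4 <- step3[!is.na(step3$Abundance), ]

print(step4$feature)
```

In this toy table only f1 passes every filter: f2 has no master protein, f3 maps to two protein groups, f4 matches the cRAP accession, and f5 has no quantification value.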
parse_features(
  data,
  master_protein_col = "Master.Protein.Accessions",
  protein_col = "Protein.Accessions",
  unique_master = TRUE,
  silac = FALSE,
  TMT = FALSE,
  level = "peptide",
  filter_crap = TRUE,
  crap_proteins = NULL,
  filter_associated_crap = TRUE
)
data |
master_protein_col |
protein_col |
unique_master |
silac |
TMT |
level |
filter_crap |
crap_proteins |
filter_associated_crap |
Associated cRAP proteins are proteins that share at least one feature with a cRAP protein. The cRAP database has been observed not to contain all possible cRAP proteins; e.g. some features can be assigned to a keratin that is not in the provided cRAP database.
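Using filter_associated_crap = TRUE will filter out f2 and f3 in addition to f1 in the example below, regardless of the value in the Master.Protein.Accessions column. f1 matches cRAP directly, which makes protein1 and protein2 associated cRAP proteins; f2 contains protein1 and f3 contains protein2.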
feature | Protein.Accessions | Master.Protein.Accessions
f1 | protein1, protein2, cRAP | protein1
f2 | protein1, protein3 | protein3
f3 | protein2 | protein2
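One way to reason about associated cRAP proteins is sketched below in base R, using the three features from the example table. This is a minimal illustration under stated assumptions, not the function's actual code: the separator handling (`;` or `,`) and the variable names are chosen for the example.

```r
# The three features from the example table.
features <- data.frame(
  feature = c("f1", "f2", "f3"),
  Protein.Accessions = c(
    "protein1, protein2, cRAP",
    "protein1, protein3",
    "protein2"
  ),
  Master.Protein.Accessions = c("protein1", "protein3", "protein2"),
  stringsAsFactors = FALSE
)
crap_proteins <- "cRAP"

# Split each Protein.Accessions string into a vector of accessions
# (assumes accessions are separated by ";" or ",")
accessions <- strsplit(features$Protein.Accessions, "[;,]\\s*")

# Features that match a cRAP protein directly
is_crap <- vapply(
  accessions,
  function(ids) any(ids %in% crap_proteins),
  logical(1)
)

# Associated cRAP proteins: every other accession sharing a feature with cRAP
associated <- setdiff(unique(unlist(accessions[is_crap])), crap_proteins)

# Features that contain at least one associated cRAP protein
is_associated <- vapply(
  accessions,
  function(ids) any(ids %in% associated),
  logical(1)
)

# Keep only features that are neither cRAP nor cRAP-associated
filtered <- features[!(is_crap | is_associated), ]

print(associated)
print(features$feature[is_associated & !is_crap])
print(nrow(filtered))
```

Here `associated` holds protein1 and protein2, so f2 and f3 are flagged even though their master proteins (protein3 and protein2) are not in the cRAP list, and no feature is left after filtering.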
Returns a data.frame with the filtered Proteome Discoverer output.
## Not run:
library(magrittr) # provides the %>% pipe used below

#### PSMs.txt example ####

# load PD PSMs.txt output
psm <- read.delim("data-raw/PSMs.txt")

# load the cRAP FASTA used for the PD search
crap_fasta <- Biostrings::fasta.index(
  "2021-06_CCP_cRAP.fasta",
  seqtype = "AA"
)

# extract the UniProt accessions from the cRAP FASTA headers
crap_accessions <- regmatches(
  crap_fasta$desc,
  gregexpr("(?<=\\|).*?(?=\\|)", crap_fasta$desc, perl = TRUE)
) %>%
  unlist()

# parse PSMs from e.g. a TMT experiment
psm2 <- parse_features(
  data = psm,
  master_protein_col = "Master.Protein.Accessions",
  protein_col = "Protein.Accessions",
  unique_master = TRUE,
  TMT = TRUE,
  level = "PSM",
  filter_crap = TRUE,
  crap_proteins = crap_accessions,
  filter_associated_crap = TRUE
)

#### peptideGroups.txt example ####

# load PD peptideGroups.txt output
pep_group <- read.delim("data-raw/peptideGroups.txt")

# load the cRAP FASTA used for the PD search
crap_fasta <- Biostrings::fasta.index(
  "2021-06_CCP_cRAP.fasta",
  seqtype = "AA"
)

# extract the UniProt accessions from the cRAP FASTA headers
crap_accessions <- regmatches(
  crap_fasta$desc,
  gregexpr("(?<=\\|).*?(?=\\|)", crap_fasta$desc, perl = TRUE)
) %>%
  unlist()

# parse peptides from e.g. a SILAC experiment
pep_group2 <- parse_features(
  data = pep_group,
  master_protein_col = "Master.Protein.Accessions",
  protein_col = "Protein.Accessions",
  unique_master = TRUE,
  silac = TRUE,
  level = "peptide",
  filter_crap = TRUE,
  crap_proteins = crap_accessions,
  filter_associated_crap = TRUE
)

## End(Not run)
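The accession-extraction step in the examples above can be tried without a FASTA file. The sketch below applies the same regular expression to two toy headers written in the usual UniProt `db|accession|entry_name` layout; the headers are illustrative and are assumed, not confirmed, to resemble those in the cRAP FASTA.

```r
# Toy FASTA header lines in UniProt style (illustrative only).
headers <- c(
  "sp|P02768|ALBU_HUMAN Serum albumin",
  "sp|P00761|TRYP_PIG Trypsin"
)

# Lookbehind/lookahead: capture the text that sits between two "|" characters
matches <- regmatches(
  headers,
  gregexpr("(?<=\\|).*?(?=\\|)", headers, perl = TRUE)
)

# regmatches() returns one character vector per header; flatten them
accessions <- unlist(matches)

print(accessions)
# "P02768" "P00761"
```

Because `gregexpr()` returns every match, a header with more than two `|` separators would yield more than one value, so it is worth inspecting the result before passing it to crap_proteins.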