View source: R/parse_features.R
parse_features | R Documentation |
This function parses the output .txt files (peptide groups or PSMs) from Proteome Discoverer and then filters out features based on various criteria.
The function performs the following steps:
Remove features without a master protein
(Optional) Remove features without a unique master protein (i.e. keep only features where Number.of.Protein.Groups == 1)
(Optional) Remove features matching a cRAP protein
(Optional) Remove features matching any protein associated with a cRAP protein (see below)
Remove features without quantification values (only if TMT or silac is TRUE and level = "peptide")
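The filtering steps above can be pictured with a small base R sketch. This is not the package's implementation: the toy data frame, its values, and the Abundance column used to stand in for "quantification values" are assumptions made purely for illustration.

```r
# Toy stand-in for a Proteome Discoverer export (all values are made up;
# "Abundance" is a hypothetical quantification column)
toy <- data.frame(
  feature = c("f1", "f2", "f3", "f4"),
  Master.Protein.Accessions = c("P1", "", "P2; P3", "P4"),
  Number.of.Protein.Groups = c(1, 0, 2, 1),
  Abundance = c(10.5, 3.2, 7.1, NA),
  stringsAsFactors = FALSE
)

# Remove features without a master protein
has_master <- !is.na(toy$Master.Protein.Accessions) &
  toy$Master.Protein.Accessions != ""
kept <- toy[has_master, ]

# Keep only features with a unique master protein
kept <- kept[kept$Number.of.Protein.Groups == 1, ]

# Remove features without a quantification value
kept <- kept[!is.na(kept$Abundance), ]

kept$feature
# only "f1" survives: f2 has no master protein, f3 has two protein groups,
# and f4 has no quantification value
```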
parse_features(
data,
master_protein_col = "Master.Protein.Accessions",
protein_col = "Protein.Accessions",
unique_master = TRUE,
silac = FALSE,
TMT = FALSE,
level = "peptide",
filter_crap = TRUE,
crap_proteins = NULL,
filter_associated_crap = TRUE
)
data
master_protein_col
protein_col
unique_master
silac
TMT
level
filter_crap
crap_proteins
filter_associated_crap
Associated cRAP proteins are proteins which have at least one feature shared with a cRAP protein. It has been observed that the cRAP database does not contain all possible cRAP proteins; for example, some features can be assigned to a keratin that is not in the provided cRAP database.
Using filter_associated_crap = TRUE will filter out f2 and f3 in addition to f1 in the example below, regardless of the value in the Master.Protein.Accessions column.
feature   Protein.Accessions          Master.Protein.Accessions
f1        protein1, protein2, cRAP    protein1
f2        protein1, protein3          protein3
f3        protein2                    protein2
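One way to see why all three features are flagged is the base R sketch below. It is an illustration under stated assumptions, not the code parse_features() runs internally: the toy data frame mirrors the example rows, and the comma separator follows the example rather than any particular Proteome Discoverer export.

```r
# Toy data mirroring the example rows (the separator is an assumption)
toy <- data.frame(
  feature = c("f1", "f2", "f3"),
  Protein.Accessions = c(
    "protein1, protein2, cRAP",
    "protein1, protein3",
    "protein2"
  ),
  Master.Protein.Accessions = c("protein1", "protein3", "protein2"),
  stringsAsFactors = FALSE
)
crap_proteins <- "cRAP"

# Split each feature's accession string into individual accessions
accessions <- strsplit(toy$Protein.Accessions, ",\\s*")

# Features that directly match a cRAP protein (f1 only)
direct <- vapply(
  accessions, function(x) any(x %in% crap_proteins), logical(1)
)

# Associated cRAP proteins: every protein sharing a feature with a cRAP protein
associated <- unique(unlist(accessions[direct]))

# Features matching any associated cRAP protein
flagged <- vapply(
  accessions, function(x) any(x %in% associated), logical(1)
)

toy$feature[flagged]
# f1, f2 and f3 are all flagged, so nothing remains after filtering
```

Because the check uses Protein.Accessions rather than the master protein column, f2 and f3 are flagged even though neither has a cRAP master protein.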
Returns a data.frame with the filtered Proteome Discoverer output.
## Not run:
#### PSMs.txt example ####
# load PD PSMs.txt output
psm <- read.delim("data-raw/PSMs.txt")
# load the cRAP FASTA used for the PD search
crap_fasta <- Biostrings::fasta.index(
"2021-06_CCP_cRAP.fasta", seqtype = "AA"
)
# extract the UniProt accessions from the cRAP FASTA headers
crap_accessions <- regmatches(
crap_fasta$desc,
gregexpr("(?<=\\|).*?(?=\\|)", crap_fasta$desc, perl = TRUE)
) %>%
unlist()
# parse PSMs from e.g. a TMT experiment
psm2 <- parse_features(
data = psm,
master_protein_col = "Master.Protein.Accessions",
protein_col = "Protein.Accessions",
unique_master = TRUE,
TMT = TRUE,
level = "PSM",
filter_crap = TRUE,
crap_proteins = crap_accessions,
filter_associated_crap = TRUE
)
#### peptideGroups.txt example ####
# load PD peptideGroups.txt output
pep_group <- read.delim("data-raw/peptideGroups.txt")
# load the cRAP FASTA used for the PD search
crap_fasta <- Biostrings::fasta.index(
"2021-06_CCP_cRAP.fasta", seqtype = "AA"
)
# extract the UniProt accessions from the cRAP FASTA headers
crap_accessions <- regmatches(
crap_fasta$desc,
gregexpr("(?<=\\|).*?(?=\\|)", crap_fasta$desc, perl = TRUE)
) %>%
unlist()
# parse peptide groups from e.g. a SILAC experiment
pep_group2 <- parse_features(
data = pep_group,
master_protein_col = "Master.Protein.Accessions",
protein_col = "Protein.Accessions",
unique_master = TRUE,
silac = TRUE,
level = "peptide",
filter_crap = TRUE,
crap_proteins = crap_accessions,
filter_associated_crap = TRUE
)
## End(Not run)
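The accession-extraction step in the examples can be tried without a FASTA file. The sketch below applies the same regular expression to two hypothetical UniProt-style headers; the header strings are invented for illustration and are not taken from the cRAP FASTA named above.

```r
# Hypothetical FASTA description lines in UniProt style (illustrative only)
desc <- c(
  "sp|P02768|ALBU_HUMAN Serum albumin",
  "sp|P00761|TRYP_PIG Trypsin"
)

# Extract the text between pairs of "|" characters using lookarounds;
# perl = TRUE is required for the lookbehind/lookahead syntax
accs <- unlist(
  regmatches(desc, gregexpr("(?<=\\|).*?(?=\\|)", desc, perl = TRUE))
)

accs
# "P02768" "P00761"
```

The resulting character vector is the kind of value passed as crap_proteins in the examples.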