Takes a ThermoFisher MSF file and finds the location of each peptide within its corresponding protein sequence. In cases where a single peptide maps to multiple locations within a protein sequence, only the first location is reported. If a peptide maps ambiguously to multiple proteins, all locations are reported with data from each peptide-protein combination on a separate row.
map_peptides(msf_file, min_conf = "High", prot_regex = "")
msf_file | A file path to a ThermoFisher MSF file.
min_conf | "High", "Medium", or "Low". The minimum peptide confidence level to retrieve from the MSF file.
prot_regex | A regular expression whose first group matches a protein name or ID in the protein description. The regex must contain exactly one group. The protein description is typically generated from the FASTA reference file used for the database search.
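To illustrate the one-group requirement, the sketch below uses base R to pull an accession out of a description string. The description string and the pattern are hypothetical, and how map_peptides applies prot_regex internally is not shown in this page; the sketch only demonstrates what "the first group matches a protein name or ID" means.

```r
# Hypothetical protein description, in the style of a FASTA header line.
protein_desc <- "gi|9627425|ref|NP_041997.1| major capsid protein"

# Hypothetical pattern with exactly one capture group around the accession.
prot_regex <- "ref\\|([^|]+)\\|"

# regexec() returns the full match followed by each capture group:
# element 1 is the full match, element 2 is the first (and only) group.
hit <- regmatches(protein_desc, regexec(prot_regex, protein_desc))[[1]]
protein_name <- hit[2]
print(protein_name)
```

A pattern with more than one group would make "the first group" ambiguous to a reader of the call, which is presumably why a single group is required.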
A data frame containing start and end positions (relative to the parent protein sequence) for each peptide in the database, with the following columns:
peptide_id | a unique peptide ID
spectrum_id | a unique spectrum ID
protein_id | unique protein group ID to which this peptide maps
protein_desc | protein description from the reference database used to assign peptides to protein groups, parsed according to prot_regex
peptide_sequence | amino acid sequence (does not show post-translational modifications)
pep_score | posterior error probability (PEP) score
q_value | q-value
protein_sequence | parent protein sequence
start | start position of the peptide within the protein sequence
end | end position of the peptide within the protein sequence
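The "first location only" rule for start and end can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the protein sequence is made up, and it assumes positions are 1-based with an inclusive end.

```r
# Hypothetical parent protein sequence; the peptide occurs twice in it.
protein_sequence <- "MKLALFLKAAGTLALFLKR"
peptide_sequence <- "LALFLK"

# regexpr() reports only the first match, mirroring the rule that a peptide
# mapping to several locations in one protein is reported once.
# fixed = TRUE treats the peptide as a literal string, not a regex.
first_hit <- regexpr(peptide_sequence, protein_sequence, fixed = TRUE)

pep_start <- as.integer(first_hit)                  # 1-based start; -1 if absent
pep_end <- pep_start + nchar(peptide_sequence) - 1L # inclusive end (assumed)

# Sanity check: the slice at these positions is the peptide itself.
print(substr(protein_sequence, pep_start, pep_end))
```

The second occurrence of the peptide is ignored here by construction; gregexpr() would be the base R call to use if every location were wanted.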
map_peptides(parsemsf_example("test_db.msf"))
Source: local data frame [28 x 10]
Groups: <by row>
# A tibble: 28 x 10
   peptide_id spectrum_id protein_id protein_desc peptide_sequence pep_score
        <int>       <int>      <int> <chr>        <chr>                <dbl>
 1      27146       15646     807657 NP_041997.1  AALTDQVALGK       0.000533
 2      27177       15663     807657 NP_041997.1  AALTDQVALGK       0.000515
 3      35484       20122     807657 NP_041997.1  ANFQADQIIAK         0.0116
 4      35511       20136     807657 NP_041997.1  ANFQADQIIAK       0.000491
 5      37869       21360     807657 NP_041997.1  TQAAYLAPGENLDDK   0.000128
 6      37913       21384     807657 NP_041997.1  TQAAYLAPGENLDDK   0.000468
 7      38957       21935     807657 NP_041997.1  SAQFPVLGR          0.00419
 8      40200       22580     807657 NP_041997.1  SAQFPVLGR          0.00115
 9      50946       28239     807657 NP_041997.1  LALFLK             0.00199
10      50972       28253     807657 NP_041997.1  LALFLK             0.00474
# ... with 18 more rows, and 4 more variables: q_value <dbl>,
# protein_sequence <chr>, start <int>, end <int>