references:
- id: silva
  title: 'Absolute quantification of proteins by LCMSE: A Virtue of parallel MS acquisition'
  author:
  - family: Silva
    given: JC
  - family: Gorenstein
    given: MV
  - family: Li
    given: GZ
  - family: Vissers
    given: JP
  - family: Geromanos
    given: SJ
  container-title: Mol Cell Proteomics
  volume: 5
  URL: 'https://doi.org/10.1074/mcp.M500230-MCP200'
  DOI: 10.1074/mcp.M500230-MCP200
  issue: 1
  page: 144-56
  type: article-journal
  issued:
    year: 2006
    month: 1
The ParseMSF package provides several functions for inspecting ThermoFisher MSF files. The most useful of these functions is `make_area_table`, which constructs a data frame containing all peptides and their corresponding peak areas. This data frame also includes protein information (`protein_desc`) for each peptide.
NOTE: Only ThermoFisher MSF files generated by Proteome Discoverer 1.4.x are supported. Using ParseMSF functions with a file produced by any other version of Proteome Discoverer may produce unexpected results.
library(parsemsf)

# Replace `parsemsf_example("test_db.msf")` with the path to a ThermoFisher MSF file
area_table <- make_area_table(parsemsf_example("test_db.msf"))
knitr::kable(head(area_table))
See the documentation for `make_area_table` for a description of each column.
The peak area information stored in one or more ThermoFisher MSF files can be used to estimate protein abundances. The `quantitate` function estimates these abundances across one or more technical replicates. Technical replicates are typically different mass spectrometry injections of the same biological sample. The `quantitate` function will produce more accurate protein abundance estimates if it is provided with multiple technical replicates.
# Replace the `parsemsf_example()` calls with the paths to your ThermoFisher MSF files
abundances <- quantitate(c(parsemsf_example("test_db.msf"), parsemsf_example("test_db2.msf")))
knitr::kable(head(abundances))
Abundances are estimated by averaging the areas of the three most abundant peptides for each protein (`area_mean`) [@silva]. If provided with multiple technical replicates, `quantitate` will, by default, estimate protein abundances by matching peptides across technical replicates. That is, it will only average areas from peptides that are present in both technical replicates. The number of unique peptides used to estimate each protein abundance is given by `peps_per_rep`.
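The averaging described above can be sketched in base R on a toy data frame. This is only an illustration of the idea under stated assumptions, not the code that `quantitate` runs: the column names (`tech_rep`, `sequence`, `area`), the peptide sequences, and the area values are all made up, and the order of operations (match peptides first, then average the top three areas within each replicate, then average across replicates) is one plausible reading of the description.

```r
# Toy peptide areas for a single protein in two technical replicates.
# All names and values here are hypothetical.
areas <- data.frame(
  tech_rep = c(1, 1, 1, 1, 2, 2, 2, 2),
  sequence = c("AAK", "CDR", "EFK", "GHR", "AAK", "CDR", "EFK", "LMK"),
  area     = c(400, 300, 200, 100, 420, 280, 220, 900),
  stringsAsFactors = FALSE
)

# Step 1: keep only peptides that were observed in every replicate.
n_reps <- length(unique(areas$tech_rep))
reps_per_peptide <- tapply(areas$tech_rep, areas$sequence,
                           function(x) length(unique(x)))
shared <- names(reps_per_peptide)[reps_per_peptide == n_reps]
matched <- areas[areas$sequence %in% shared, ]

# Step 2: within each replicate, average the three largest peak areas.
top3_mean <- function(x) mean(head(sort(x, decreasing = TRUE), 3))
rep_means <- tapply(matched$area, matched$tech_rep, top3_mean)

# Step 3: average the per-replicate estimates into a single value.
area_mean <- mean(rep_means)
```

In this toy example, the peptide with the largest area in the second replicate (`LMK`) is excluded because it has no match in the first replicate, which shows how matching can change the estimate.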
Protein abundances can also be estimated from a single ThermoFisher MSF file.
# Replace `parsemsf_example("test_db.msf")` with the path to a ThermoFisher MSF file
abundances <- quantitate(parsemsf_example("test_db.msf"))
knitr::kable(head(abundances))
The ParseMSF package includes a function for inspecting the distribution of peptides within a single protein. The `map_peptides` function produces a data frame of peptides with their respective locations within the protein sequence.
peptide_locs <- map_peptides(parsemsf_example("test_db.msf"))

# Select columns with start and end locations
peptide_locs <- peptide_locs[c("peptide_id", "protein_desc", "peptide_sequence", "start", "end")]
knitr::kable(head(peptide_locs))
We can plot these peptide locations with the ggplot2 and dplyr packages.
library(ggplot2)
library(dplyr)

peptide_summary <- peptide_locs %>%
  group_by(start, end) %>%
  summarize(spectral_count = n()) # Count peptides

pep_plot <- ggplot(peptide_summary, aes(x = start, xend = end, y = spectral_count, yend = spectral_count)) +
  geom_segment(size = 1) +
  ylim(0, 5) +
  xlab("peptide position within protein") +
  ylab("peptide count")

pep_plot
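To make the `start` and `end` columns more concrete, the following base R sketch locates a few peptides within a protein sequence by exact string matching. It is a minimal illustration only and does not reproduce how `map_peptides` works internally: the protein and peptide sequences are hypothetical, and the 1-based, inclusive coordinates are an assumption that may differ from the package's convention.

```r
# Hypothetical protein and peptide sequences (not taken from the example MSF file).
protein  <- "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
peptides <- c("TAYIAK", "QISFVK", "LGLIEVQ")

# regexpr() with fixed = TRUE returns the 1-based position of the first
# exact match of each peptide in the protein, or -1 if it is absent.
start <- vapply(
  peptides,
  function(p) as.integer(regexpr(p, protein, fixed = TRUE)),
  integer(1),
  USE.NAMES = FALSE
)

# The end position is inclusive: start plus peptide length, minus one.
end <- start + nchar(peptides) - 1L

toy_locs <- data.frame(peptide_sequence = peptides, start = start, end = end)
toy_locs
```

A peptide that occurs more than once in a protein would need `gregexpr()` instead, since `regexpr()` reports only the first match.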