knitr::opts_chunk$set(message=FALSE, warning=FALSE) # knitr::opts_chunk$set(echo=T, message=F, warning=F, fig.align='center', out.width='10cm')
PlexedPiper
is the main library for the pipeline. MSnID
package is a
flexible way of handling MS/MS identifications. It handles MS/MS data
filtering part.
library(PlexedPiper) library(MSnID) library(tidyverse) library(Biostrings)
This vignette demonstrates processing usign PNNL's DMS as a data source.
PNNL DMS is based on MS SQL Server. Typically data is organized into data
packages according to experiments.
Data package that contains phospho data mirroring
PlexedPiperTestData
is number 3626
. Smaller version of the data
package, suggested for testing purposes is 3625
.
data_package_num <- 3625
# if there is no connection to PNNL DMS, vignette is not compiled if(!is_PNNL_DMS_connection_successful()){ message("There is no connection to PNNL DMS. This code in this vignette won't be evaluated.") knitr::opts_chunk$set(eval = FALSE) }
First step is to determine MS-GF+ jobs using the data package number.
This simply reads parsed to text output of MS-GF+ search engine. The text files
are collated together and the resulting data.frame
used to create MSnID object.
msnid <- read_msms_data_from_DMS(data_package_num) show(msnid)
Phospho dataset involve Ascore jobs for improving phospho-site localization.
There should be one Ascore job per data package. The fetched object is a
data.frame
that links datasets, scans and original PTM localization to
newly suggested locations. Importantly it contains AScore
column that
signifies the confidence of PTM assingment. AScore > 17
is considered
confident.
ascore <- get_AScore_results(data_package_num) msnid <- best_PTM_location_by_ascore(msnid, ascore)
Remove non-phospho. Need to be sure that phospho symbol is *!
msnid <- apply_filter(msnid, "grepl(\"\\\\*\", peptide)")
FDR filter at peptide level.
msnid <- filter_msgf_data_peptide_level(msnid, 0.01)
msnid <- infer_parsimonious_accessions(msnid)
Mapping sites to protein sequence. Call creates number of
columns describing mapping of the site/s onto the protein sequences.
The most important for the user is SiteID
.
Prepare FASTA to make sure entry names in FASTA file match MSnID
accessions.
The plan is to make this conversion automatic.
Note, this type of ID extraction will ignore non-RefSeq entries such as
contaminants.
fst_dms_pth <- path_to_FASTA_used_by_DMS(data_package_num) fst <- readAAStringSet(fst_dms_pth) names(fst) <- sub("^([A-Z]P_\\d+\\.\\d+)\\s.*", "\\1", names(fst))
Mapping main call.
msnid <- map_mod_sites(msnid, fst, accession_col = "accession", peptide_mod_col = "Peptide", mod_char = "*", site_delimiter = "lower") head(psms(msnid))
Fetching and preparing reporter intensities based on MASIC ouput.
masic_data <- read_masic_data_from_DMS(data_package_num, interference_score = T) masic_data <- filter_masic_data(masic_data, 0.5, 0)
study_design_tables <- get_study_design_by_dataset_package(data_package_num) fractions <- study_design_tables$fractions samples <- study_design_tables$samples references <- study_design_tables$references
Aggregation is done directly at SiteID
level.
msnid <- apply_filter(msnid, "!isDecoy") aggregation_level <- c("SiteID") quant_cross_tab <- create_crosstab(msnid, masic_data, aggregation_level, fractions, samples, references) dim(quant_cross_tab) head(quant_cross_tab)
x_ascore <- psms(msnid) %>% group_by(SiteID) %>% summarise(maxAScore = max(maxAScore)) x <- quant_cross_tab %>% as.data.frame() %>% rownames_to_column("SiteID") %>% inner_join(x_ascore) %>% mutate(ConfidentSite = maxAScore > 17) %>% dplyr::select(SiteID, maxAScore, ConfidentSite, everything()) head(x)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.