knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width=8, 
  fig.height=8
)

This vignette provides an example workflow for how to use the package MSstatsPTM for a mass spectrometry based proteomics experiment acquired with a labelfree acquisition methods, such as DDA/DIA/SRM/PRM. It also provides examples and an analysis of how adjusting for global protein levels allows for better interpretations of PTM modeling results.

Installation

To install this package, start R (version "4.0") and enter:

``` {r, eval = FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

BiocManager::install("MSstatsPTM")

```r
library(MSstatsPTM)
library(MSstats)

1. Workflow

1.1 Raw Data Format

Note: We are actively developing dedicated converters for MSstatsPTM. If you have data from a processing tool that does not have a dedicated converter in MSstatsPTM please add a github issue https://github.com/Vitek-Lab/MSstatsPTM/issues and we will add the converter.

The first step is to load in the raw dataset for both the PTM and Protein datasets. Each dataset can formatted using dedicated converters in MSstatsPTM, such as FragePipetoMSstatsPTMFormat. If there is not a dedicated converter for your tool in MSstatsPTM, you can alternatively leverage converters from base MSstats. If using converters from MSstats note they will need to be run both on the global protein and PTM datasets.

Please note for the PTM dataset, both the protein and modification site (or peptide), must be added into the ProteinName column. This allows for the package to summarize to the peptide level, and avoid the off chance there are matching peptides between proteins. For an example of how this can be done please see the code below.

The output of the converter is a list with two formatted data.tables. One each for the PTM and Protein datasets. If a global profiling run was not performed, the Protein data.table will just be NULL

1.1.1 MaxQuant - MaxQtoMSstatsPTMFormat

MSstatsPTM includes a dedicated converter for MaxQuant output. Experiments can be acquired with label-free or TMT labeling methods. No matter the experiment, we recommend using the evidence.txt file, and not a PTM specific file such as the Phospho (STY).txt file. The MaxQtoMSstatsPTMFormat converter includes a variety of parameters. Examples for processing a TMT and LF dataset can be seen below.

``` {r maxq, eval=TRUE}

TMT experiment

head(maxq_tmt_evidence) head(maxq_tmt_annotation)

msstats_format_tmt = MaxQtoMSstatsPTMFormat(evidence=maxq_tmt_evidence, annotation=maxq_tmt_annotation, fasta=system.file("extdata", "maxq_tmt_fasta.fasta", package="MSstatsPTM"), fasta_protein_name="uniprot_ac", mod_id="\(Phospho \(STY\)\)", use_unmod_peptides=TRUE, labeling_type = "TMT", which_proteinid_ptm = "Proteins")

head(msstats_format_tmt$PTM) head(msstats_format_tmt$PROTEIN)

LF experiment

head(maxq_lf_evidence) head(maxq_lf_annotation)

msstats_format_lf = MaxQtoMSstatsPTMFormat(evidence=maxq_lf_evidence, annotation=maxq_lf_annotation, fasta=system.file("extdata", "maxq_lf_fasta.fasta", package="MSstatsPTM"), fasta_protein_name="uniprot_ac", mod_id="\(Phospho \(STY\)\)", use_unmod_peptides=TRUE, labeling_type = "LF", which_proteinid_ptm = "Proteins")

head(msstats_format_lf$PTM) head(msstats_format_lf$PROTEIN)

#### 1.1.2 FragPipe - `FragPipetoMSstatsPTMFormat`

`MSstatsPTM` includes a dedicated converter for FragPipe output. Experiments 
must be acquired with TMT labeling methods (a label-free converter is currently 
in development).The input is the `msstats.csv` and annotation files 
automatically generated by FragPipe. FragPipe provides additional info on the 
localization of modification sites, and the `FragPipetoMSstatsPTMFormat` 
converter includes localization options that are not present in other 
converters.

``` {r fragpipe, eval=TRUE}
head(fragpipe_input)
head(fragpipe_annotation)
head(fragpipe_input_protein)
head(fragpipe_annotation_protein)

msstats_data = FragPipetoMSstatsPTMFormat(fragpipe_input,
                                          fragpipe_annotation,
                                          fragpipe_input_protein, 
                                          fragpipe_annotation_protein,
                                          mod_id_col = "STY",
                                          localization_cutoff=.75,
                                          remove_unlocalized_peptides=TRUE)
head(msstats_data$PTM)
head(msstats_data$PROTEIN)

1.1.3 Proteome Discoverer - PDtoMSstatsPTMFormat

MSstatsPTM includes a dedicated converter for Proteome Discoverer output. Experiments can be acquired with label-free or TMT labeling methods. The input is the psm file and a custom built annotation file. Proteome Discoverer provides additional info on the localization of modification sites, and the PDtoMSstatsPTMFormat converter includes localization options that are not present in other converters.

``` {r pd, eval=TRUE} head(pd_psm_input) head(pd_annotation)

msstats_format = PDtoMSstatsPTMFormat(pd_psm_input, pd_annotation, system.file("extdata", "pd_fasta.fasta", package="MSstatsPTM"), use_unmod_peptides=TRUE, which_proteinid = "Master.Protein.Accessions")

head(msstats_format$PTM) head(msstats_format$PROTEIN)

#### 1.1.4 Spectronaut - `SpectronauttoMSstatsPTMFormat`

`MSstatsPTM` includes a dedicated converter for Spectronaut output. 
Experiments can be acquired with label-free labeling methods.

``` {r Spectronaut, eval=TRUE}
head(spectronaut_input)
head(spectronaut_annotation)

msstats_input = SpectronauttoMSstatsPTMFormat(spectronaut_input, 
                  annotation=spectronaut_annotation, 
                  fasta_path=system.file("extdata", "spectronaut_fasta.fasta", 
                                         package="MSstatsPTM"),
                  use_unmod_peptides=TRUE,
                  mod_id = "\\[Phospho \\(STY\\)\\]",
                  fasta_protein_name = "uniprot_iso"
                  )

head(msstats_input$PTM)
head(msstats_input$PROTEIN)

1.1.5 Skyline - SkylinetoMSstatsPTMFormat

MSstatsPTM includes a dedicated converter for Skyline output. Experiments can be acquired with label-free labeling methods.

1.1.6 Peak Studio - PStoMSstatsPTMFormat

MSstatsPTM includes a dedicated converter for Peak Studio output. Experiments can be acquired with label-free labeling methods.

1.1.6 Progenesis - ProgenesistoMSstatsPTMFormat

MSstatsPTM includes a dedicated converter for Progenesis output. Experiments can be acquired with label-free labeling methods.

1.1.7 Additional tools

If there is not a dedicated MSstatsPTM converter for a processing tool, the existing converters in MSstats and MSstatsTMT converters can be used as described below.

Note, in order to do this, it is critical that the ProteinName column be a combination of the Protein Name and modification site.

As an example, if you would like to analyze an experiment processed with DIANN, you can leverage the DIANNtoMSstatsFormat in MSstats. Given two datasets, named raw_ptm_df and raw_protein_df, and an annotation file, we can process the data as follows.

# Add site into ProteinName column
raw_ptm_df$ProteinName = paste(raw_ptm_df$ProteinName,
                                raw_ptm_df$Site, sep = "_")

# Run MSstats Converters
PTM_data = MSstats::DIANNtoMSstatsFormat(raw_ptm_df, annotation)
PROTEIN_data = MSstats::DIANNtoMSstatsFormat(raw_protein_df, annotation)

# Combine into one list
msstatsptm_input_data = list(PTM = PTM_data, PROTEIN = PROTEIN_data)

The variable msstatsptm_input_data can now be used as the input to the remainder of the MSstatsPTM processing pipeline.

1.2 Summarization - dataSummarizationPTM

After loading in the input data, the next step is to use the dataSummarizationPTM function. This provides the summarized dataset needed to model the protein/PTM abundance. The function will summarize the Protein dataset up to the protein level and will summarize the PTM dataset up to the peptide level. There are multiple options for normalization and missing value imputation. These options should be reviewed in the package documentation.

MSstatsPTM.summary = dataSummarizationPTM(raw.input, verbose = FALSE, 
                                          use_log_file = FALSE, append = FALSE)

head(MSstatsPTM.summary$PTM$ProteinLevelData)
head(MSstatsPTM.summary$PROTEIN$ProteinLevelData)

The summarize function returns a list with PTM and Protein summarization information. Each PTM and Protein include a list of data.tables: FeatureLevelData is a data.table of reformatted input of dataSummarizationPTM, ProteinLevelData is the run level summarization data.

1.2.1 QCPlot

Once summarized, MSstatsPTM provides multiple plots to analyze the experiment. Here we show the quality control boxplot. The first plot shows the modified data and the second plot shows the global protein dataset.

dataProcessPlotsPTM(MSstatsPTM.summary,
                    type = 'QCPLOT',
                    which.PTM = "allonly",
                    address = FALSE)

1.2.2 Profile Plot

Here we show a profile plot. Again the top plot shows the modified peptide, and the bottom shows the overall protein.

dataProcessPlotsPTM(MSstatsPTM.summary,
                    type = 'ProfilePlot',
                    which.Protein = "Q9Y6C9",
                    address = FALSE)

1.3 Modeling - groupComparisonPTM

After summarization, the summarized datasets can be modeled using the groupComparisonPTM function. This function will model the PTM and Protein summarized datasets, and then adjust the PTM model for changes in overall protein abundance. The output of the function is a list containing these three models named: PTM.Model, PROTEIN.Model, ADJUSTED.Model.

# Specify contrast matrix
comparison = matrix(c(-1,0,1,0),nrow=1)
row.names(comparison) = "CCCP-Ctrl"
colnames(comparison) = c("CCCP", "Combo", "Ctrl", "USP30_OE")

MSstatsPTM.model = groupComparisonPTM(MSstatsPTM.summary, 
                                      data.type = "LabelFree",
                                      contrast.matrix = comparison,
                                      use_log_file = FALSE, append = FALSE,
                                      verbose = FALSE)
head(MSstatsPTM.model$PTM.Model)
head(MSstatsPTM.model$PROTEIN.Model)
head(MSstatsPTM.model$ADJUSTED.Model)

1.3.1 Volcano Plot

The models from the groupComparisonPTM function can be used in the model visualization function, groupComparisonPlotsPTM. Here we show Volcano Plots for the models.

``` {r volcano, message=FALSE, warning=FALSE} groupComparisonPlotsPTM(data = MSstatsPTM.model, type = "VolcanoPlot", FCcutoff= 2, logBase.pvalue = 2, address=FALSE)

### 1.3.2 Heatmap Plot

Here we show a Heatmap for the models.

``` {r heatmap, message=FALSE, warning=FALSE}
groupComparisonPlotsPTM(data = MSstatsPTM.model,
                        type = "Heatmap",
                        which.PTM = 1:30,
                        address=FALSE)

1.4 Sample Size Calculation - designSampleSizePTM

Finally, sample size calculation can be performed using the output of the model and the designSampleSizePTM

# Specify contrast matrix
sample_size = designSampleSizePTM(MSstatsPTM.model, c(2.0, 2.75), FDR = 0.05, 
                                  numSample = TRUE, power = 0.8)

head(sample_size)

1.4.1 Sample Size Plot

The output of the sample size function can be plotted using the MSstats designSampleSizePlots function.

MSstats::designSampleSizePlots(sample_size)


Vitek-Lab/MSstatsPTM documentation built on May 1, 2024, 8:25 a.m.