MSstatsTMTPTM: A package for post-translational modification (PTM) significance analysis in shotgun mass spectrometry-based proteomic experiments with tandem mass tag (TMT) labeling

``` {r}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 6
)
library(MSstatsTMTPTM)
library(MSstatsTMT)
```

This vignette summarizes the functionalities and options of MSstatsTMTPTM and provides a workflow example.

MSstatsTMTPTM includes the following two functions for data visualization and statistical testing:

  1. Data visualization of PTM and global protein levels: dataProcessPlotsTMTPTM
  2. Group comparison on PTM/protein quantification data: groupComparisonTMTPTM

## Installation

To install this package, start R (version "4.0") and enter:

``` {r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("MSstatsTMTPTM")
```

## 1. dataProcessPlotsTMTPTM()

To illustrate the quantitative data and quality control of MS runs, 
`dataProcessPlotsTMTPTM` takes the quantitative data from the MSstatsTMT converter 
functions as input and generates two types of figures as PDF files:

1. Profile plot (specify "ProfilePlot" in option `type`), to identify the 
potential sources of variation for each protein;
2. Quality control plot (specify "QCPlot" in option `type`), to evaluate the 
systematic bias between MS runs.

### Arguments

* `data.ptm` name of the data with PTM sites in the protein name, which can be 
the output of the MSstatsTMT converter functions.
* `data.protein` name of the peptide-level data, which can be the output of the 
MSstatsTMT converter functions.
* `data.ptm.summarization` name of the protein-level data with PTM sites in the 
protein name, which can be the output of the MSstatsTMT `proteinSummarization` 
function.
* `data.protein.summarization` name of the protein-level data, which can be the 
output of the MSstatsTMT `proteinSummarization` function.
* `type` choice of visualization. "ProfilePlot" represents profile plot of log 
intensities across MS runs. "QCPlot" represents box plots of log intensities 
across channels and MS runs.
* `ylimUp` upper limit for the y-axis on the log scale. FALSE (default) for 
Profile Plot and QC Plot uses the rounded-off maximum of log2(intensities) 
after normalization, plus 3.
* `ylimDown` lower limit for the y-axis on the log scale. FALSE (default) for 
Profile Plot and QC Plot uses 0.
* `x.axis.size` size of x-axis labeling for "Run" and "Channel" in Profile Plot 
and QC Plot.
* `y.axis.size` size of y-axis labels. Default is 10.
* `text.size` size of the labels representing each condition at the top of 
Profile plot and QC plot. Default is 4.
* `text.angle` angle of the labels representing each condition at the top of 
Profile plot and QC plot. Default is 0.
* `legend.size` size of legend above Profile plot. Default is 7.
* `dot.size.profile` size of dots in Profile plot. Default is 2.
* `ncol.guide` number of columns for legends at the top of plot. Default is 5.
* `width` width of the saved pdf file. Default is 10.
* `height` height of the saved pdf file. Default is 10.
* `which.Protein` Protein list to draw plots. List can be names of Proteins or 
order numbers of Proteins. Default is "all", which generates all plots for each 
protein. For QC plot, "allonly" will generate one QC plot with all proteins.
* `originalPlot` TRUE(default) draws original profile plots, without 
normalization.
* `summaryPlot` TRUE(default) draws profile plots with protein summarization for
each channel and MS run.
* `address` the name of the folder that will store the results. The default 
folder is the current working directory. Any other folder must already exist 
under the current working directory. An output PDF file is automatically created 
with the default name "ProfilePlot.pdf" or "QCplot.pdf". The `address` argument 
specifies where to store the file and can also modify the beginning of the file 
name. If address=FALSE, the plot is not saved as a PDF file but is shown in the 
plot window.
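
The default y-axis limits described for `ylimUp` and `ylimDown` can be illustrated with a few lines of base R. This is only a sketch of the arithmetic: the intensity values are hypothetical, and it assumes that "rounded off" means rounding up with `ceiling()`, which the text above does not state.

```r
# Toy reporter-ion intensities (hypothetical values, not package data)
intensity <- c(1024, 4096, 30000, NA)

# Work on the log2 scale, as the plots do
log2_intensity <- log2(intensity)

# Assumption: "rounded off maximum ... + 3" is taken as rounding up
ylim_up <- ceiling(max(log2_intensity, na.rm = TRUE) + 3)

# The documented default lower limit is 0
ylim_down <- 0

c(ylim_down, ylim_up)
```

Passing explicit numbers to `ylimUp` and `ylimDown` replaces these defaults, which can help when several plots need a common scale.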

### Example

Raw data for both the PTM and protein datasets are required by the plotting 
function. These can be the output of the MSstatsTMT converter functions: 
`PDtoMSstatsTMTFormat`, `SpectroMinetoMSstatsTMTFormat`, and 
`OpenMStoMSstatsTMTFormat`. Both the PTM and protein datasets must include the 
following columns: `ProteinName`, `PeptideSequence`, `Charge`, `PSM`, `Mixture`,
`TechRepMixture`, `Run`, `Channel`, `Condition`, `BioReplicate`, and 
`Intensity`.
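
Before plotting, it can be useful to confirm that an input table carries all of the columns listed above. The helper below is a minimal base R sketch, not part of MSstatsTMTPTM: `missing_columns` and the one-row `toy_input` table (including its protein and peptide names) are hypothetical and exist only for illustration.

```r
# Column names required in both the PTM and protein datasets (from the text above)
required_cols <- c("ProteinName", "PeptideSequence", "Charge", "PSM", "Mixture",
                   "TechRepMixture", "Run", "Channel", "Condition",
                   "BioReplicate", "Intensity")

# Hypothetical helper: returns the required columns that are absent from `data`
missing_columns <- function(data, required = required_cols) {
  setdiff(required, colnames(data))
}

# One-row toy table with made-up values, used only to exercise the helper
toy_input <- data.frame(
  ProteinName = "ProteinA_S10",
  PeptideSequence = "PEPTIDEK",
  Charge = 2,
  PSM = "PEPTIDEK_2",
  Mixture = "Mixture1",
  TechRepMixture = "1",
  Run = "run1.raw",
  Channel = "126",
  Condition = "Control",
  BioReplicate = "Mixture1_Control",
  Intensity = 1024
)

missing_columns(toy_input)    # no columns missing

# Dropping a column makes the helper report it
incomplete <- toy_input[, setdiff(colnames(toy_input), "Channel")]
missing_columns(incomplete)   # "Channel"
```

The same check could be applied to `raw.ptm` and `raw.protein` before they are passed to the plotting function.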

``` {r}
# read in raw data files
# raw.ptm <- read.csv(file="raw.ptm.csv", header=TRUE)
# raw.protein <- read.csv(file="raw.protein.csv", header=TRUE)
head(raw.ptm)
head(raw.protein)
# Run MSstatsTMT proteinSummarization function
quant.msstats.ptm <- proteinSummarization(raw.ptm,
                                          method = "msstats",
                                          global_norm = TRUE,
                                          reference_norm = FALSE,
                                          MBimpute = TRUE)

quant.msstats.protein <- proteinSummarization(raw.protein,
                                          method = "msstats",
                                          global_norm = TRUE,
                                          reference_norm = FALSE,
                                          MBimpute = TRUE)
head(quant.msstats.ptm)
head(quant.msstats.protein)

# Profile Plot
dataProcessPlotsTMTPTM(data.ptm=raw.ptm,
                    data.protein=raw.protein,
                    data.ptm.summarization=quant.msstats.ptm,
                    data.protein.summarization=quant.msstats.protein,
                    type='ProfilePlot'
                    )

# Quality Control Plot
# dataProcessPlotsTMTPTM(data.ptm=raw.ptm,
#                     data.protein=raw.protein,
#                     data.ptm.summarization=quant.msstats.ptm,
#                     data.protein.summarization=quant.msstats.protein,
#                     type='QCPlot')
```

## 2. groupComparisonTMTPTM()

`groupComparisonTMTPTM` tests for significant changes in PTM abundance, adjusted for global protein abundance, across conditions. The tests are based on a family of linear mixed-effects models for TMT experiments. For a case-control study (patients are not repeatedly measured), the appropriate statistical model is determined automatically from the experimental design.
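
To make the idea of "adjusting for global protein abundance" concrete, the sketch below combines a PTM-level and a protein-level comparison in base R. It is an illustration under stated assumptions, not a description of the package internals: it assumes the adjustment subtracts the protein log2 fold change from the PTM log2 fold change, combines the two standard errors in quadrature, and approximates the degrees of freedom with the Welch-Satterthwaite formula. The function `adjust_ptm` and all input numbers are hypothetical.

```r
# Hypothetical helper: adjust a PTM-level comparison by the matching
# protein-level comparison (assumed approach, see the text above)
adjust_ptm <- function(log2fc_ptm, se_ptm, df_ptm,
                       log2fc_prot, se_prot, df_prot) {
  # Adjusted effect: PTM change beyond the change in the protein itself
  log2fc <- log2fc_ptm - log2fc_prot
  # Standard errors of independent estimates add in quadrature
  se <- sqrt(se_ptm^2 + se_prot^2)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (se_ptm^2 + se_prot^2)^2 /
    (se_ptm^4 / df_ptm + se_prot^4 / df_prot)
  t_stat <- log2fc / se
  # Two-sided p-value from the t distribution
  p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
  data.frame(log2FC = log2fc, SE = se, DF = df,
             Tvalue = t_stat, pvalue = p_value)
}

# Made-up numbers for one PTM site and its parent protein
res <- adjust_ptm(log2fc_ptm = 1.5, se_ptm = 0.3, df_ptm = 8,
                  log2fc_prot = 0.5, se_prot = 0.4, df_prot = 10)
res
```

In this toy case the site changes by 1.5 on the log2 scale while the protein changes by 0.5, so only a change of 1.0 is attributed to the modification itself.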

### Arguments

### Example

``` {r}
# test for all the possible pairs of conditions
model.results.pairwise <- groupComparisonTMTPTM(data.ptm=quant.msstats.ptm,
                                       data.protein=quant.msstats.protein)
names(model.results.pairwise)
head(model.results.pairwise[[1]])

# Load specific contrast matrix
# example.contrast.matrix <- read.csv(file="example.contrast.matrix.csv", header=TRUE)
example.contrast.matrix

# test for specified condition comparisons only
model.results.contrast <- groupComparisonTMTPTM(data.ptm=quant.msstats.ptm,
                                       data.protein=quant.msstats.protein,
                                       contrast.matrix = example.contrast.matrix)

names(model.results.contrast)
head(model.results.contrast[[1]])
```
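
A contrast matrix like the one passed as `contrast.matrix` can also be built directly in R instead of being read from a file. The sketch below assumes the MSstatsTMT convention of one row per comparison and one column per condition, with column names matching the levels of `Condition`. The condition and comparison names are hypothetical and would need to be replaced with those in the actual data.

```r
# Hypothetical condition names; replace with the levels of `Condition`
conditions <- c("Control", "Treatment1", "Treatment2")

# One row per comparison: 1 for the numerator condition, -1 for the
# denominator condition, and 0 for conditions not involved
custom.contrast.matrix <- matrix(
  c(-1, 1, 0,
    -1, 0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Treatment1-Control", "Treatment2-Control"), conditions)
)

# Each comparison should sum to zero across conditions
stopifnot(all(rowSums(custom.contrast.matrix) == 0))

custom.contrast.matrix
```

Naming the rows keeps each comparison identifiable when inspecting the matrix and the resulting tables.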

## Session information

``` {r}
sessionInfo()
```



MSstatsTMTPTM documentation built on Feb. 18, 2021, 2 a.m.