knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

0. Load MSstatsTMT

Load MSstatsTMT first (install it from Bioconductor if needed). Then you are ready to start using MSstatsTMT.

# ## Install MSstatsTMT package from Bioconductor
# if (!requireNamespace("BiocManager", quietly = TRUE))
#   install.packages("BiocManager")
# 
# BiocManager::install("MSstatsTMT")

library(MSstatsTMT)

This vignette introduces all the functionalities in MSstatsTMT and summarizes their options.

MSstatsTMT includes the following three steps for statistical testing:

  1. Converters that turn the output of different peptide quantification tools into the required input format: PDtoMSstatsTMTFormat, MaxQtoMSstatsTMTFormat, SpectroMinetoMSstatsTMTFormat, OpenMStoMSstatsTMTFormat and PhilosophertoMSstatsTMTFormat.
  2. Protein summarization based on peptide quantification data: proteinSummarization
  3. Group comparison on protein quantification data: groupComparisonTMT

1. Converters for different peptide quantification tools

MSstatsTMT performs the statistical analysis steps that follow peptide identification and quantitation. Therefore, the input to MSstatsTMT is the output of other software tools (such as Proteome Discoverer, MaxQuant and so on) that read raw spectral files and identify and quantify peptide ions. The preferred structure of data for use in MSstatsTMT is a .csv file in long format with at least 9 columns representing the following variables: ProteinName, PeptideSequence, Charge, PSM, Channel, Condition, BioReplicate, Mixture, Intensity. The variable names are fixed but case-insensitive.
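Because the column names are fixed but case-insensitive, a quick sanity check on a custom input table can save a failed run. The sketch below is a hypothetical helper, not part of MSstatsTMT: `missing_required_cols` and the toy data frame are our own, and the check only covers the nine variables listed above.

```r
# Hypothetical helper (not part of MSstatsTMT): reports which of the nine
# required variables are absent from a long-format data frame, ignoring case.
required_cols <- c("ProteinName", "PeptideSequence", "Charge", "PSM",
                   "Channel", "Condition", "BioReplicate", "Mixture",
                   "Intensity")

missing_required_cols <- function(df, required = required_cols) {
  # Compare lower-cased names so that e.g. "proteinname" matches "ProteinName"
  present <- tolower(colnames(df))
  required[!(tolower(required) %in% present)]
}

# Toy input: one row per PSM and channel (values are made up)
toy <- data.frame(
  proteinname = "P1", PeptideSequence = "PEPTIDEK", Charge = 2,
  PSM = "PEPTIDEK_2", Channel = "126", Condition = "Norm",
  BioReplicate = "Norm", Mixture = "Mixture1", Intensity = 1e5
)

missing_required_cols(toy)          # character(0): all variables present
missing_required_cols(toy[, -9])    # "Intensity" (9th column dropped)
```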

head(input.pd)

PDtoMSstatsTMTFormat()

Preprocess PSM data from Proteome Discoverer and convert into the required input format for MSstatsTMT.

Arguments

Example

# read in PD PSM sheet
# raw.pd <- read.delim("161117_SILAC_HeLa_UPS1_TMT10_5Mixtures_3TechRep_UPSdB_Multiconsensus_PD22_Intensity_PSMs.txt")
head(raw.pd)

# Read in annotation including condition and biological replicates per run and channel.
# Users should make this annotation file. It is not the output from Proteome Discoverer.
# annotation.pd <- read.csv(file="PD_Annotation.csv", header=TRUE)
head(annotation.pd)

# use Protein.Accessions as protein name
input.pd <- PDtoMSstatsTMTFormat(raw.pd, annotation.pd, 
                                 which.proteinid = "Protein.Accessions")
head(input.pd)

# use Master.Protein.Accessions as protein name
input.pd.master <- PDtoMSstatsTMTFormat(raw.pd, annotation.pd,
                                        which.proteinid = "Master.Protein.Accessions")
head(input.pd.master)

Here is a summary of the pre-processing steps in the PDtoMSstatsTMTFormat function.

MaxQtoMSstatsTMTFormat()

Preprocess PSM-level data from MaxQuant and convert into the required input format for MSstatsTMT.

Arguments

Example

# Read in MaxQuant files
# proteinGroups <- read.table("proteinGroups.txt", sep="\t", header=TRUE)

# evidence <- read.table("evidence.txt", sep="\t", header=TRUE)

# Users should make this annotation file. It is not the output from MaxQuant.
# annotation.mq <- read.csv(file="MQ_Annotation.csv", header=TRUE)

input.mq <- MaxQtoMSstatsTMTFormat(evidence, proteinGroups, annotation.mq)
head(input.mq)

SpectroMinetoMSstatsTMTFormat()

Preprocess PSM data from SpectroMine and convert into the required input format for MSstatsTMT.

Arguments

Example

# Read in SpectroMine PSM report
# raw.mine <- read.csv('20180831_095547_CID-OT-MS3-Short_PSM Report_20180831_103118.xls', sep="\t")

# Users should make this annotation file. It is not the output from SpectroMine
# annotation.mine <- read.csv(file="Mine_Annotation.csv", header=TRUE)

input.mine <- SpectroMinetoMSstatsTMTFormat(raw.mine, annotation.mine)
head(input.mine)

OpenMStoMSstatsTMTFormat()

Preprocess MSstatsTMT report from OpenMS and convert into the required input format for MSstatsTMT.

Arguments

Example

# read in MSstatsTMT report from OpenMS
# raw.om <- read.csv("OpenMS_20200222/20200225_MSstatsTMT_OpenMS_Export.csv")
head(raw.om)

# the function only requires one input file
input.om <- OpenMStoMSstatsTMTFormat(raw.om)
head(input.om)

PhilosophertoMSstatsTMTFormat()

Preprocess the MSstats report from Philosopher (FragPipe) and convert it into the required input format for MSstatsTMT.

Arguments

Example

# Example code is skipped for Philosopher Converter 
# since the input is a path to the folder with all the Philosopher msstats csv files

2. Protein summarization, normalization and visualization

2.1. proteinSummarization()

After reading the input files and getting the data in the required format, MSstatsTMT performs normalization and protein summarization as follows.

First, global median normalization is applied to the peptide-level quantification data, equalizing the medians across all channels and MS runs. Next, the peptide-level data are summarized into protein-level data; this summarization must be performed before testing for differentially abundant proteins. Then, normalization between MS runs is performed using the reference channels. Among the summarization methods, "msstats" assumes that missing values are censored and imputes them before summarizing peptide-level data into protein-level data. The other methods (MedianPolish, Median and LogSum) do not impute missing values.
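The three steps above can be sketched in base R on a tiny made-up data set. This is a simplified illustration of the ideas, not MSstatsTMT's implementation. We assume the global target is the median of the per-channel medians, summarize with a plain median (no imputation), and let channel 126 play the role of the reference (Norm) channel. The column names Peptide, Protein, log2Int, normInt, shift and Abundance are our own.

```r
# Toy peptide-level data: 2 runs x 3 channels x 3 peptides (made-up numbers)
set.seed(1)
pep <- expand.grid(Peptide = c("pepA", "pepB", "pepC"),
                   Channel = c("126", "127N", "127C"),
                   Run = c("Run1", "Run2"),
                   stringsAsFactors = FALSE)
pep$Protein <- ifelse(pep$Peptide == "pepC", "P2", "P1")
pep$log2Int <- round(rnorm(nrow(pep), mean = 20, sd = 1), 2)

# Step 1: global median normalization. Shift every channel within every run
# so that its median equals the median of all channel/run medians.
grp <- interaction(pep$Run, pep$Channel)
chan_median <- ave(pep$log2Int, grp, FUN = median)
target <- median(tapply(pep$log2Int, grp, median))
pep$normInt <- pep$log2Int - chan_median + target

# Step 2: protein summarization (here simply the median of the peptides
# per protein, run and channel)
prot <- aggregate(normInt ~ Protein + Run + Channel, data = pep, FUN = median)

# Step 3: reference-channel normalization. For each protein, shift each run
# so that its reference (126) value equals the median reference value
# across runs; the same shift is applied to every channel of that run.
ref <- prot[prot$Channel == "126", c("Protein", "Run", "normInt")]
ref$refMedian <- ave(ref$normInt, ref$Protein, FUN = median)
ref$shift <- ref$normInt - ref$refMedian
prot <- merge(prot, ref[, c("Protein", "Run", "shift")],
              by = c("Protein", "Run"))
prot$Abundance <- prot$normInt - prot$shift
```

After step 3, the reference channel of each protein has the same abundance in every run, which is what makes channels comparable across runs.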

Arguments

Example

# use MSstats for protein summarization
quant.msstats <- proteinSummarization(input.pd,
                                      method="msstats",
                                      global_norm=TRUE,
                                      reference_norm=TRUE,
                                      remove_norm_channel = TRUE,
                                      remove_empty_channel = TRUE)
head(quant.msstats$ProteinLevelData)

# use Median for protein summarization
quant.median <- proteinSummarization(input.pd,
                                     method="Median",
                                     global_norm=TRUE,
                                     reference_norm=TRUE,
                                     remove_norm_channel = TRUE,
                                     remove_empty_channel = TRUE)
head(quant.median$ProteinLevelData)
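To see how the non-imputing methods differ, here is a minimal sketch of a "Median"-style and a "LogSum"-style summary for one protein in one channel of one run. It reflects our reading of the method names (median of log2 peptide intensities vs. log2 of the summed intensities), not the package's code, and the intensities are made up.

```r
# Made-up intensities of three peptide ions of one protein in one channel/run
intensity <- c(pepA = 2^20, pepB = 2^22, pepC = NA)   # pepC is missing
log2int <- log2(intensity)

# "Median"-style summary: median of the observed log2 peptide intensities
median_summary <- median(log2int, na.rm = TRUE)        # 21

# "LogSum"-style summary: log2 of the summed raw intensities
logsum_summary <- log2(sum(intensity, na.rm = TRUE))   # log2(2^20 + 2^22), about 22.32
```

In both cases the missing peptide is simply dropped rather than imputed, which is the behaviour described above for methods other than "msstats".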

2.2. dataProcessPlotsTMT()

Visualization for exploratory data analysis. To illustrate the quantitative data after data preprocessing and quality control of TMT runs, dataProcessPlotsTMT takes the quantitative data and the summarized data from the function proteinSummarization as input. It generates two types of figures in PDF files as output:

(1) profile plot (specify "ProfilePlot" in option type), to identify the potential sources of variation for each protein;

(2) quality control plot (specify "QCPlot" in option type), to evaluate the systematic bias between MS runs and channels.

Arguments

Example

## Profile plot without norm channels and empty channels
dataProcessPlotsTMT(data=quant.msstats,
                     type = 'ProfilePlot',
                     width = 21, # adjust the figure width since there are 15 TMT runs.
                     height = 7)

This produces two PDF files covering all proteins: the first contains the profile plots, and the second contains the profile plots with the summarized and normalized data. XXX_ProfilePlot.pdf shows each peptide ion across runs and channels, grouped by condition. Each panel represents one MS run, and each dot within a panel is one channel within that run. Each peptide ion has a different colour/line-type layout. The dots are connected by one line per peptide ion; if a line is disconnected, the value is missing. The profile plot is a good visualization for checking individual measurements. XXX_ProfilePlot_wSummarization.pdf shows the same peptide ions in grey, with the values summarized by the model overlaid in red.

Instead of making profile plots for all proteins, we can make a plot for an individual protein. Here is an example for protein P04406:

dataProcessPlotsTMT(data=quant.msstats,
                    type='ProfilePlot', # choice of visualization
                    width = 21,
                    height = 7,
                    which.Protein = 'P04406') 
## Quality control plot 
# dataProcessPlotsTMT(data=quant.msstats, 
                     # type='QCPlot',
                     # width = 21, # adjust the figure width since there are 15 TMT runs. 
                     # height = 7)

3. groupComparisonTMT()

Tests for significant changes in protein abundance across conditions based on a family of linear mixed-effects models for TMT experiments. The experimental design (for example, a case-control study, in which patients are not repeatedly measured) is determined automatically, and the appropriate statistical model is chosen accordingly.
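As a deliberately simplified illustration of what a log2 fold change estimate is, the sketch below fits a fixed-effects-only linear model with base R's `lm` to made-up protein-level data for one protein. This is an assumption-laden stand-in: MSstatsTMT's actual models include additional terms describing the experimental design and optional moderation (moderated = TRUE), which are omitted here. The object names are our own.

```r
# Made-up protein-level log2 abundances for one protein, 3 channels per condition
prot1 <- data.frame(
  Condition = factor(rep(c("0.125", "1"), each = 3), levels = c("0.125", "1")),
  Abundance = c(20.1, 19.9, 20.0, 21.0, 21.2, 20.8)
)

# Fixed-effects model only (no random effects, no moderation)
fit <- lm(Abundance ~ Condition, data = prot1)

# Contrast "1-0.125": with treatment coding, the "Condition1" coefficient
# equals mean(condition 1) - mean(condition 0.125)
log2FC <- unname(coef(fit)["Condition1"])                   # 1 (= 21.0 - 20.0)
pval <- summary(fit)$coefficients["Condition1", "Pr(>|t|)"]
```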

Arguments

If you want to make all pairwise comparisons, MSstatsTMT has an easy option for it. Setting contrast.matrix = "pairwise" (the default) compares all possible pairs of conditions.
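For k conditions, "all possible pairs" means choose(k, 2) comparisons. The sketch below writes them out explicitly as a contrast matrix; `pairwise_contrasts` is a hypothetical helper, and the row labels and ordering follow the "1-0.125" style used later in this vignette, which may not match the labels the package produces internally.

```r
conditions <- c("0.125", "0.5", "0.667", "1")

# Hypothetical helper: one row per pair (i, j) with i < j,
# +1 for the later condition (numerator) and -1 for the earlier (denominator)
pairwise_contrasts <- function(conds) {
  pairs <- combn(seq_along(conds), 2)
  m <- matrix(0, nrow = ncol(pairs), ncol = length(conds),
              dimnames = list(NULL, conds))
  for (k in seq_len(ncol(pairs))) {
    i <- pairs[1, k]
    j <- pairs[2, k]
    m[k, j] <- 1
    m[k, i] <- -1
  }
  rownames(m) <- paste(conds[pairs[2, ]], conds[pairs[1, ]], sep = "-")
  m
}

pw <- pairwise_contrasts(conditions)
nrow(pw)   # 6 = choose(4, 2)
```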

Example

# test for all the possible pairs of conditions
test.pairwise <- groupComparisonTMT(quant.msstats, moderated = TRUE)
# Show test result
# Label : which comparison is used
# log2FC : estimated log2 fold change between two conditions (the contrast)
# adj.pvalue : adjusted p value
head(test.pairwise$ComparisonResult)
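The adj.pvalue column corrects the raw p-values for testing many proteins at once. Assuming the Benjamini-Hochberg procedure is used (check the adjustment option documented in `?groupComparisonTMT`), the sketch below shows the adjustment on made-up p-values with base R's `p.adjust`, and reproduces it by hand.

```r
# Made-up raw p-values for five proteins in one comparison
pvalue <- c(0.001, 0.01, 0.02, 0.04, 0.5)

# Benjamini-Hochberg adjustment with base R
adj.pvalue <- p.adjust(pvalue, method = "BH")
# approximately 0.005 0.025 0.0333 0.05 0.5

# The same by hand: p_(i) * m / i, then a running minimum from the largest p
m <- length(pvalue)
o <- order(pvalue, decreasing = TRUE)
manual <- pmin(1, cummin(pvalue[o] * m / (m:1)))[order(o)]
```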

If you would like to compare specific combinations of conditions, you need to give groupComparisonTMT the contrast of the conditions to compare. You can create your contrast.matrix directly in R or prepare it in a text editor. The contrast matrix has one column for every condition and one row for every comparison you would like to make between groups of conditions.

0 is for conditions we would like to ignore. 1 is for conditions we would like to put in the numerator of the ratio or fold change. -1 is for conditions we would like to put in the denominator of the ratio or fold change.

If you have multiple groups, you can specify any group comparisons you are interested in.
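Following these rules, a contrast matrix with several comparisons can be built from "numerator-denominator" labels. The helper `make_contrast` below is hypothetical (not an MSstatsTMT function); it assumes condition names do not themselves contain "-", and it produces the same single-row matrix as the manual construction in the next example when given "1-0.125".

```r
# Hypothetical helper: build a contrast matrix from "numerator-denominator" labels
make_contrast <- function(labels, conditions) {
  m <- matrix(0, nrow = length(labels), ncol = length(conditions),
              dimnames = list(labels, conditions))
  for (lab in labels) {
    parts <- strsplit(lab, "-", fixed = TRUE)[[1]]
    stopifnot(length(parts) == 2, all(parts %in% conditions))
    m[lab, parts[1]] <- 1    # numerator of the fold change
    m[lab, parts[2]] <- -1   # denominator of the fold change
  }
  m
}

conditions <- c("0.125", "0.5", "0.667", "1")
# Two comparisons in one matrix, one row each
make_contrast(c("1-0.125", "0.667-0.5"), conditions)
```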

# Check the conditions in the protein level data
levels(quant.msstats$ProteinLevelData$Condition)
# Only compare condition 0.125 and 1
comparison<-matrix(c(-1,0,0,1),nrow=1)
# Set the names of each row
row.names(comparison)<-"1-0.125"
# Set the column names
colnames(comparison)<- c("0.125", "0.5", "0.667", "1")
comparison
test.contrast <- groupComparisonTMT(data = quant.msstats, contrast.matrix = comparison, moderated = TRUE)
head(test.contrast$ComparisonResult)


Vitek-Lab/MSstatsTMT documentation built on Oct. 19, 2024, 1:14 a.m.