BiocStyle::markdown()
knitr::opts_chunk$set(fig.width=10, fig.height=7, warning=FALSE, message=FALSE)
options(width=110)

```{=html}

# __MSstats: Protein/Peptide significance analysis__

Package: MSstats

Author: Anshuman Raina & Devon Kohler

Date: 5th Semptember 2024

## __Introduction__

`MSstats`, an R package in Bioconductor, supports protein differential analysis 
for statistical relative quantification of proteins and peptides in global, 
targeted and data-independent proteomics. It handles shotgun, label-free and 
label-based (universal synthetic peptide-based) SRM (selected reaction 
monitoring), and DIA (data independent acquisition) experiments. It can be used 
for experiments with complex designs (e.g. comparing more than two experimental 
conditions, or a repeated measure design, such as a time course).

This vignette summarizes the introduction and various options of all 
functionalities in `MSstats`. More details are available in `User Manual`.

For more information about the MSstats workflow, including a detailed 
description of the available processing options and their impact on the 
resulting differential analysis, please see the following publication:

Kohler et al, Nature Protocols 19, 2915–2938 (2024).

## __Installation__

To install this package, start R (version “4.0”) and enter:

``` {r code Installation}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("MSstats")
library(MSstats)
library(ggplot2)

1. Workflow

1.1 Raw Data

To begin with, we will load sample datasets, including both annotated and plain data. The dataset you need can be found here.

We will also load the Annotation Dataset using MSstatsConvert. You can access this dataset here.

``` {r code Load Dataset} library(MSstats)

Load data

pd_raw = system.file("tinytest/raw_data/PD/pd_input.csv", package = "MSstatsConvert")

annotation_raw = system.file("tinytest/raw_data/PD/annot_pd.csv", package = "MSstatsConvert")

pd = data.table::fread(pd_raw) annotation = data.table::fread(annotation_raw)

head(pd, 5) head(annotation, 5)

### __1.2 Loading PD Data to MSstats__

The imported data from Step 1.1. now must be converted through `MSstatsConvert` 
package's `PDtoMSstatsFormat` converter.

This function converts the Proteome Discoverer output into the required input 
format for `MSstats`.

Actual data modification can be seen below:

```r
library(MSstatsConvert)

pd_imported = MSstatsConvert::PDtoMSstatsFormat(pd, annotation, 
                                                use_log_file = FALSE)

head(pd_imported)

1.3 Converters

We have the following converters, which allow you to convert various types of output reports which include the feature level data to the required input format of MSstats. Further information about the converters can be found in the MSstatsConvert package.

  1. DIANNtoMSstatsFormat
  2. DIAUmpiretoMSstatsFormat
  3. FragPipetoMSstatsFormat
  4. MaxQtoMSstatsFormat
  5. OpenMStoMSstatsFormat
  6. OpenSWATHtoMSstatsFormat
  7. PDtoMSstatsFormat
  8. ProgenesistoMSstatsFormat
  9. SkylinetoMSstatsFormat
  10. SpectronauttoMSstatsFormat
  11. MetamorpheusToMSstatsFormat

We show an example of how to use the above said Converters. For more information about using the individual converters please see the coresponding documentation.

skyline_raw = system.file("tinytest/raw_data/Skyline/skyline_input.csv", 
                    package = "MSstatsConvert")

skyline = data.table::fread(skyline_raw)
head(skyline, 5)
msstats_format = MSstatsConvert::SkylinetoMSstatsFormat(skyline_raw,
                                      qvalue_cutoff = 0.01,
                                      useUniquePeptide = TRUE,
                                      removeFewMeasurements = TRUE,
                                      removeOxidationMpeptides = TRUE,
                                      removeProtein_with1Feature = TRUE)
head(msstats_format)

1.4 Data Process

Once we import the dataset correctly with Converter, we need to pre-process the data which is done by the dataProcess function. This step involves data processing and quality control of the measured feature intensities.

This function includes 5 main processing steps (with other additional small steps):

``` {r code dataProcess} summarized = dataProcess( pd_imported, logTrans = 2, normalization = "equalizeMedians", featureSubset = "all", n_top_feature = 3, summaryMethod = "TMP", equalFeatureVar = TRUE, censoredInt = "NA", MBimpute = TRUE )

head(summarized$FeatureLevelData)

head(summarized$ProteinLevelData)

head(summarized$SummaryMethod)

### __1.4.1 Data Process Plots__

After processing the input data, `MSstats` provides multiple plots to analyze the 
results. Here we show the various types of plots we can use. By default, a 
pdf file will be downloaded with corresponding feature level data and the Plot 
generated. Alternatively, the `address` parameter can be set to `FALSE` which 
will output the plots directly.

```r

# Profile plot
dataProcessPlots(data=summarized, type="ProfilePlot", 
                 address = FALSE, which.Protein = "P0ABU9")

# Quality control plot
dataProcessPlots(data=summarized, type="QCPlot", 
                 address = FALSE, which.Protein = "P0ABU9")

# Quantification plot for conditions
dataProcessPlots(data=summarized, type="ConditionPlot", 
                 address = FALSE, which.Protein = "P0ABU9")

1.5 Modeling

In this step we test for differential changes in protein abundance across conditions using a linear mixed-effects model. The model will be automatically adjusted based on your experimental design.

A contrast matrix must be provided to the model. Alternatively, all pairwise comparisons can be made by passing pairwise to the function. For more information on creating contrast matrices, please see the citation linked at the beginning of this document.

``` {r code groupComparison}

model = groupComparison("pairwise", summarized)

Model Details

``` {r Model }

head(model$ModelQC)

head(model$ComparisonResult)

1.5.1 groupComparisonPlot

Visualization for model-based analysis and summarizing differentially abundant proteins. To summarize the results of log-fold changes and adjusted p-values for differentially abundant proteins, groupComparisonPlots takes testing results from function groupComparison as input and automatically generate three types of figures in pdf files as output :

``` {r GroupComparisonPlots}

groupComparisonPlots( model$ComparisonResult, type="Heatmap", sig = 0.05, FCcutoff = FALSE, logBase.pvalue = 10, ylimUp = FALSE, ylimDown = FALSE, xlimUp = FALSE, x.axis.size = 10, y.axis.size = 10, dot.size = 3, text.size = 4, text.angle = 0, legend.size = 13, ProteinName = TRUE, colorkey = TRUE, numProtein = 100, clustering = "both", width = 800, height = 600, which.Comparison = "all", which.Protein = "all", address = FALSE, isPlotly = FALSE )

groupComparisonPlots( model$ComparisonResult, type="VolcanoPlot", sig = 0.05, FCcutoff = FALSE, logBase.pvalue = 10, ylimUp = FALSE, ylimDown = FALSE, xlimUp = FALSE, x.axis.size = 10, y.axis.size = 10, dot.size = 3, text.size = 4, text.angle = 0, legend.size = 13, ProteinName = TRUE, colorkey = TRUE, numProtein = 100, clustering = "both", width = 800, height = 600, which.Comparison = "Condition2 vs Condition4", which.Protein = "all", address = FALSE, isPlotly = FALSE )

### __1.6 GroupComparisonQCPlots__

To check and verify that the resultant data of `groupComparison` offers a linear 
model for whole plot inference, `groupComparisonQC` plots take the fitted data 
and provide two ways of plotting:

1. Normal Q-Q plot : Quantile-Quantile plots represents normal quantile-quantile
plot for each protein after fitting models
2. Residual plot : represents a plot of residuals versus fitted values for each 
protein in the dataset.

Results based on statistical models for whole plot level inference are accurate 
as long as the assumptions of the model are met. The model assumes that the 
measurement errors are normally distributed with mean 0 and constant variance. 
The assumption of a constant variance can be checked by examining the residuals 
from the model.


``` {r GroupComparisonQCplots, results='hide', message=FALSE, warning=FALSE}

source("..//R//groupComparisonQCPlots.R")

groupComparisonQCPlots(data=model, type="QQPlots", address=FALSE, 
                       which.Protein = "P0ABU9")


groupComparisonQCPlots(data=model, type="ResidualPlots", address=FALSE, 
                       which.Protein = "P0ABU9")

1.7 Sample Size Calculation

Calculate sample size for future experiments of a Selected Reaction Monitoring (SRM), Data-Dependent Acquisition (DDA or shotgun), and Data-Independent Acquisition (DIA or SWATH-MS) experiment based on intensity-based linear model. The function fits the model and uses variance components to calculate sample size. The underlying model fitting with intensity-based linear model with technical MS run replication. Estimated sample size is rounded to 0 decimal. Two options of the calculation:

sample_size_calc = designSampleSize(model$FittedModel,
                                    desiredFC=c(1.75,2.5),
                                    power = TRUE,
                                    numSample=5)

1.7.1 Sample Size Calculation Plot

To illustrate the relationship of desired fold change and the calculated minimal number sample size which are

The input is the result from function designSampleSize.

designSampleSizePlots(sample_size_calc, isPlotly=FALSE)

1.8 Quantification from groupComparison Data

Model-based quantification for each condition or for each biological samples per protein in a targeted Selected Reaction Monitoring (SRM), Data-Dependent Acquisition (DDA or shotgun), and Data-Independent Acquisition (DIA or SWATH-MS) experiment. Quantification takes the processed data set by dataProcess as input and automatically generate the quantification results (data.frame) with long or matrix format. The quantification for endogenous samples is based on run summarization from subplot model, with TMP robust estimation.

sample_quant_long = quantification(summarized,
                             type = "Sample",
                             format = "long")
sample_quant_long
sample_quant_wide = quantification(summarized,
                              type = "Sample",
                              format = "matrix")
sample_quant_wide
group_quant_long = quantification(summarized,
                                  type = "Group",
                                  format = "long")
group_quant_long
group_quant_wide = quantification(summarized,
                                  type = "Group",
                                  format = "matrix")
group_quant_wide


Vitek-Lab/MSstats documentation built on Nov. 9, 2024, 5:35 p.m.