Introduction

Targeted liquid chromatography coupled with mass spectrometry (LC-MS) is a powerful tool for detecting and quantifying peptides in complex matrices. Targeted LC-MS has two important objectives: to obtain peptide quantifications that are (1) suitable for the purpose of the investigation, and (2) reproducible across laboratories and runs. The first objective is addressed by system suitability tests (SST), which verify that the mass spectrometric instrumentation performs as specified. The second is addressed by quality control (QC), which provides in-process quality assurance of the sample profile. SST and QC both produce longitudinal data. Although SST and QC receive a lot of attention in the proteomics community, the statistical methods currently in use are fairly limited. MSstatsQC improves upon the existing statistical methodology for SST and QC. It translates modern methods of longitudinal statistical process control, such as simultaneous and time-weighted control charts and change point analysis, to the context of LC-MS experiments. The methods are implemented in an open-source R-based software package and its web-based graphical user interface (www.msstats.org/msstatsqc), and are available for stand-alone use or for integration with automated pipelines. The example dataset was generated during CPTAC Study 9.1 at Site 54. Although the example focuses on targeted proteomics, the statistical methods apply more generally.

This vignette summarizes the functionality of the MSstatsQC package and its Shiny interface.

Input

In order to analyze QC/SST data in MSstatsQC, input data must be a .csv file in a "long" format with related columns. This is a common data format that can be generated from spectral processing tools such as Skyline.

This file can also be uploaded through the Shiny interface. Users who prefer the web interface should use the Upload file option to upload a csv file that includes QC metrics and peptide-level data. The recommended format includes Acquired Time, Peptide name, Annotations, and data for any QC metrics such as Retention Time, Total Peak Area and Mass Accuracy.

(a) AcquiredTime: This column shows the acquired time of the QC/SST sample in the format of MM/DD/YYYY HH:MM:SS AM/PM

(b) Precursor: This column shows information about Precursor id. Statistical analysis will be done separately for each unique label in this column.

(c) Annotations: Annotations are free-text information given by the analyst about each run. They can be informative explanations of any special cause or any observations related to a particular run. Annotations are displayed interactively in the plots generated by MSstatsQC.

(d)-(f) RetentionTime, TotalPeakArea, FWHM, MassAccuracy, PeakAssymetry, and other metrics: Each of these columns describes a feature of a peak for a specific peptide.

The example dataset is shown below. Each row corresponds to a single time point. Additionally, other inputs such as predefined limits or guide sets are discussed in further steps.

Example

#A typical multi peptide and multi metric system suitability dataset
#This dataset was generated during CPTAC Study 9.1 at Site 54
S9Site54 <- read.csv('Sampledata_CPTAC_Study_9_1_Site54.csv', header=TRUE) 
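
To make the long format concrete, the following sketch builds a small toy table in base R and parses its AcquiredTime column. The timestamps, annotations and metric values are invented for illustration and are not taken from the CPTAC dataset; only the column names follow the description above.

```r
# Toy long-format QC table: one row per peptide per acquired run.
# All values are made up for illustration only.
toy_qc <- data.frame(
  AcquiredTime  = c("01/15/2016 09:30:00 AM", "01/15/2016 09:30:00 AM",
                    "01/15/2016 02:45:00 PM", "01/15/2016 02:45:00 PM"),
  Precursor     = c("LVNELTEFAK", "TAAYVNAIEK", "LVNELTEFAK", "TAAYVNAIEK"),
  Annotations   = c("", "", "column replaced", "column replaced"),
  RetentionTime = c(34.4, 28.1, 34.9, 28.6),
  TotalPeakArea = c(1.2e7, 8.5e6, 1.1e7, 8.1e6),
  stringsAsFactors = FALSE
)

# Parse MM/DD/YYYY HH:MM:SS AM/PM timestamps (12-hour clock, hence %I and %p).
toy_qc$AcquiredTime <- as.POSIXct(toy_qc$AcquiredTime,
                                  format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Statistical analysis is done separately for each unique Precursor,
# so split the table by that column.
by_peptide <- split(toy_qc, toy_qc$Precursor)
```

Splitting by Precursor mirrors the statement that each unique label in that column is analyzed separately; how MSstatsQC does this internally is not shown here.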

Tip for Shiny users:

The "Data import" tab is used to upload data. Users can also run the analysis with the sample data and clear the outputs using the related buttons.

MSstatsQC functions

MSstatsQC provides several core functions for state-of-the-art statistical process control methods that are not available in other software. The three main functions for monitoring are XmRPlots(), CUSUMPlots() and ChangePointEstimator(). All of these core functions have the same architecture and share the same arguments. We refer to these three functions collectively as APlotFunction and give details for each function later. We first introduce the arguments of APlotFunction.

Arguments

Example

#A general MSstatsQC plot function when a guide set (runs 1-5) is used to monitor the mean of a metric
APlotFunction( S9Site54, peptide, L = 1, U = 5, metric, normalization = FALSE,
                      ytitle = "A Plot", type = "mean", selectMean = NULL, selectSD = NULL )

MSstatsQC processes data and creates plots with the core functions. However, cases with multiple metrics and multiple peptides are difficult to assess plot by plot. Therefore, we introduce several summary functions that aggregate the conclusions of a core function method for each metric over all the analytes. RadarPlots() and RiverPlots() are the MSstatsQC summary functions. We refer to these two functions collectively as ASummaryFunction and give details for each function later. We first introduce the arguments of ASummaryFunction.

Arguments

Example

#A general MSstatsQC summary function when a guide set (1-5 runs) is used 
ASummaryFunction( S9Site54, L = 1, U = 5, listMean = NULL, listSD = NULL )

Output

Plots created by the core plot functions are generated by plotly, an R package for interactive plot generation. The interactive plots created by MSstatsQC can be saved as an html file using the saveWidget function from htmlwidgets. To save a static png file, the plotly export function can be used. The outputs of the other MSstatsQC functions are generated by the ggplot2 package and are saved with the ggsave function.

Example

#Saving plots generated by plotly
p <- APlotFunction(S9Site54, peptide = "a peptide", metric = "a metric")
htmlwidgets::saveWidget(p, "Aplot.html")
plotly::export(p, file = "Aplot.png")

#Saving plots generated by ggplot2
p <- ASummaryFunction(S9Site54, L = 1, U = 5)
ggplot2::ggsave(filename = "Summary.pdf", plot = p)
#or
ggplot2::ggsave(filename = "Summary.png", plot = p)

Tip for Shiny users:

Each output generated by 'plotly' can be saved using the "plotly" toolset located in the corner of each plot.

Data processing

Data is checked with the DataProcess() function to ensure data sanity and to allow efficient use of the core and summary MSstatsQC functions. MSstatsQC uses a data validation method in which slight variations in column names are compensated for and converted to the standard MSstatsQC format. For example, the data validation function converts column names like Best.RT, best retention time, retention time, rt and best ret into BestRetentionTime. The conversion also ignores differences in letter case.

Arguments

Example

DataProcess(S9Site54)
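
The kind of column-name normalization described above can be sketched in a few lines of base R. The helper normalize_rt_name() below is hypothetical and is not the DataProcess() implementation; it only illustrates how spelling and case variants could be mapped onto BestRetentionTime.

```r
# Hypothetical helper, NOT the DataProcess() implementation: map loosely
# spelled retention time column names onto one standard name.
normalize_rt_name <- function(col_names) {
  # Lower-case and drop everything except letters, so that "Best.RT",
  # "best retention time" and "BEST_RT" all reduce to a comparable key.
  key <- gsub("[^a-z]", "", tolower(col_names))
  variants <- c("bestrt", "bestretentiontime", "retentiontime", "rt", "bestret")
  col_names[key %in% variants] <- "BestRetentionTime"
  col_names
}

# Names that do not match a known variant are returned unchanged.
normalize_rt_name(c("Best.RT", "retention time", "Precursor", "RT"))
```

Reducing each name to a lower-case, letters-only key is one simple way to tolerate punctuation, spacing and case differences at once.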

Tip for Shiny users:

The "Data import" tab automatically checks the data and validates it for further use.

Setting options

Choosing metrics

Metrics (e.g. retention time and peak area) are chosen within all core functions with the 'metric' argument, and the peptide is given as the second argument. MSstatsQC can handle any metric of interest. To import metrics into MSstatsQC successfully, the user needs to place the metric columns directly after the Annotations column.

Example

#Defining metric of interest within APlotFunction
APlotFunction(S9Site54, "LVNELTEFAK", metric = "TotalArea")
APlotFunction(S9Site54, "LVNELTEFAK", metric = "RetentionTime")

Defining mean and variability of a metric

Setting predefined limits

Predefined limits are commonly used in system suitability studies. If the mean and variability of a metric are well known, they can be defined using the 'selectMean' and 'selectSD' arguments of the core plot functions (e.g. XmRPlots()). For example, if the mean retention time is 34.5 minutes, the standard deviation is 1 minute and an X chart is used for peptide LVNELTEFAK, the call looks as follows, with APlotFunction standing in for XmRPlots().

Example

#Mean and standard deviation of retention time for LVNELTEFAK are known
APlotFunction(data, "LVNELTEFAK", metric = "RetentionTime", selectMean = 34.5, selectSD = 1 )

Similarly, if the means and standard deviations are known, the summary functions use the listMean and listSD arguments. For example, if the user monitors retention time and peak asymmetry and the means and standard deviations of these metrics are known, one list of means and another list of standard deviations must be supplied.

Example

# Retention time >> mean is 34.5 and standard deviation is 1.0
# Peak asymmetry >> mean is 1.0 and standard deviation is 0.01
ASummaryFunction(data, listMean = list("Retention time" = 34.5, "Peak asymmetry" = 1.0), listSD = list("Retention time" = 1, "Peak asymmetry" = 0.01))

Choosing a guide set

The true mean and variability of a metric are typically unknown, and their estimates are obtained from a guide set of high quality runs. Generally, a data gathering and parameter estimation phase is required. Within that phase, control limits are obtained to test the hypothesis of statistical control. These thresholds are selected to ensure a specified type I error probability (e.g. 0.0027). Control charts are constructed and runs are evaluated in real time only after this phase is complete. Guide sets are defined with the 'L' and 'U' arguments. For example, if the retention time of a peptide is monitored and the first 20 observations of the dataset are used as a guide set, a plot is constructed as follows.

Example

#Guide set is chosen as the first 20 observations of dataset
APlotFunction(data, "peptide", metric = "QC metric", L=1, U=20) 
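
The parameter estimation step can be illustrated with a small base R sketch. It uses simulated retention times and plain three-sigma limits, which is a textbook construction and an assumption on our part rather than a description of the limits MSstatsQC computes; the variable names are illustrative.

```r
# Simulated retention times for one peptide (illustrative data only).
set.seed(1)
rt <- rnorm(50, mean = 34.5, sd = 1)

# Guide set bounds, playing the role of the L and U arguments.
L <- 1
U <- 20
guide <- rt[L:U]

# Estimate the in-control mean and standard deviation from the guide set.
mu_hat <- mean(guide)
sd_hat <- sd(guide)

# Three-sigma control limits for individual observations.
lcl <- mu_hat - 3 * sd_hat
ucl <- mu_hat + 3 * sd_hat

# Under normality, an in-control point falls outside three-sigma limits
# with probability 2 * pnorm(-3), which is approximately 0.0027.
alpha <- 2 * pnorm(-3)

# Indices of runs that fall outside the limits.
out_of_control <- which(rt < lcl | rt > ucl)
```

The value of alpha shows where the type I error probability of 0.0027 quoted above comes from when three-sigma limits are used.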

Guide sets are also required for summary functions when the mean and standard deviation of a metric are not known.

Example

#Guide set is chosen as the first 20 observations of dataset
ASummaryFunction(data, L=1, U=20) 

Tip for Shiny users:

The "Options" tab is used to set the metrics and peptides of interest. The guide set and the known mean and standard deviation are also set within the "Options" tab. Select a proper and representative guide set using the Options tab. The lower bound of the guide set is the index of the first time point to be included in the guide set. For example, if you choose "1" as the lower bound, the first time point will be the first element of the guide set. Similarly, the upper bound of the guide set is the index of the last observation. It is possible to use different guide sets for different metrics and peptides.

MSstatsQC functions: control charts

These functions are used to generate individual (X) and moving range (mR) control charts, as well as cumulative sum control charts for the mean (CUSUMm) and for variability (CUSUMv), for each metric. X and mR charts are constructed using the XmRPlots() function. For example, to construct an X control chart of retention time for LVNELTEFAK when the mean and standard deviation of retention time are known, the function is used as follows.

Example:

# Retention time >> mean is 34.5 and standard deviation is 1.0
XmRPlots(S9Site54, "LVNELTEFAK", metric = "RetentionTime", type="mean", selectMean = 34.5, selectSD = 1)

To construct X control charts of retention time and peak area for LVNELTEFAK when the means and standard deviations are not known, the function is used as follows.

Example:

# Retention time >> first 20 observations are used as a guide set
XmRPlots(S9Site54, "LVNELTEFAK", metric = "RetentionTime", type="mean", L = 1, U = 20)
XmRPlots(S9Site54, "LVNELTEFAK", metric = "TotalPeakArea", type="mean", L = 1, U = 20)

To construct mR control charts of retention time and peak area for LVNELTEFAK when the means and standard deviations are not known, the function is used as follows.

Example:

# Retention time >> first 20 observations are used as a guide set
XmRPlots(S9Site54, "LVNELTEFAK", metric = "RetentionTime", type="dispersion", L = 1, U = 20)
XmRPlots(S9Site54, "LVNELTEFAK", metric = "TotalPeakArea", type="dispersion", L = 1, U = 20)
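
For readers unfamiliar with X and mR charts, the sketch below computes textbook control limits from the moving ranges of a guide set. The function xmr_limits() is hypothetical, and the constants 2.66 and 3.267 are the standard values for moving ranges of two consecutive observations; the limits drawn by XmRPlots() may be computed differently.

```r
# Textbook individuals (X) and moving range (mR) chart limits.
# Hypothetical helper for illustration, not part of MSstatsQC.
xmr_limits <- function(x, L = 1, U = length(x)) {
  guide <- x[L:U]
  mr <- abs(diff(guide))   # moving ranges of consecutive runs
  center <- mean(guide)
  mr_bar <- mean(mr)
  list(
    x_lcl  = center - 2.66 * mr_bar,  # 2.66 = 3 / d2, with d2 = 1.128 for n = 2
    x_ucl  = center + 2.66 * mr_bar,
    mr_lcl = 0,
    mr_ucl = 3.267 * mr_bar           # D4 constant for n = 2
  )
}

# Small made-up series; all eight runs are used as the guide set.
x <- c(10, 12, 11, 13, 12, 11, 10, 12)
lim <- xmr_limits(x)
```

The X chart flags shifts in the level of a metric, while the mR chart flags changes in run-to-run variability, which corresponds to the "mean" and "dispersion" chart types used above.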

Similarly, to construct CUSUMm or CUSUMv control charts of retention time and peak area for LVNELTEFAK when the means and standard deviations are not known, the CUSUMPlots() function is used as follows.

Example:

# Retention time >> first 20 observations are used as a guide set
CUSUMPlots(S9Site54, "LVNELTEFAK", metric = "RetentionTime", type="mean", L=1, U=20, ytitle="CUSUMm")
CUSUMPlots(S9Site54, "LVNELTEFAK", metric = "TotalPeakArea", type="mean", L=1, U=20, ytitle="CUSUMm")
CUSUMPlots(S9Site54, "LVNELTEFAK", metric = "RetentionTime", type="dispersion", L=1, U=20, ytitle="CUSUMv")
CUSUMPlots(S9Site54, "LVNELTEFAK", metric = "TotalPeakArea", type="dispersion", L=1, U=20, ytitle="CUSUMv")
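
The idea behind a CUSUM chart for the mean can be sketched with the textbook tabular CUSUM on standardized observations. The function cusum_mean() and the default reference value k = 0.5 and decision interval h = 5 are assumptions chosen for illustration; they are common textbook defaults and not necessarily the settings used by CUSUMPlots().

```r
# Textbook tabular CUSUM for the mean on standardized observations z.
# Hypothetical helper; k is the reference value, h the decision interval.
cusum_mean <- function(z, k = 0.5, h = 5) {
  n <- length(z)
  c_plus <- numeric(n)    # accumulates evidence of an upward shift
  c_minus <- numeric(n)   # accumulates evidence of a downward shift
  prev_plus <- 0
  prev_minus <- 0
  for (i in seq_len(n)) {
    c_plus[i]  <- max(0,  z[i] - k + prev_plus)
    c_minus[i] <- max(0, -z[i] - k + prev_minus)
    prev_plus  <- c_plus[i]
    prev_minus <- c_minus[i]
  }
  data.frame(run = seq_len(n), c_plus = c_plus, c_minus = c_minus,
             signal = c_plus > h | c_minus > h)
}

# Made-up standardized series with a sustained upward shift from run 4.
z <- c(0, 0, 0, 2, 2, 2, 2, 2)
res <- cusum_mean(z)
```

Because the statistic accumulates small deviations over time, it crosses the decision interval at run 7 in this example even though no single observation exceeds three standard deviations.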

Tip for Shiny users:

The "Control charts" tab is used to construct X, mR, CUSUMm and CUSUMv control charts.

Change Point Analysis

Follow-up change point analysis helps to identify the time of a change for each peptide and metric. The ChangePointEstimator() function is used for this analysis. It is one of the core functions and uses the same arguments. We recommend using this function after a control chart signals an out-of-control observation. For example, the retention time of TAAYVNAIEK increases over time, and the CUSUMm statistic increases steadily after the 20th time point. The user can follow up with the ChangePointEstimator() function to find the exact time at which the retention time drift began.

Example

# Retention time >> first 20 observations are used as a guide set
XmRPlots(S9Site54, "TAAYVNAIEK", metric = "RetentionTime", type="mean", L = 1, U = 20)
ChangePointEstimator(S9Site54, "TAAYVNAIEK", metric = "RetentionTime", type="mean", L = 1, U = 20)
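
As an illustration of what a change point estimate for the mean looks like, the sketch below implements a textbook step-change estimator on standardized observations (in-control mean 0, standard deviation 1). For each candidate time t it scores how strongly the observations after t deviate from the in-control mean and picks the best-scoring t. The function estimate_change_point() is hypothetical and may differ from what ChangePointEstimator() computes.

```r
# Textbook step-change estimator for the mean, assuming standardized
# observations z with in-control mean 0. Hypothetical helper for illustration.
estimate_change_point <- function(z) {
  n <- length(z)
  t_candidates <- 0:(n - 1)
  stat <- vapply(t_candidates, function(t) {
    tail_mean <- mean(z[(t + 1):n])   # mean of the runs after candidate t
    (n - t) * tail_mean^2             # evidence for a shift starting after t
  }, numeric(1))
  # The estimate is the last run that is still in control.
  t_candidates[which.max(stat)]
}

# Made-up series: ten in-control runs followed by a shift of three units.
z <- c(rep(0, 10), rep(3, 10))
estimate_change_point(z)   # returns 10
```

Unlike a control chart, which only tells the user that a change has been detected, the estimate points back to the run after which the change began.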

We don't recommend using this function when all the observations are within the control limits. In the case of retention time monitoring of LVNELTEFAK, there is no need to analyse the change point further.

The time of a variability change can be analyzed with the same function. For example, YSTDVSVDEVK experiences a drift in the mean of retention time while the variability of retention time increases simultaneously. In this case, ChangePointEstimator() can be used to identify the exact times of both changes.

Example

# Retention time >> first 20 observations are used as a guide set
XmRPlots(S9Site54, "YSTDVSVDEVK", metric = "RetentionTime", type="mean", L = 1, U = 20)
ChangePointEstimator(S9Site54, "YSTDVSVDEVK", metric = "RetentionTime", type="dispersion", L = 1, U = 20)

Tip for Shiny users:

MSstatsQC functions: river and radar plots

XmRRiverPlots() and XmRRadarPlots() are the summary functions used in MSstatsQC for X and mR charts. They aggregate results over all analytes. For example, if the user would like to aggregate the information gathered from the X charts of retention time for all analytes, the upper panel of XmRRiverPlots() shows the increases and decreases in retention time. Next, XmRRadarPlots() is used to find out which peptides are affected by the problem.

Example

# Retention time >> first 20 observations are used as a guide set
XmRRiverPlots(S9Site54, L=1, U=20)
XmRRadarPlots(S9Site54, L=1, U=20)
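
The aggregation behind these summaries can be pictured with a toy matrix of standardized metric values. The sketch below is an assumption about the general idea, not the code behind the summary functions: it flags values beyond three standard deviations, then summarizes the flags per run (the river-plot view) and per peptide (the radar-plot view). All numbers are invented.

```r
# Standardized metric values: rows are peptides, columns are runs (toy data).
z <- rbind(
  LVNELTEFAK  = c(0.1, -0.4, 0.3, 3.5, 3.8),
  TAAYVNAIEK  = c(-0.2, 0.5, 0.0, 3.2, 4.1),
  YSTDVSVDEVK = c(0.3, 0.1, -0.6, 0.4, 3.3)
)

# Flag out-of-control points by direction, using three-sigma limits.
increase <- z > 3
decrease <- z < -3

# River-plot style view: share of peptides flagged at each run.
prop_increase <- colMeans(increase)
prop_decrease <- colMeans(decrease)

# Radar-plot style view: number of flagged runs for each peptide.
flags_per_peptide <- rowSums(increase | decrease)
```

Reading the two views together shows both when a problem appears (runs 4 and 5 here) and which peptides it affects.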

CUSUMRiverPlots() and CUSUMRadarPlots() are the summary functions used in MSstatsQC for CUSUMm and CUSUMv charts. They aggregate results over all analytes. For example, if the user would like to aggregate the information gathered from the CUSUMm charts of retention time for all analytes, the upper panel of CUSUMRiverPlots() shows the increases and decreases in retention time. Next, CUSUMRadarPlots() is used to find out which peptides are affected by the problem.

Example

# Retention time >> first 20 observations are used as a guide set
CUSUMRiverPlots(S9Site54, L=1, U=20)
CUSUMRadarPlots(S9Site54, L=1, U=20)

Tip for Shiny users:

Summary plots are available in the Metric summary tab under Detailed performance: plot summaries.

MSstatsQC functions: decision map

DecisionMap() is another summary function used in MSstatsQC. It compares the aggregated results over all analytes for a certain method, such as XmR charts, with user-defined criteria. The user first defines the performance criteria and then runs the DecisionMap() function to visualize overall performance. This function uses all the arguments of the summary plots listed previously. Additionally, the following arguments are used.

Arguments

Example

# A decision map for Site 54 can be generated using the following script
# Retention time >> first 20 observations are used as a guide set
DecisionMap(S9Site54, method = "XmR", peptideThresholdRed = 0.25, peptideThresholdYellow = 0.10,
            L = 1, U = 20, type = "mean", title = "Decision map", listMean = NULL, listSD = NULL)
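
One way to read the two thresholds is as cut-offs on the proportion of peptides that are out of control at each run. The sketch below encodes that reading; the function classify_runs(), the strict inequalities and the "green" label for passing runs are our assumptions for illustration and are not taken from the DecisionMap() source.

```r
# Hypothetical traffic-light rule: compare the proportion of out-of-control
# peptides at each run with the red and yellow thresholds.
classify_runs <- function(prop_out, red = 0.25, yellow = 0.10) {
  ifelse(prop_out > red, "red",
         ifelse(prop_out > yellow, "yellow", "green"))
}

# Proportions of flagged peptides at four runs (made-up values).
classify_runs(c(0, 0.05, 0.15, 0.30))
```

With the thresholds from the example above, a run is marked red when more than 25% of the peptides are out of control and yellow when more than 10% are.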

Tip for Shiny users:

Input for the decision map can be selected using the Create decision rules tab. After the thresholds are selected, decision maps are created automatically and are available in the Metric summary tab under Detailed performance: plot summaries.



srtaheri/MSstatsQC documentation built on May 30, 2019, 8:41 a.m.