knitr::opts_chunk$set(
    fig.width=7, fig.height=4.5, 
    collapse = TRUE,
    eval = TRUE,
    comment = "#>"
)

Introduction

Complexome profiling, or complexomics, is a mass spectrometry-based method used in biology to study macromolecular complexes in their native form. Protein complexes are assessed by evaluating their migration profiles across fractions. First, a lysed protein sample is separated into fractions, typically by blue native electrophoresis (BN-PAGE) or density gradient centrifugation. The individual fractions are then analysed by protein mass spectrometry. This makes it possible to determine the migration profile of each protein across the fractions and to assess protein co-migration.

It is often desirable to compare co-migration between multiple biological samples. Typically, this would require analysing each sample separately, using multiple lanes of a blue native gel or multiple density gradients. However, this approach can introduce technical biases that make both the qualitative comparison of migration profiles and the quantitative comparison of protein amounts between biological samples difficult. To mitigate these biases, biological samples can be labelled in ways that mass spectrometry can distinguish (e.g. SILAC, TMT, iTRAQ) and analysed simultaneously. Stable Isotope Labelling with Amino acids in Cell culture (SILAC) is a method in which cells are grown in the presence of amino acids containing either “heavy” isotopes of carbon and nitrogen, which have low natural abundance, or the most common “light” isotopes. This labelling allows reciprocally labelled samples to be mixed and multiplexed at a very early step of the experiment, making SILAC a useful tool when the experimental design requires comparing two biological samples.

Here we present ComPrAn (Complexome Profiling Analysis), an R package for analysing data produced by SILAC complexomics experiments. The package does not interpret raw mass spectrometry data. As input, it takes a table of peptides that were identified by a search engine such as Mascot and quantified, e.g. in Proteome Discoverer.

The package estimates relative protein abundance from peptide abundance, produces normalized migration profiles of proteins across fractions, performs cluster analysis, and provides tools for visualizing the results. The analysis is intended for SILAC-labelled samples, so the input file should contain both “heavy” and “light” peptides.

Method description

Below, we provide an overview of how ComPrAn is used (Figure 1). The package can be used in two ways. Here, we describe the full workflow, which starts the analysis from peptide information. The second option is to use only the protein workflow (described in a separate vignette), which covers only steps 3) and 4) below. It provides greater flexibility, for example when only normalised protein values are available, or when a different approach is required for estimating protein intensities from peptide intensities and/or for normalisation.


Fig. 1. Schematic overview of the data analysis using ComPrAn. There are two entry points into the ComPrAn analysis pipeline. Peptide data can be imported into the peptide-to-protein part of the data analysis (shown in blue). When only normalised protein data from an experiment are available, the protein workflow functionality of the ComPrAn package can be used.

1) Estimation of protein intensities from peptide intensities

In mass spectrometry analysis, multiple peptides are detected for each protein. During initial data processing, peptides are identified, assigned to proteins and quantified. Within a single sample, the quantities of different peptides belonging to the same protein can differ, for example because of differences in peptide ionization efficiency, charge state or modifications. There are many ways to estimate a protein quantity from peptide quantities, including the minimum, average or maximum of the quantities of all peptides belonging to the protein, or the same statistics calculated for a subset of those peptides.
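As a minimal sketch of these summary statistics (not ComPrAn's own code), the base R snippet below collapses hypothetical peptide intensities from a single fraction into one value per protein using the minimum, mean and maximum, plus the mean of the two most intense peptides as an example of a subset-based estimate. The table, its column names, the helper `top2_mean` and all values are invented for illustration.

```r
# Hypothetical peptide intensities for one fraction (illustrative values only)
toy_peptides <- data.frame(
  protein   = c("P1", "P1", "P1", "P2", "P2"),
  peptide   = c("AAK", "LLR", "VEK", "GGR", "TTK"),
  intensity = c(1.0e6, 4.0e6, 2.5e6, 3.0e5, 9.0e5),
  stringsAsFactors = FALSE
)

# Hypothetical subset-based estimate: mean of the two most intense peptides
top2_mean <- function(x) mean(head(sort(x, decreasing = TRUE), 2))

# One aggregate() call per summary; groups come back sorted by protein
by_min  <- aggregate(intensity ~ protein, data = toy_peptides, FUN = min)
by_mean <- aggregate(intensity ~ protein, data = toy_peptides, FUN = mean)
by_max  <- aggregate(intensity ~ protein, data = toy_peptides, FUN = max)
by_top2 <- aggregate(intensity ~ protein, data = toy_peptides, FUN = top2_mean)

# One row per protein, one column per estimate
protein_summary <- data.frame(
  protein = by_min$protein,
  min     = by_min$intensity,
  mean    = by_mean$intensity,
  max     = by_max$intensity,
  top2    = by_top2$intensity
)
print(protein_summary)
```

Even on this toy input the estimates for P1 span a factor of four, which shows how much the choice of summary can matter.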

In complexomics, protein quantities are compared across a large number of fractions (typically 64 for BN-PAGE and ~20 for gradient centrifugation). The set of peptides detected for a given protein usually differs among fractions. To ensure that the protein quantification values used by ComPrAn are comparable between fractions and between the two label states, a single representative peptide is picked for each protein. The selected peptide must have the same sequence, modifications and charge state across fractions. The most abundant peptide of each protein is picked as its representative peptide.
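The sketch below illustrates the representative-peptide idea on a hypothetical long-format table (one row per detected peptide per fraction). A peptide species is keyed by sequence, modifications and charge, so the same sequence with a different charge or modification counts as a different peptide. Measuring abundance as the intensity summed over all fractions is an assumption made here for illustration; the text above does not specify how ComPrAn ranks peptides, and the column names are hypothetical. The last step pivots the representative peptide's intensities into a protein-by-fraction table.

```r
# Hypothetical long-format table: one row per detected peptide per fraction
toy_long <- data.frame(
  protein   = c("P1", "P1", "P1", "P1", "P1", "P1", "P2", "P2", "P2"),
  sequence  = c("AAK", "AAK", "AAK", "LLR", "LLR", "LLR", "GGR", "GGR", "TTK"),
  mods      = c("", "", "", "", "", "Oxidation", "", "", ""),
  charge    = c(2, 2, 2, 2, 2, 3, 2, 2, 3),
  fraction  = c(1, 2, 3, 1, 2, 3, 2, 3, 1),
  intensity = c(1e5, 5e5, 2e5, 3e5, 6e5, 4e5, 2e5, 1e5, 5e5),
  stringsAsFactors = FALSE
)

# A peptide species: same sequence, modifications and charge state
toy_long$species <- paste(toy_long$sequence, toy_long$mods, toy_long$charge,
                          sep = "|")

# Assumed abundance measure: intensity summed over all fractions
totals <- aggregate(intensity ~ protein + species, data = toy_long, FUN = sum)

# Keep the most abundant species of each protein
best <- do.call(rbind, lapply(split(totals, totals$protein),
                              function(d) d[which.max(d$intensity), ]))

# Rows of the representative species only; keep every fraction as a level
rep_rows <- merge(toy_long, best[, c("protein", "species")],
                  by = c("protein", "species"))
rep_rows$fraction <- factor(rep_rows$fraction,
                            levels = sort(unique(toy_long$fraction)))

# Protein-by-fraction table of representative intensities (0 = not detected)
rep_profiles <- xtabs(intensity ~ protein + fraction, data = rep_rows)
print(rep_profiles)
```

Here P1 is represented by the unmodified, doubly charged LLR peptide; its oxidised, triply charged form in fraction 3 is treated as a separate species and is therefore not mixed into the profile.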

Depending on the experimental question, one of two scenarios is used to select a representative peptide:

After a representative peptide has been selected under scenario A and scenario B, a table containing only the representative peptide values for each protein is produced.

2) Normalization of protein intensities

To make protein co-migration and quantities easier to compare, all protein quantity values are normalized to lie between 0 and 1. The normalization method differs between scenarios:
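Because intensities are non-negative, dividing a profile by a maximum value is one simple way to map it into the 0 to 1 range. The sketch below (hypothetical helper `scale_to_max`, invented values) contrasts scaling each label state by its own maximum with scaling both label states by their joint maximum. It is an illustration only and does not claim which variant ComPrAn applies in which scenario.

```r
# Hypothetical representative-peptide profiles of one protein in two label states
heavy <- c(0, 2e5, 8e5, 4e5)
light <- c(1e5, 4e5, 4e5, 0)

# Divide non-negative intensities by the maximum of `reference`
# (by default x itself), so the largest reference value becomes 1
scale_to_max <- function(x, reference = x) {
  m <- max(reference, na.rm = TRUE)
  if (!is.finite(m) || m == 0) return(x)  # nothing detected: leave as is
  x / m
}

# Variant 1: each label state scaled by its own maximum
heavy_sep <- scale_to_max(heavy)
light_sep <- scale_to_max(light)

# Variant 2: both label states scaled by their joint maximum
joint_ref   <- c(heavy, light)
heavy_joint <- scale_to_max(heavy, reference = joint_ref)
light_joint <- scale_to_max(light, reference = joint_ref)

print(rbind(heavy_sep, light_sep, heavy_joint, light_joint))
```

Scaling separately makes the shapes of the two profiles comparable but discards the difference in amount; scaling jointly preserves the heavy-to-light ratio in every fraction.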

3) Hierarchical clustering

Clustering makes it possible to identify similarities between protein migration profiles in an unbiased way. Co-migration of known protein complexes can be examined simply by filtering the data; clustering, however, provides additional information by identifying unknown proteins whose migration profiles resemble those of the proteins of interest.

This package contains functions to perform hierarchical clustering using a distance measure based on Pearson correlation (centered or uncentered) and one of three linkage methods (single, average or complete).
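A minimal base R sketch of this kind of clustering, assuming the correlation r between two profiles is turned into a distance as 1 - r. The centered Pearson correlation is what `cor()` computes; the uncentered version skips mean subtraction and equals the cosine similarity of two profiles. The helpers `uncentered_cor` and `pearson_dist` and the toy profiles are hypothetical; `hclust()` and `cutree()` come from the base stats package, and ComPrAn's own clustering functions are not used here.

```r
# Hypothetical normalized migration profiles: rows = proteins, columns = fractions
profiles <- rbind(
  A = c(0.0, 0.2, 1.0, 0.5, 0.1),
  B = c(0.0, 0.3, 0.9, 0.6, 0.1),
  C = c(1.0, 0.6, 0.1, 0.0, 0.0),
  D = c(0.9, 0.7, 0.2, 0.1, 0.0)
)

# Uncentered Pearson correlation (cosine similarity) between all pairs of rows
uncentered_cor <- function(m) {
  norms <- sqrt(rowSums(m^2))
  (m %*% t(m)) / (norms %o% norms)
}

# Correlation-based distance between rows; cor() works on columns, hence t(m)
pearson_dist <- function(m, centered = TRUE) {
  r <- if (centered) cor(t(m)) else uncentered_cor(m)
  as.dist(1 - r)
}

# Linkage is chosen via hclust(method = "single" / "average" / "complete")
hc_avg  <- hclust(pearson_dist(profiles, centered = TRUE), method = "average")
hc_comp <- hclust(pearson_dist(profiles, centered = FALSE), method = "complete")

cl_avg  <- cutree(hc_avg, k = 2)
cl_comp <- cutree(hc_comp, k = 2)
print(cl_avg)
print(cl_comp)
```

With either correlation variant, proteins A and B (peaking in fraction 3) end up in one cluster and C and D (peaking in fraction 1) in the other, which is the kind of grouping that flags candidate co-migrating proteins.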

4) Export files and visualizations

We provide several functions for exporting intermediate results of the analysis. The plotting functionality includes:

Example workflow

Read in and filter data

library(ComPrAn)
# locate the example data file shipped with the package
inputFile <- system.file("extData", "data.txt", package = "ComPrAn")
# read in data
peptides <- peptideImport(inputFile)
# mandatory filtering
peptides <- cleanData(peptides, fCol = "Search ID")
# optional filtering
peptides <- toFilter(peptides, rank = 1)
# split chemical modifications and labelling into separate columns
peptides <- splitModLab(peptides)
# remove unnecessary columns, simplify rows
peptides <- simplifyProteins(peptides)

Pick representative peptides for scenario A and scenario B

peptide_index <- pickPeptide(peptides)

Inspect all peptides of a protein (scatter plot)

protein <- "P52815"
max_frac <- 23
# example plot of all peptides for this protein (default settings)
allPeptidesPlot(peptide_index, protein, max_frac = max_frac)

Create a list of proteins present in both label states or in only one label state

listOnlyOneLabState <- onlyInOneLabelState(peptide_index)
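Conceptually, sorting proteins by label state reduces to set operations on the protein identifiers observed in each label. The sketch below uses a hypothetical two-column table rather than ComPrAn's `peptide_index` object, whose internal structure is not described here, so it illustrates the idea rather than how `onlyInOneLabelState()` is implemented.

```r
# Hypothetical table of detected proteins with their label ("H" heavy, "L" light)
peps <- data.frame(
  protein = c("P1", "P1", "P2", "P3", "P3"),
  label   = c("H",  "L",  "H",  "L",  "L"),
  stringsAsFactors = FALSE
)

# Proteins seen in each label state
heavy_proteins <- unique(peps$protein[peps$label == "H"])
light_proteins <- unique(peps$protein[peps$label == "L"])

# Present in both label states vs. only in one of them
both       <- intersect(heavy_proteins, light_proteins)
only_heavy <- setdiff(heavy_proteins, light_proteins)
only_light <- setdiff(light_proteins, heavy_proteins)

str(list(both = both, only_heavy = only_heavy, only_light = only_light))
```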

Extract the representative peptide values for each protein under both scenario A and scenario B, and normalize them. The results are combined into a single data frame used in further analysis.

# produce a data frame in a format needed for downstream analysis
forAnalysis <- getNormTable(peptide_index, purpose = "analysis")
# produce a data frame in a human-readable format that can be exported
forExport <- getNormTable(peptide_index, purpose = "export")

Once the data frame with normalized protein data has been produced, it can either be exported (forExport in the code above) or used for further analysis and visualization (forAnalysis in the code above). An example procedure for analysing normalized protein data is available in the relevant section of the Protein workflow vignette.




Scavetta/complexomics documentation built on Oct. 1, 2022, 2:15 a.m.