In protein mass spectrometry, protein samples are digested with enzymes or specific chemical reagents, and the resulting peptide fragments are identified by MS/MS analysis.

Trypsin is the most frequently used enzyme for this purpose, especially in proteomics, because of its high specificity and efficiency. However, achieving broader sequence coverage requires either a less selective protease (e.g., chymotrypsin) or concomitant digestion with another enzyme (trypsin + chymotrypsin, trypsin + GluC, etc.). In some cases, when analyzing relatively pure samples, even nonspecific enzymes (pepsin, elastase, thermolysin) can help improve coverage of the protein sequence.

When using these less specific enzymes or combinations of enzymes, it is sometimes desirable to confirm cleavage specificity experimentally. This package provides several functions to visualize cleavage specificity based on peptide identification data.

Installation

Currently, the DigestionSpecificity package can be installed from GitHub. Installation from GitHub requires the devtools package, which is available on CRAN. To install devtools, use install.packages():

install.packages("devtools")

With devtools installed, you can install DigestionSpecificity as follows:

library(devtools)
install_github("ohgane/DigestionSpecificity")

This package requires the protr package for extracting dipeptide amino acid compositions, the viridis package for the default pseudo-color palette used in the plotCleavageMatrix function, and lattice for generating plots. lattice is installed by default, but the other two are not; if they are missing, please install them from CRAN.
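As a convenience, the check for missing dependencies can be scripted. The following is a minimal sketch in base R; the variable names are illustrative only, and the install call is guarded by interactive() so that the snippet has no side effects when run from a script.

```r
# Packages named in the paragraph above
required <- c("protr", "viridis", "lattice")

# requireNamespace() returns TRUE when a package can be loaded
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

# Install only what is missing, and only in an interactive session
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```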

Main Functions

The most important functions in this package are plotCleavage and plotCleavageMatrix.

Both functions take a vector of identified peptide sequences and a vector of the amino acids that follow each cleavage site, and visualize cleavage site specificity. The input sequences must be plain: they should not contain modifications or any characters other than the 20 single-letter amino acid codes.
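Because the input must consist of plain sequences, it can help to screen the data before plotting. The sketch below uses a hypothetical helper, is_plain_sequence(), which is not part of DigestionSpecificity, and toy peptides invented for illustration.

```r
# Hypothetical helper (not part of DigestionSpecificity): TRUE for strings
# made up only of the 20 standard single-letter amino acid codes
is_plain_sequence <- function(x) {
  grepl("^[ACDEFGHIKLMNPQRSTVWY]+$", x)
}

# Toy peptides: the second carries a modification tag, the third is lower case
peptides <- c("PEPTIDEK", "PEPT<ox>IDEK", "pepk")
ok <- is_plain_sequence(peptides)

# Keep only the entries that are safe to pass to the plotting functions
plain_peptides <- peptides[ok]
```

How modified sequences should be converted to plain ones depends on the export format of the search engine, so that step is left out here.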

Example Data Sets

This package contains several example data sets, all generated by nanoLC-MS/MS analysis. The data sets were acquired on an LTQ-Orbitrap XL mass spectrometer operated in data-dependent acquisition mode, with MS1 acquired on the Orbitrap and MS2 (CID fragmentation) on the ion-trap detector. Peptide identification and post-processing were performed with SearchGUI^[Vaudel M, Barsnes H, Berven FS, Sickmann A & Martens L (2011) SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics 11: 996–999] and PeptideShaker^[Vaudel M, Burkhart JM, Zahedi RP, Oveland E, Berven FS, Sickmann A, Martens L & Barsnes H (2015) PeptideShaker enables reanalysis of MS-derived proteomics data sets. Nat Biotechnol 33: 22–24], and the results were exported from PeptideShaker as "Default Peptide Report.xls".

### First, load DigestionSpecificity library
library(DigestionSpecificity)

### Read an example dataset (identification result of elastase-digested cell lysate)
data(Elastase)
knitr::kable(head(Elastase[,c(2,6,7,9,10)]), caption="A subset of Elastase data set")

Visualization of digestion specificity

By providing the identified sequences (without modifications) and the amino acid after each cleavage site, the specificity of the digestion can be visualized as a barplot.

We can see from this graph that cleavage occurred mostly at the C-terminal side of A, V, I, L, T, and S. We can also see that the cleavage efficiency of elastase under these experimental conditions was not very high (many missed cleavages), whereas digestion efficiency with LysC or trypsin generally exceeds 80%.

### Plot cleavage site specificity as barcharts
plotCleavage(Elastase$Sequence, Elastase$AAs.After, normalize=FALSE)
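To make the idea behind such a barplot concrete, here is a minimal base R sketch of how cleaved and missed sites could be tallied. It is not the package's implementation: the toy peptides are invented, and treating every internal residue as a missed cleavage is an assumption made for illustration.

```r
# Toy input: hypothetical peptides and the residue following each one
seqs  <- c("GTKLA", "MSAV", "PLAS")
after <- c("G", "K", "A")

aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# The last residue of a peptide sits on the N-terminal side of the cleaved bond
last_res <- substring(seqs, nchar(seqs), nchar(seqs))
cleaved  <- table(factor(last_res, levels = aa))

# Residues following the peptides sit on the C-terminal side of the cleaved bond
after_counts <- table(factor(after, levels = aa))

# Internal residues (all but the last) mark bonds that were not cleaved
internal <- unlist(strsplit(substring(seqs, 1, nchar(seqs) - 1), ""))
missed   <- table(factor(internal, levels = aa))

# Fraction of occurrences of each residue that ended in a cleavage
# (NaN for residues that never occur in the toy data)
efficiency <- cleaved / (cleaved + missed)
```

In this toy example A is cleaved once and missed twice, so its efficiency is 1/3, while V is cleaved at its only occurrence.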

If desired, the cleavage counts can be normalized by the overall frequency of each amino acid. In this case, we can see that elastase cleaved at the C-terminal side of A, V, T, I, and S, and to a lesser extent at the C-terminal side of H and M. Additionally, we can see that P on the N-terminal side of the cleavage site prevents cleavage. Note that the average of the right panel should be around 5% (i.e., 1/20).

plotCleavage(Elastase$Sequence, Elastase$AAs.After, normalize=TRUE)
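One plausible way to carry out such a normalization is sketched below in base R. The exact formula used by plotCleavage is not shown in this document, so this is an assumption: cleavage counts are divided by the overall amino acid composition of the identified peptides and then rescaled to sum to 1, which makes the average over the 20 amino acids equal to 1/20.

```r
aa   <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]
seqs <- c("GTKLA", "MSAV", "PLAS")   # toy peptides, invented for illustration

# Counts of the residue preceding each observed cleavage (last residue of each peptide)
last_res    <- substring(seqs, nchar(seqs), nchar(seqs))
site_counts <- table(factor(last_res, levels = aa))

# Overall amino acid composition of all identified peptides
background <- table(factor(unlist(strsplit(seqs, "")), levels = aa))

# Ratio of cleavage counts to background; residues absent from the data get 0
ratio <- as.numeric(site_counts) / as.numeric(background)
ratio[!is.finite(ratio)] <- 0

# Rescale so that the 20 values sum to 1 (mean = 1/20 = 5%)
normalized <- setNames(ratio / sum(ratio), aa)
```

With this scaling, a residue at which the enzyme shows no preference would sit near 5%, and values well above that indicate enrichment at the cleavage site.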

Visualization of enzyme specificity at dipeptide level

In the above analysis, the sequence context of the cleavage sites was not taken into consideration. To examine enzyme specificity more closely, it may be useful to check specificity at the dipeptide level.

### Plot cleavage site specificity at dipeptide level as heatmap
plotCleavageMatrix(Elastase$Sequence, Elastase$AAs.After, normalize=TRUE)
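The dipeptide view amounts to a 20 x 20 table of residue pairs flanking each cleaved bond. The following base R sketch builds such a table from toy data; it only illustrates the counting step and makes no claim about how plotCleavageMatrix normalizes or colors the result.

```r
aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# Toy input: hypothetical peptides and the residue following each one
seqs  <- c("GTKLA", "MSAV", "PLAS", "TTGA")
after <- c("G", "K", "A", "G")

# Residue before the cleaved bond = last residue of each peptide
last_res <- substring(seqs, nchar(seqs), nchar(seqs))

# Cross-tabulate the residue before and after each cleaved bond;
# fixing the factor levels keeps all 20 x 20 cells, including empty ones
pair_counts <- table(
  before = factor(last_res, levels = aa),
  after  = factor(after, levels = aa)
)
```

Here the pair A|G is observed twice, so pair_counts["A", "G"] is 2 while most other cells are 0.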

Appendix: custom plot

To allow customized visualization, the two functions, plotCleavage and plotCleavageMatrix, return data.frame objects if (and only if) the result is assigned to a variable. Users can use these data.frames to plot the cleavage site specificity in a more flexible manner than the default plot.

### Read another dataset
data(LysCTryp)
### Retrieve cleavage site matrix for custom plotting
mat <- plotCleavageMatrix(LysCTryp$Sequence, LysCTryp$AAs.After, hide.plot=TRUE, normalize=TRUE)
lattice::levelplot(mat, 
          scales=list(tck=c(1,0)),
          xlab="N-term of cleavage site",
          ylab="C-term of cleavage site")

Similarly, the barchart of digestion specificity can also be plotted with standard lattice graphics or ggplot2 graphics. With the hide.plot=TRUE option, plotting by the plotCleavage function can be suppressed.

### Extract a data.frame with cleavage site data
clsite <- plotCleavage(LysCTryp$Sequence, LysCTryp$AAs.After, hide.plot=TRUE, normalize=FALSE)

The returned data.frame contains the frequency of each of the amino acids. The first 6 rows of the data.frame are shown below.

knitr::kable(head(clsite), caption="The first 6 rows of the result from plotCleavage()")

Using this data.frame, users can plot a similar barchart in a much more flexible manner.

### The data can be conveniently plotted with lattice, ggplot2, or any other plotting package of your choice.
lattice::barchart(Freq~AA | terminal, groups=cleavage, data=clsite,
               stack=TRUE, auto.key=list(columns=2),
               scales=list(tck=c(1,0), alternating=FALSE), origin=0,
               xlab="Amino acids", ylab="Frequency (counts)", layout=c(2,1))
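For readers who prefer ggplot2, a roughly equivalent plot might look like the sketch below. The data.frame clsite_demo is a hypothetical stand-in for the result of plotCleavage(): its column names follow the barchart call above, but the level labels and counts are invented. The plotting code is skipped if ggplot2 is not installed.

```r
# Hypothetical stand-in for the data.frame returned by plotCleavage();
# column names follow the lattice call above, level labels and counts are assumed
clsite_demo <- data.frame(
  AA       = rep(c("K", "R", "P"), times = 4),
  terminal = rep(c("N-term", "C-term"), each = 6),
  cleavage = rep(rep(c("cleaved", "missed"), each = 3), times = 2),
  Freq     = c(40, 35, 1, 5, 6, 20, 10, 8, 0, 30, 25, 15)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  # Stacked bars per amino acid, one panel per side of the cleavage site
  p <- ggplot2::ggplot(clsite_demo,
                       ggplot2::aes(x = AA, y = Freq, fill = cleavage)) +
    ggplot2::geom_col() +
    ggplot2::facet_wrap(~ terminal) +
    ggplot2::labs(x = "Amino acids", y = "Frequency (counts)")
}
```

Printing p draws the figure; replacing clsite_demo with the real clsite should work as long as the column names match.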


ohgane/DigestionSpecificity documentation built on May 24, 2019, 11:55 a.m.