library(proteus)
library(ggplot2)
library(knitr)
knitr::opts_chunk$set(echo = TRUE)

This tutorial demonstrates how to analyse data from SILAC MS/MS experiment in Proteus. We strongly suggest the user should read the main tutorial first and familiarize themselves with the package.

vignette("proteus")

Here, we are only going to point out the differences between label-free and SILAC analysis. For this, we use a small subset of data from Ly et al. (2018). These data are distributed in a separate package proteusSILAC, which needs loading first:

library(proteusSILAC, warn.conflicts=FALSE)
data(proteusSILAC)
library(proteusSILAC)
data(proteusSILAC)

Input data

Columns in evidence file

The default measure.cols object is designed for label-free data. For SILAC data we need to specify isotope ratios. Our data contain M/L ratios:

measCols <- list(
  ML = "ratio_ml"
)

The column naming convention in our evidence file is different from the default, included in the evidenceColumns object. Hence, we need to create a new named list, as follows:

eviCols <- list(
  sequence = 'pep_sequence',
  modified_sequence = 'modified_sequence',
  modifications = 'modifications',
  protein_group = 'proteins',
  protein = 'leading_razor_protein',
  experiment = 'experiment',
  charge = 'charge',
  reverse = 'reverse',
  contaminant = 'potential_contaminant'
)

Read evidence file

Due to its large size we do not attach the SILAC evidence file to this package. The command to read the file would be:

evi <- readEvidenceFile(evidenceFile, measure.cols=measCols, data.cols=eviCols, zeroes.are.missing=FALSE)

Please note the argument zeroes.are.missing=FALSE. By default, readEvidenceFile assumes that zeroes represent missing data (which usually is true for label-free and TMT experiments). In this case we read ratios which, in theory, can be zeroes. In most cases its doesn't make any difference.

We do attach an example of processed evidence data with this package, it is called evi. It contains all reported intensity columns:

head(evi)

Metadata

Metadata for this experiment needs to specify all the reporter columns, experiments and corresponding condition and replicates:

metadataFile <- system.file("extdata", "metadata.txt", package="proteusSILAC")
meta <- read.delim(metadataFile, header=TRUE, sep="\t")
meta

Peptide data

Create a peptide dataset

We chose the median to aggregate peptide intensities from multiple entries in the evidence data:

pepdat <- makePeptideTable(evi, meta, measure.cols=measCols, aggregate.fun=aggregateMedian, experiment.type="SILAC")

We can use generic summary function to see more information about xppepdat.

summary(pepdat)

Number of peptides

This is the number of non-zero peptide intensities per sample.

plotCount(pepdat)

Protein data

Create protein dataset

Now we create protein data. The aggregate.fun=aggregateMedian indicates that protein ratios are medians of constituent peptide ratios. This approach is appropriate for SILAC data, unlike the default hi-flyer method which only works for label-free (or TMT) data.

prodat <- makeProteinTable(pepdat, aggregate.fun=aggregateMedian)

Again, we can use a generic summary function to see its properties.

summary(prodat)

Normalization

We normalize protein ratios to the median, that is, after the normalization each sample is going to have the same median. Median is the default normalization, so we do not need to specify the norm.fun parameter:

prodat.norm <- normalizeData(prodat)

These two figures show reporter intensity distributions before and after normalization.

plotSampleDistributions(prodat, fill="replicate") + labs(title="Before")
plotSampleDistributions(prodat.norm, fill="replicate") + labs(title="After")

Differential expression

For SILAC data we sometimes need to do differential expression on an isotope ratio, e.g. $M/L$. In our case, $M$ and $L$ labelled two different biological conditions, hence the $M/L$ ratio compares them. The null hypothesis is that $M/L = 1$ or $\log H/L = 0$. Because SILAC ratios tend to be symmetric in log space (see figure above), we chose the latter approach. Let's do the differential expression for the time point T48.

res <- limmaRatioDE(prodat.norm, condition="T48")
res <- res[order(res$P.Value),]
head(res)

There are no statistically significant departures from the null hypothesis. Here is the volcano plot:

plotVolcano(res, binhex=FALSE)

Alternatively, we can compare the two time points with a two-condition differential expression:

res2 <- limmaDE(prodat.norm)
res2 <- res2[order(res2$P.Value),]
head(res2)

Though not statistically significant at the assumes 0.05 level, protein P03372-3 stands out. Here is a SILAC ratio plot for this protein:

plotIntensities(prodat.norm, id="P03372-3")


bartongroup/Proteus documentation built on April 22, 2023, 5:33 a.m.