library(proteus)
library(knitr)
library(ggplot2)
knitr::opts_chunk$set(echo = TRUE)

This tutorial demonstrates how to analyse data from TMT MS/MS experiment in Proteus. We strongly suggest the user should read the main tutorial first and familiarize themselves with the package.

vignette("proteus")

Here, we are only going to point out the differences between label-free and TMT analysis. For this, we use data from Crozier et al. 2018, Pride accession PXD008741. These data are distributed in a separate package proteusTMT, which needs loading first:

library(proteusTMT, warn.conflicts=FALSE)
data(proteusTMT)
library(proteusTMT)
data(proteusTMT)

Input data

Columns in evidence file

The default measure.cols object is designed for label-free data. For TMT data we need to specify all reporter intensity columns. In out example we have 10 reporter columns, numbered from 0 to 9:

measCols <- paste0("Reporter intensity ", 0:9)
names(measCols) <- paste0("reporter_", 0:9)

The resulting object is

str(as.list(measCols))

We are going to use the default evidenceColumns object, but the user should carefully check that the column names correspond to those in the evidence file.

Read evidence file

Due to its large size we do not attach the TMT evidence file to this package. The command to read the file would be:

evi <- readEvidenceFile(evidenceFile, measure.cols=measCols)

We do attach an example processed evidence data with the package, it is called evi. It contains all reported intensity columns:

head(evi)

Metadata

Metadata for this experiment needs to specify all the reporter columns, experiments and corresponding condition and replicates:

metadataFile <- system.file("extdata", "metadata.txt", package="proteusTMT")
meta <- read.delim(metadataFile, header=TRUE, sep="\t")
meta

No 'Experiment' column?

It is possible to run MaxQuant for one experiment where conditions are encoded in reporter columns only. In this case the evidence file will not contain the column 'Experiment'. Proteus can deal with such configuration. All you need is to remove the 'experiment' field from the default column list:

eviCols <- evidenceColumns
eviCols$experiment <- NULL

and read evidence file with an additional argument:

evi <- readEvidenceFile(evidenceFile, measure.cols=measCols, data.cols = eviCols)

The 'Experiment' column will be skipped. Obviously, metadata should not contain 'Experiment' column either.

Please note that in such case peptide and protein objects created by Proteus will contain a metadata data frame (metadata field in the object, see, e.g., pepdat$metadata) with column experiment filled with dummy values. This is done for consistent internal processing. OK, we are lazy and it was easier to do it this way.

Peptide data

Create a peptide dataset

We chose the median to aggregate peptide intensities from multiple entries in the evidence data:

pepdat <- makePeptideTable(evi, meta, measure.cols=measCols, aggregate.fun=aggregateMedian, experiment.type="TMT")

We can use generic summary function to see more information about pepdat.

summary(pepdat)

Number of peptides

This is the number of non-zero peptide intensities per sample.

plotCount(pepdat)

Protein data

Create protein dataset

We create protein data using the high-flyer method.

prodat <- makeProteinTable(pepdat, aggregate.fun=aggregateHifly, hifly=3)

Again, we can use a generic summary function to see its properties.

summary(prodat)

Normalization

For TMT data we recommend using CONSTANd normalization Maes et al. 2016.

prodat.norm <- normalizeTMT(prodat)

These two figures show reporter intensity distributions before and after normalization.

plotSampleDistributions(prodat, fill="replicate") + labs(title="Before")
plotSampleDistributions(prodat.norm, log.scale=FALSE, fill="replicate") + labs(title="After")

Protein clustering

We can use the same function plotClustering() to see the dendrogram for the proteins.

plotClustering(prodat.norm)

Differential expression

These particular data come from a time-course experiment that would require analysis beyond Proteus package. We can however show how differential expression can be used to identify proteins changing between time points 0.5^h^ and 8^h^.

res <- limmaDE(prodat.norm, conditions=c("T0.5", "T8"))

The distribution of p-values looks fine.

plotPdist(res)

The top of the table of differentially expressed proteins:

res <- res[order(res$adj.P.Val),]
head(res)

And the top protein from the table:

prot <- as.character(res$protein[1])
plotIntensities(prodat.norm, id=prot)


bartongroup/Proteus documentation built on April 22, 2023, 5:33 a.m.