BiocStyle::markdown()
library("QFeatures")
The QFeatures
package provides infrastructure (that is classes and
the methods to process and manipulate them) to manage and analyse
quantitative features from mass spectrometry experiments. It is based
on the MultiAssayExperiment
class from the r BiocStyle::Biocpkg("MultiAssayExperiment")
[@Ramos:2017]. that stores a set of assays. Assays in a QFeatures
object have a specific relation, that is depicted in figure
\@ref(fig:featuresplot): assays in a QFeatures
object are the result
of the aggregation of quantitative features of other assays. In the
case of a quantitative proteomics experiment, these different assays
would be PSMs, that are aggregated into peptides, that are themselves
aggregated into proteins.
par(mar = c(0, 0, 0, 0)) plot(NA, xlim = c(0, 12), ylim = c(0, 20), xaxt = "n", yaxt = "n", xlab = "", ylab = "", bty = "n") for (i in 0:7) rect(0, i, 3, i+1, col = "lightgrey", border = "white") for (i in 8:12) rect(0, i, 3, i+1, col = "steelblue", border = "white") for (i in 13:18) rect(0, i, 3, i+1, col = "orange", border = "white") for (i in 19) rect(0, i, 3, i+1, col = "darkgrey", border = "white") for (i in 5:7) rect(5, i, 8, i+1, col = "lightgrey", border = "white") for (i in 8:10) rect(5, i, 8, i+1, col = "steelblue", border = "white") for (i in 11:13) rect(5, i, 8, i+1, col = "orange", border = "white") for (i in 14) rect(5, i, 8, i+1, col = "darkgrey", border = "white") rect(9, 8, 12, 8+1, col = "lightgrey", border = "white") rect(9, 9, 12, 9+1, col = "steelblue", border = "white") rect(9, 10, 12, 10+1, col = "orange", border = "white") rect(9, 11, 12, 11+1, col = "darkgrey", border = "white") segments(3, 8, 5, 8, lty = "dashed") segments(3, 6, 5, 7, lty = "dashed") segments(3, 4, 5, 6, lty = "dashed") segments(3, 0, 5, 5, lty = "dashed") segments(3, 10, 5, 9, lty = "dashed") segments(3, 11, 5, 10, lty = "dashed") segments(3, 13, 5, 11, lty = "dashed") segments(3, 14, 5, 12, lty = "dashed") segments(3, 16, 5, 13, lty = "dashed") segments(3, 19, 5, 14, lty = "dashed") segments(3, 20, 5, 15, lty = "dashed") segments(8, 5, 9, 8, lty = "dashed") segments(8, 8, 9, 9, lty = "dashed") segments(8, 11, 9, 10, lty = "dashed") segments(8, 14, 9, 11, lty = "dashed") segments(8, 15, 9, 12, lty = "dashed")
In the following sections, we are going to demonstrate how to create a
single-assay QFeatures
objects starting from a spreadsheet, how to
compute the next assays (peptides and proteins), and how these can be
manipulated and explored.
library("QFeatures")
QFeatures
objectdata(hlpsms)
While QFeatures
objects can be created manually (see ?QFeatures
for
details), most users will probably possess quantitative data in a
spreadsheet or a dataframe. In such cases, the easiest is to use the
readQFeatures
function to extract the quantitative data and metadata
columns. Below, we load the hlpsms
dataframe that contains data for
r ncol(hlpsms)
PSMs from the TMT-10plex hyperLOPIT spatial proteomics experiment
from [@Christoforou:2016]. The ecol
argument specifies that columns
1 to 10 contain quantitation data, and that the assay should be named
psms
in the returned QFeatures
object, to reflect the nature of the
data.
data(hlpsms) hl <- readQFeatures(hlpsms, ecol = 1:10, name = "psms") hl
Below, we see that we can extract an assay using its index or its
name. The individual assays are stored as SummerizedExperiment
object and further access its quantitative data and metadata using
the assay
and rowData
functions
hl[[1]] hl[["psms"]] head(assay(hl[["psms"]])) head(rowData(hl[["psms"]]))
For further details on how to manipulate such objects, refer to the
r BiocStyle::Biocpkg("MultiAssayExperiment")
[@Ramos:2017] and
r BiocStyle::Biocpkg("SummerizedExperiment")
[@SE] packages.
As illustrated in figure \@ref(fig:featuresplot), an central
characteristic of QFeatures
objects is the aggregative relation
between their assays. This can be obtained with the
aggregateFeatures
function that will aggregate quantitative features
from one assay into a new one. In the next code chunk, we aggregate
PSM-level data into peptide by grouping all PSMs that were matched the
same peptide sequence. Below, the aggregation function is set, as an
example, to the mean. The new assay is named peptides.
hl <- aggregateFeatures(hl, "psms", "Sequence", name = "peptides", fun = colMeans) hl hl[["peptides"]]
Below, we repeat the aggregation operation by grouping peptides into proteins as defined by the ProteinGroupAccessions variable.
hl <- aggregateFeatures(hl, "peptides", "ProteinGroupAccessions", name = "proteins", fun = colMeans) hl hl[["proteins"]]
The sample assayed in a QFeatures
object can be documented in the
colData
slot. The hl
data doens't currently possess any sample
metadata. These can be addedd as a new DataFrame
with matching names
(i.e. the DataFrame
rownames must be identical assay's colnames) or
can be added one variable at at time, as shown below.
colData(hl) hl$tag <- c("126", "127N", "127C", "128N", "128C", "129N", "129C", "130N", "130C", "131") colData(hl)
One particularity of the QFeatures
infrastructure is that the
features of the constitutive assays are linked through an aggregative
relation. This relation is recorded when creating new assays with
aggregateFeatures
and is exploited when subsetting QFeature
by their
feature names.
In the example below, we are interested in the Stat3B isoform of the Signal transducer and activator of transcription 3 (STAT3) with accession number P42227-2. This accession number corresponds to a feature name in the proteins assay. But this protein row was computed from 8 peptide rows in the peptides assay, themselves resulting from the aggregation of 8 rows in the psms assay.
stat3 <- hl["P42227-2", , ] stat3
We can easily visualise this new QFeatures object using ggplot2
once converted into a data.frame
.
stat3_df <- data.frame(longFormat(stat3)) stat3_df$assay <- factor(stat3_df$assay, levels = c("psms", "peptides", "proteins")) library("ggplot2") ggplot(data = stat3_df, aes(x = colname, y = value, group = rowname)) + geom_line() + geom_point() + facet_grid(~ assay)
Below we repeat the same operation for the Signal transducer and
activator of transcription 1 (STAT1) and 3 (STAT3) accession numbers,
namely P42227-2 and P42225. We obtain a new QFeatures
instance
containing 2 proteins, 9 peptides and 10 PSMS. From this, we can
readily conclude that STAT1 was identified by a single PSM/peptide.
stat <- hl[c("P42227-2", "P42225"), , ] stat
Below, we visualise the expression profiles for the two proteins.
stat_df <- data.frame(longFormat(stat)) stat_df$stat3 <- ifelse(stat_df$rowname %in% stat3_df$rowname, "STAT3", "STAT1") stat_df$assay <- factor(stat_df$assay, levels = c("psms", "peptides", "proteins")) ggplot(data = stat_df, aes(x = colname, y = value, group = rowname)) + geom_line() + geom_point() + facet_grid(stat3 ~ assay)
The subsetting by feature names is also available as a call to the
subsetByFeature
function, for use with the pipe operator.
library(magrittr) hl %>% subsetByFeature("P42227-2") hl %>% subsetByFeature(c("P42227-2", "P42225"))
and possibly
hl %>% subsetByFeature("P42227-2") %>% longFormat() %>% as.data.frame %>% ggplot(aes(x = colname, y = value, group = rowname)) + geom_line() + facet_grid(~ assay)
to reproduce the line plot.
Finally, a simply shiny
app allows to explore and visualise the
respective assays of a QFeatures
object.
display(stat)
knitr::include_graphics("./figs/display_hmap.png", error = FALSE)
knitr::include_graphics("./figs/display_assay.png", error = FALSE)
knitr::include_graphics("./figs/display_rowdata.png", error = FALSE)
A dropdown menu in the side bar allows the user to select an assay of interest, which can then be visualised as a heatmap (figure \@ref(fig:heatmapdisplay)), as a quantitative table (figure \@ref(fig:assaydisplay)) or a row data table (figure \@ref(fig:rowdatadisplay)).
QFeatures is assays can also be filtered based on variables in their
respective row data slots using the filterFeatures
function. The
filters can be defined using the formula interface or using
AnnotationFilter
objects from the r BiocStyle::Biocpkg("AnnotationFilter")
package [@AnnotationFilter]. In addition to the pre-defined filters (such as
SymbolFilter
, ProteinIdFilter
, ... that filter on gene symbol,
protein identifier, ...), this package allows users to define
arbitrary character or numeric filters using the VariableFilter
.
mito_filter <- VariableFilter(field = "markers", value = "Mitochondrion", condition = "==") mito_filter qval_filter <- VariableFilter(field = "qValue", value = 0.001, condition = "<=") qval_filter
These filter can then readily be applied to all assays' row data
slots. The mito_filter
will return all PSMs, peptides and proteins
that were annotated as localising to the mitochondrion.
filterFeatures(hl, mito_filter)
The qval_filter
, on the other hand, will only return a subset of
PSMs, because the qValue
variable is only present in the psms
assays. The q-values are only relevant to PSMs and that variable was
dropped from the other assays.
filterFeatures(hl, qval_filter)
The same filters can be created using the forumla interface:
filterFeatures(hl, ~ markers == "Mitochondrion") filterFeatures(hl, ~ qValue <= 0.001)
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.