Quantitative features for mass spectrometry data

BiocStyle::markdown()
library("QFeatures")

Introduction

The QFeatures package provides infrastructure (that is classes and the methods to process and manipulate them) to manage and analyse quantitative features from mass spectrometry experiments. It is based on the MultiAssayExperiment class from the r BiocStyle::Biocpkg("MultiAssayExperiment") [@Ramos:2017]. that stores a set of assays. Assays in a QFeatures object have a specific relation, that is depicted in figure \@ref(fig:featuresplot): assays in a QFeatures object are the result of the aggregation of quantitative features of other assays. In the case of a quantitative proteomics experiment, these different assays would be PSMs, that are aggregated into peptides, that are themselves aggregated into proteins.

par(mar = c(0, 0, 0, 0))
plot(NA, xlim = c(0, 12), ylim = c(0, 20),
     xaxt = "n", yaxt = "n",
     xlab = "", ylab = "", bty = "n")

for (i in 0:7)
    rect(0, i, 3, i+1, col = "lightgrey", border = "white")


for (i in 8:12)
    rect(0, i, 3, i+1, col = "steelblue", border = "white")

for (i in 13:18)
    rect(0, i, 3, i+1, col = "orange", border = "white")

for (i in 19)
    rect(0, i, 3, i+1, col = "darkgrey", border = "white")


for (i in 5:7)
    rect(5, i, 8, i+1, col = "lightgrey", border = "white")

for (i in 8:10)
    rect(5, i, 8, i+1, col = "steelblue", border = "white")

for (i in 11:13)
    rect(5, i, 8, i+1, col = "orange", border = "white")

for (i in 14)
    rect(5, i, 8, i+1, col = "darkgrey", border = "white")

rect(9, 8, 12, 8+1, col = "lightgrey", border = "white")
rect(9, 9, 12, 9+1, col = "steelblue", border = "white")
rect(9, 10, 12, 10+1, col = "orange", border = "white")
rect(9, 11, 12, 11+1, col = "darkgrey", border = "white")

segments(3, 8, 5, 8, lty = "dashed")
segments(3, 6, 5, 7, lty = "dashed")
segments(3, 4, 5, 6, lty = "dashed")
segments(3, 0, 5, 5, lty = "dashed")

segments(3, 10, 5, 9, lty = "dashed")
segments(3, 11, 5, 10, lty = "dashed")
segments(3, 13, 5, 11, lty = "dashed")

segments(3, 14, 5, 12, lty = "dashed")
segments(3, 16, 5, 13, lty = "dashed")
segments(3, 19, 5, 14, lty = "dashed")

segments(3, 20, 5, 15, lty = "dashed")


segments(8, 5, 9, 8, lty = "dashed")
segments(8, 8, 9, 9, lty = "dashed")
segments(8, 11, 9, 10, lty = "dashed")
segments(8, 14, 9, 11, lty = "dashed")
segments(8, 15, 9, 12, lty = "dashed")

In the following sections, we are going to demonstrate how to create a single-assay QFeatures objects starting from a spreadsheet, how to compute the next assays (peptides and proteins), and how these can be manipulated and explored.

library("QFeatures")

Creating QFeatures object

data(hlpsms)

While QFeatures objects can be created manually (see ?QFeatures for details), most users will probably possess quantitative data in a spreadsheet or a dataframe. In such cases, the easiest is to use the readQFeatures function to extract the quantitative data and metadata columns. Below, we load the hlpsms dataframe that contains data for r ncol(hlpsms) PSMs from the TMT-10plex hyperLOPIT spatial proteomics experiment from [@Christoforou:2016]. The ecol argument specifies that columns 1 to 10 contain quantitation data, and that the assay should be named psms in the returned QFeatures object, to reflect the nature of the data.

data(hlpsms)
hl <- readQFeatures(hlpsms, ecol = 1:10, name = "psms")
hl

Below, we see that we can extract an assay using its index or its name. The individual assays are stored as SummerizedExperiment object and further access its quantitative data and metadata using the assay and rowData functions

hl[[1]]
hl[["psms"]]
head(assay(hl[["psms"]]))
head(rowData(hl[["psms"]]))

For further details on how to manipulate such objects, refer to the r BiocStyle::Biocpkg("MultiAssayExperiment") [@Ramos:2017] and r BiocStyle::Biocpkg("SummerizedExperiment") [@SE] packages.

As illustrated in figure \@ref(fig:featuresplot), an central characteristic of QFeatures objects is the aggregative relation between their assays. This can be obtained with the aggregateFeatures function that will aggregate quantitative features from one assay into a new one. In the next code chunk, we aggregate PSM-level data into peptide by grouping all PSMs that were matched the same peptide sequence. Below, the aggregation function is set, as an example, to the mean. The new assay is named peptides.

hl <- aggregateFeatures(hl, "psms", "Sequence", name = "peptides", fun = colMeans)
hl
hl[["peptides"]]

Below, we repeat the aggregation operation by grouping peptides into proteins as defined by the ProteinGroupAccessions variable.

hl <- aggregateFeatures(hl, "peptides", "ProteinGroupAccessions", name = "proteins", fun = colMeans)
hl
hl[["proteins"]]

The sample assayed in a QFeatures object can be documented in the colData slot. The hl data doens't currently possess any sample metadata. These can be addedd as a new DataFrame with matching names (i.e. the DataFrame rownames must be identical assay's colnames) or can be added one variable at at time, as shown below.

colData(hl)
hl$tag <- c("126", "127N", "127C", "128N", "128C", "129N", "129C",
            "130N", "130C", "131")
colData(hl)

Subsetting

One particularity of the QFeatures infrastructure is that the features of the constitutive assays are linked through an aggregative relation. This relation is recorded when creating new assays with aggregateFeatures and is exploited when subsetting QFeature by their feature names.

In the example below, we are interested in the Stat3B isoform of the Signal transducer and activator of transcription 3 (STAT3) with accession number P42227-2. This accession number corresponds to a feature name in the proteins assay. But this protein row was computed from 8 peptide rows in the peptides assay, themselves resulting from the aggregation of 8 rows in the psms assay.

stat3 <- hl["P42227-2", , ]
stat3

We can easily visualise this new QFeatures object using ggplot2 once converted into a data.frame.

stat3_df <- data.frame(longFormat(stat3))
stat3_df$assay <- factor(stat3_df$assay,
                        levels = c("psms", "peptides", "proteins"))

library("ggplot2")
ggplot(data = stat3_df,
       aes(x = colname,
           y = value,
           group = rowname)) +
    geom_line() + geom_point() +
    facet_grid(~ assay)

Below we repeat the same operation for the Signal transducer and activator of transcription 1 (STAT1) and 3 (STAT3) accession numbers, namely P42227-2 and P42225. We obtain a new QFeatures instance containing 2 proteins, 9 peptides and 10 PSMS. From this, we can readily conclude that STAT1 was identified by a single PSM/peptide.

stat <- hl[c("P42227-2", "P42225"), , ]
stat

Below, we visualise the expression profiles for the two proteins.

stat_df <- data.frame(longFormat(stat))
stat_df$stat3 <- ifelse(stat_df$rowname %in% stat3_df$rowname,
                        "STAT3", "STAT1")
stat_df$assay <- factor(stat_df$assay,
                        levels = c("psms", "peptides", "proteins"))

ggplot(data = stat_df,
       aes(x = colname,
           y = value,
           group = rowname)) +
    geom_line() + geom_point() +
    facet_grid(stat3 ~ assay)

The subsetting by feature names is also available as a call to the subsetByFeature function, for use with the pipe operator.

library(magrittr)
hl %>%
    subsetByFeature("P42227-2")

hl %>%
    subsetByFeature(c("P42227-2", "P42225"))

and possibly

hl %>%
    subsetByFeature("P42227-2") %>%
    longFormat() %>%
    as.data.frame %>%
    ggplot(aes(x = colname,
               y = value,
               group = rowname)) +
    geom_line() +
    facet_grid(~ assay)

to reproduce the line plot.

Finally, a simply shiny app allows to explore and visualise the respective assays of a QFeatures object.

display(stat)
knitr::include_graphics("./figs/display_hmap.png", error = FALSE)
knitr::include_graphics("./figs/display_assay.png", error = FALSE)
knitr::include_graphics("./figs/display_rowdata.png", error = FALSE)

A dropdown menu in the side bar allows the user to select an assay of interest, which can then be visualised as a heatmap (figure \@ref(fig:heatmapdisplay)), as a quantitative table (figure \@ref(fig:assaydisplay)) or a row data table (figure \@ref(fig:rowdatadisplay)).

Filtering

QFeatures is assays can also be filtered based on variables in their respective row data slots using the filterFeatures function. The filters can be defined using the formula interface or using AnnotationFilter objects from the r BiocStyle::Biocpkg("AnnotationFilter") package [@AnnotationFilter]. In addition to the pre-defined filters (such as SymbolFilter, ProteinIdFilter, ... that filter on gene symbol, protein identifier, ...), this package allows users to define arbitrary character or numeric filters using the VariableFilter.

mito_filter <- VariableFilter(field = "markers",
                              value = "Mitochondrion",
                              condition = "==")
mito_filter

qval_filter <- VariableFilter(field = "qValue",
                              value = 0.001,
                              condition = "<=")
qval_filter

These filter can then readily be applied to all assays' row data slots. The mito_filter will return all PSMs, peptides and proteins that were annotated as localising to the mitochondrion.

filterFeatures(hl, mito_filter)

The qval_filter, on the other hand, will only return a subset of PSMs, because the qValue variable is only present in the psms assays. The q-values are only relevant to PSMs and that variable was dropped from the other assays.

filterFeatures(hl, qval_filter)

The same filters can be created using the forumla interface:

filterFeatures(hl, ~ markers == "Mitochondrion")
filterFeatures(hl, ~ qValue <= 0.001)

Session information {-}

sessionInfo()

References {-}



Try the QFeatures package in your browser

Any scripts or data that you put into this service are public.

QFeatures documentation built on Nov. 8, 2020, 6:51 p.m.