BiocStyle::markdown()
knitr::opts_chunk$set(dpi=300)
library("hdxstats")
library("dplyr")
library("ggplot2")
library("RColorBrewer")
library("tidyr")
library("pheatmap")
library("scales")
library("viridis")
library("patchwork")
library("Biostrings")

A well-defined HDX-MS experiment

This vignette describes how to analyse time-resolved differential HDX-MS experiments. The key requirement is at least two conditions, e.g. apo + antibody, apo + small molecule, or protein closed + protein open. The experiment can be replicated, though if enough time points are analysed (>=3) then significant results can occasionally be obtained without replication. The data provided should be centroid-centric data. This package does not yet support analysis straight from raw spectra. Typically the data will be provided as a .csv exported from tools such as DynamX or HDExaminer.

Main elements of the package

The package relies on Bioconductor infrastructure so that it integrates with other data types and can benefit from advances in other fields of mass spectrometry. There are package-specific objects, classes and methods, but importantly it reuses classes from quantitative proteomics, mainly the QFeatures object, which extends the SummarizedExperiment class for mass-spectrometry data. The focus of this package is on statistical testing and on visualisation of the testing results.

Data

We will begin with a structural variant experiment in which MBP and a structural variant were mixed in different proportions. HDX-MS was performed on these samples and we expect to see reproducible but subtle differences. We first load the data from the package; it is in .csv format.

MBPpath <- system.file("extdata", "MBP.csv", package = "hdxstats")

We can now read in the .csv file and have a quick look at its contents.

MBP <- read.csv(MBPpath)
head(MBP) # have a look
length(unique(MBP$pep_sequence)) # peptide sequences

Let us have a quick visualisation of some of the data so that we can see some of its features.

filter(MBP, pep_sequence == unique(MBP$pep_sequence[1]), pep_charge == 2) %>%
    ggplot(aes(x = hx_time, y = d, group = factor(replicate_cnt),
               color = factor(hx_sample,
                              unique(MBP$hx_sample)[c(7,5,1,2,3,4,6)]))) + 
    theme_classic() + geom_point(size = 2) + 
    scale_color_manual(values = brewer.pal(n = 7, name = "Set2")) + 
    labs(color = "experiment", x = "Deuterium Exposure", y = "Deuterium incorporation")

We can see that the units of the time dimension are seconds and that deuterium incorporation has been normalized into Daltons.

Parsing to an object of class QFeatures

Working directly from a .csv is likely to cause issues downstream. First, we run the risk of accidentally changing the data or corrupting the file in some way. Second, every .csv will be formatted slightly differently, so building extensible tools around these files would be inefficient. Furthermore, working with a generic class used in other mass-spectrometry fields can speed up analysis and the adoption of new methods. We will work with the QFeatures class from the QFeatures package, as it is a powerful and scalable way to store quantitative mass-spectrometry data.

The data are stored in long format rather than wide format, so we first switch the data to wide format.

MBP_wide <- pivot_wider(data.frame(MBP),
                        values_from = d,
                        names_from = c("hx_time", "replicate_cnt", "hx_sample"),
                        id_cols = c("pep_sequence", "pep_charge"))
head(MBP_wide)
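
To make the reshaping step concrete, the following is a minimal sketch on a toy table. The peptide sequences, the sample labels ("WT Null", "10%") and the uptake values are invented for illustration and are not taken from the MBP data. It shows that pivot_wider joins the values of the names_from columns with an underscore, which is where the time_replicate_sample column names come from.

```r
library(tidyr)

# Toy long-format table: two peptides, two time points, two hypothetical samples.
toy <- data.frame(
  pep_sequence  = rep(c("AAPLK", "GPEVW"), each = 4),
  pep_charge    = 2,
  hx_time       = rep(c(0, 30), times = 4),
  replicate_cnt = 1,
  hx_sample     = rep(rep(c("WT Null", "10%"), each = 2), times = 2),
  d             = c(0, 1.2, 0, 1.0, 0, 2.1, 0, 1.8)
)

# One row per peptide/charge, one column per time/replicate/sample combination.
toy_wide <- pivot_wider(toy,
                        values_from = d,
                        names_from = c("hx_time", "replicate_cnt", "hx_sample"),
                        id_cols = c("pep_sequence", "pep_charge"))

colnames(toy_wide)
```

The two identifier columns come first, followed by one quantitative column per combination, named in the pattern time_replicate_sample.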

We notice that there are many columns containing only NAs. The following code chunk removes these columns.

MBP_wide <- MBP_wide[, colSums(is.na(MBP_wide)) != nrow(MBP_wide)]

We also note that the column names are not very informative. We are going to format them in a very specific way so that later functions can automatically infer the design from the column names. We provide the names in the format X(time)rep(replicate)cond(condition).

colnames(MBP_wide)[-c(1,2)]

new.colnames <- gsub("0_", "0rep", paste0("X", colnames(MBP_wide)[-c(1,2)]))
new.colnames <- gsub("_", "cond", new.colnames)

# remove annoying % signs
new.colnames <- gsub("%", "", new.colnames)

# remove space (NULL could get confusing later and WT is clear)
new.colnames <- gsub(" .*", "", new.colnames)

new.colnames
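
The substitutions above rely on every time point ending in 0. As a hedged alternative, the sketch below builds the same X(time)rep(replicate)cond(condition) names with a single anchored regular expression and then parses them back into a small design table. The raw column names are hypothetical examples of the time_replicate_sample pattern, and `tidy_names` and `design` are illustrative names, not part of hdxstats.

```r
# Hypothetical raw column names in the form time_replicate_sample.
raw_names <- c("0_1_10%", "30_2_WT Null", "15_1_5%")

# Capture the three underscore-separated fields explicitly, so the result
# does not depend on the time point ending in 0.
tidy_names <- sub("^([^_]+)_([^_]+)_(.*)$", "X\\1rep\\2cond\\3", raw_names)
tidy_names <- gsub("%", "", tidy_names)    # drop percent signs
tidy_names <- sub(" .*", "", tidy_names)   # drop everything after the first space

# Parse the names back into time, replicate and condition to check them.
pattern <- "^X([0-9.]+)rep([0-9]+)cond(.*)$"
design <- data.frame(
  time      = as.numeric(sub(pattern, "\\1", tidy_names)),
  replicate = as.integer(sub(pattern, "\\2", tidy_names)),
  condition = sub(pattern, "\\3", tidy_names),
  stringsAsFactors = FALSE
)

tidy_names
design
```

Parsing the names back in this way is a cheap check that each column carries the time, replicate and condition you expect before the data are handed to downstream functions.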

We will now parse the data into an object of class QFeatures. We have provided a function in the package to assist with this. If you want to do this yourself, use the readQFeatures function from the QFeatures package.

MBPqDF <- parseDeutData(object = DataFrame(MBP_wide),
                        design = new.colnames,
                        quantcol = 3:102)
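
The quantcol range above is hard-coded for this data set. A minimal sketch of deriving it from the table instead is shown below on a toy data frame. The column names and values are invented, and the check that there is one design label per quantitative column is an assumption about what is sensible, not a documented requirement of parseDeutData.

```r
# Toy wide table: two identifier columns followed by quantitative columns.
wide <- data.frame(pep_sequence  = c("AAPLK", "GPEVW"),
                   pep_charge    = c(2, 2),
                   X0rep1condWT  = c(0, 0),
                   X30rep1condWT = c(1.2, 2.1),
                   check.names   = FALSE)

# Every column that is not an identifier is treated as quantitative.
id_cols  <- c("pep_sequence", "pep_charge")
quantcol <- which(!colnames(wide) %in% id_cols)

# Assumed sanity check: one design label per quantitative column.
design_labels <- colnames(wide)[quantcol]
stopifnot(length(design_labels) == length(quantcol))

quantcol
```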

Heatmap visualisations of HDX data

To help us get used to the QFeatures object, we show how to generate a heatmap of these data from it:

pheatmap(t(assay(MBPqDF)),
         cluster_rows = FALSE, 
         cluster_cols = FALSE,
         color = brewer.pal(n = 9, name = "BuPu"),
         main = "Structural variant deuterium incorporation heatmap", 
         fontsize = 14,
         legend_breaks = c(0, 2, 4, 6, 8, 10, 12, max(assay(MBPqDF))),
         legend_labels = c("0", "2", "4", "6", "8","10", "12", "Incorporation"))

If you prefer to label the heatmap with start-to-end residue numbers instead, you can change the plot as follows:

regions <- unique(MBP[,c("pep_start", "pep_end")])
xannot <- paste0("[", regions[,1], ",", regions[,2], "]")
pheatmap(t(assay(MBPqDF)),
         cluster_rows = FALSE, 
         cluster_cols = FALSE,
         color = brewer.pal(n = 9, name = "BuPu"),
         main = "Structural variant deuterium incorporation heatmap", 
         fontsize = 14,
         legend_breaks = c(0, 2, 4, 6, 8, 10, 12, max(assay(MBPqDF))),
         legend_labels = c("0", "2", "4", "6", "8","10", "12", "Incorporation"),
         labels_col = xannot)

It may be useful to normalize HDX-MS data for interpretation or visualization purposes. We can normalize by the number of exchangeable amides or by using back-exchange correction values. We first normalise to percentage incorporation and visualise the result as a heatmap.

MBPqDF_norm1 <- normalisehdx(MBPqDF,
                             sequence = unique(MBP$pep_sequence),
                             method = "pc")


pheatmap(t(assay(MBPqDF_norm1)),
         cluster_rows = FALSE, 
         cluster_cols = FALSE,
         color = brewer.pal(n = 9, name = "BuPu"),
         main = "Structural variant deuterium incorporation heatmap normalised", 
         fontsize = 14,
         legend_breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1, 1.2),
         legend_labels = c("0", "0.2", "0.4", "0.6", "0.8","1", "Incorporation"),
         labels_col = xannot)
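
To illustrate the arithmetic behind normalising by exchangeable amides, here is a minimal sketch on invented peptides. The helper `count_exchangeable` is hypothetical and uses one common convention (skip the N-terminal residue and ignore prolines, which have no amide hydrogen). The exchangeableAmides function in hdxstats may use a different convention, so treat this as an illustration of the idea rather than a reimplementation.

```r
# Hypothetical helper: number of backbone amides able to exchange.
# Assumption: the first `skip` residues are excluded and prolines do not count.
count_exchangeable <- function(sequences, skip = 1) {
  vapply(sequences, function(s) {
    residues <- strsplit(s, "")[[1]]
    residues <- residues[seq_along(residues) > skip]
    sum(residues != "P")
  }, numeric(1), USE.NAMES = FALSE)
}

peptides <- c("AAPLK", "GPEVW")   # invented sequences
uptake   <- c(2.0, 1.5)           # invented uptake values in Daltons

# Fractional incorporation: observed uptake over the maximum possible uptake.
n_exchangeable <- count_exchangeable(peptides)
fraction <- uptake / n_exchangeable

n_exchangeable
fraction
```

On this scale a value near 1 means the peptide is close to fully deuterated under the chosen convention, which makes peptides of different lengths easier to compare on one heatmap.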

Now, we demonstrate a back-exchange correction calculation. The back-exchange values are fictitious, but the code chunk below demonstrates how to set this up.

# made-up correction factor
correction <- (exchangeableAmides(unique(MBP$pep_sequence)) + 1) * 0.9

MBPqDF_norm2 <- normalisehdx(MBPqDF,
                             sequence = unique(MBP$pep_sequence),
                             method = "bc", 
                             correction = correction)


pheatmap(t(assay(MBPqDF_norm2)),
         cluster_rows = FALSE, 
         cluster_cols = FALSE,
         color = brewer.pal(n = 9, name = "BuPu"),
         main = "Structural variant deuterium incorporation heatmap normalised", 
         fontsize = 14,
         legend_breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1, 1.2),
         legend_labels = c("0", "0.2", "0.4", "0.6", "0.8","1", "Incorporation"),
         labels_col = xannot)

Functional data analysis of HDX-MS data

The hdxstats package uses an empirical Bayes functional approach to analyse the data. We explain this approach step by step. First, we fit the parametric model to the data. This will allow us to explore the HdxStatModel class.

res <- differentialUptakeKinetics(object = MBPqDF[,1:100], # provide a QFeatures object
                                  feature = rownames(MBPqDF)[[1]][37], # which peptide we fit
                                  start = list(a = NULL, b = 0.0001,  d = NULL, p = 1)) # starting parameter guesses
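
To give some intuition for the starting parameters, the sketch below evaluates a Weibull-type uptake curve. It assumes the default model has the form value = a * (1 - exp(-b * timepoint^p)) + d, which matches the parameter names a, b, d and p in `start` but is not confirmed by the text above. The function name and the parameter values are illustrative only.

```r
# Assumed Weibull-type uptake curve:
#   a = plateau height, b = rate, p = shape, d = offset at time zero.
weibull_uptake <- function(timepoint, a, b, p, d) {
  a * (1 - exp(-b * timepoint^p)) + d
}

timepoints <- c(0, 30, 300, 3000)   # seconds, illustrative
curve_vals <- weibull_uptake(timepoints, a = 5, b = 0.01, p = 1, d = 0.2)

curve_vals
```

Under this assumed form the curve starts at d and rises towards a + d, so a small positive b with p = 1 describes slow, roughly exponential uptake.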

Here, we see the HdxStatModel class, and that a Functional Model was applied to the data and a total of 7 models were fitted.

res

The nullmodel and alternative slots of an instance of HdxStatModel provide the underlying fitted models. The method and formula slots provide vital information about what analysis was performed. The vis slot provides a ggplot object so that we can visualise the functional fits.

res@vis

We can also explore functional models other than the Weibull, for example a model that is the sum of two exponential uptake curves.

res2 <- differentialUptakeKinetics(object = MBPqDF[,1:100], # provide a QFeatures object
                                   formula = value ~ a*(1 - exp(-b * timepoint)) + c * (1 - exp(-d * timepoint)),
                                   feature = rownames(MBPqDF)[[1]][37], # which peptide we fit
                                   start = list(a = 0, b = 0.0001,  c = 0, d = 0.0001)) # starting parameter guesses
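
The formula passed above can be read as a fast and a slow exchanging population. The following sketch evaluates that formula directly with invented parameter values to show how the two components combine. The function name `biexp_uptake` is illustrative and not part of hdxstats.

```r
# Sum of two saturating exponentials, as in the formula above:
#   a and c are the amplitudes, b and d the rates of the two components.
biexp_uptake <- function(timepoint, a, b, c, d) {
  a * (1 - exp(-b * timepoint)) + c * (1 - exp(-d * timepoint))
}

timepoints <- c(0, 30, 300, 30000)   # seconds, illustrative
biexp_vals <- biexp_uptake(timepoints, a = 2, b = 0.1, c = 3, d = 0.001)

biexp_vals
```

With these values the fast component (rate 0.1) has essentially saturated by 300 seconds, while the slow component (rate 0.001) is still rising. The curve starts at 0 and approaches a + c.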

We can then visualize the output of our curve fitting.

res2@vis

One question is whether we can say statistically that one model is better than another. We can do this by computing the likelihood ratio between the two proposed models. We can see that the likelihoods from the second model are generally higher.

logLik(res)
logLik(res2)

Let us compute minus twice the difference of the log-likelihoods:

diff <- -2 * (logLik(res) - logLik(res2))

Wilks' theorem tells us that this statistic is asymptotically chi-squared distributed, with degrees of freedom equal to the difference between the degrees of freedom of the two models. Each model has 4 degrees of freedom, so the difference is zero and in this case there is no need to apply any theoretical results. The second model is preferred, as it has the higher log-likelihood.
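
For the case where the two models are nested and differ in their number of parameters, the likelihood ratio test can be carried out as in the sketch below. It uses simulated data and ordinary linear models from base R rather than HdxStatModel objects, purely to illustrate the calculation. All object names are illustrative.

```r
set.seed(1)

# Simulated data with a genuine linear trend.
x <- seq(0, 10, length.out = 50)
y <- 1 + 0.5 * x + rnorm(50, sd = 0.5)

# Nested models: intercept only (null) versus intercept plus slope (alternative).
fit_null <- lm(y ~ 1)
fit_alt  <- lm(y ~ x)

# Minus twice the difference of the log-likelihoods.
lr_stat <- as.numeric(-2 * (logLik(fit_null) - logLik(fit_alt)))

# Degrees of freedom: difference in the number of estimated parameters.
lr_df <- attr(logLik(fit_alt), "df") - attr(logLik(fit_null), "df")

# Upper tail of the chi-squared distribution gives the p-value.
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p.value = p_value)
```

When the two models have the same number of parameters, as in the comparison above, the degrees of freedom are zero and this test does not apply, which is why the log-likelihoods are compared directly.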



ococrook/hdxstats documentation built on Sept. 15, 2022, 12:24 p.m.