knitr::opts_chunk$set(echo = TRUE, fig.path = 'figures/', dev = c('png'), dpi = 300)

#library(signit)
devtools::load_all('../')
library(nnls)
library(tidyverse)
library(rstan)
library(cowplot)
library(hexbin)
library(ggraph)

library(pander)
panderOptions('table.style', 'rmarkdown')
panderOptions('table.split.table', 1000)
panderOptions('digits', 3)

SignIT: Mutation Signature Analysis in Individual Tumours with MCMC

This vignette demonstrates the use of SignIT to decipher mutation signatures in individual tumours, explore signature bleed interactions, and perform temporal dissection of mutation signatures. SignIT is available for download and installation from https://www.github.com/eyzhao/SignIT. See installation instructions in the README.

Background

Cancer is a disease characterized by the formation of tumours composed of cells which divide aggressively due to the loss of specialized functions and growth inhibition. Cancers arise due to the stepwise accumulation of spontaneous genetic mutations specific to the tumour, also known as somatic mutations. These mutations arise from many causes, including chemical mutagens, intracellular mutational processes, and deficient DNA repair genes.

While some mutations may contribute directly to tumour invasiveness, the majority are thought to be "passengers," with minimal impact on the evolutionary fitness of cancer cells. Nevertheless, it is possible to glean insights about a cancer's molecular history from these passengers. They provide evidence of the mutational processes which drove the acquisition of mutations in cancer.

Recent research has compiled 30 characteristic mutation signatures deciphered from cancer whole-genome and exome sequences. These are available from http://cancer.sanger.ac.uk/cosmic/signatures.

SignIT on a Patient's Mutation Catalog

The Mutation Catalog

A mutation catalog is a data.frame or tibble containing two columns named mutation_type and count. The mutation_type column holds unique strings, each denoting a mutation type, and the count column holds the number of mutations observed for that type. Custom mutation types can be defined, so long as they match the mutation types defined in the reference mutation signature table.
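As a minimal sketch of this structure, the catalog below is built by hand in base R and checked against a reference table. The mutation type labels and the hypothetical `toy_reference` table are assumptions made for illustration; they are not SignIT's bundled data.

```r
# Toy mutation catalog: one row per mutation type, with its observed count.
# The type labels are illustrative, not SignIT's bundled labels.
custom_catalog <- data.frame(
  mutation_type = c('A[C>A]A', 'A[C>A]C', 'A[C>G]A'),
  count = c(12L, 7L, 0L),
  stringsAsFactors = FALSE
)

# Hypothetical reference table sharing the same mutation types.
toy_reference <- data.frame(
  mutation_type = c('A[C>A]A', 'A[C>A]C', 'A[C>G]A'),
  toy_signature = c(0.5, 0.3, 0.2),
  stringsAsFactors = FALSE
)

# The catalog is usable only if its types are unique and match the reference.
types_unique <- !anyDuplicated(custom_catalog$mutation_type)
types_match <- setequal(custom_catalog$mutation_type, toy_reference$mutation_type)
stopifnot(types_unique, types_match)
```

Running a check like this before analysis catches mismatched labels early, rather than partway through model fitting.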

An example mutation catalog is provided in the package data.

data(example_catalog)
pander(head(example_catalog))

SignIT also provides tools to determine SNV mutation catalogs from a table of variants. Here are the first ten rows of a set of example variants. You can load this data from SignIT to see the full set.

data(example_variants)
pander(head(example_variants, 10))

Use mutations_to_catalog to convert to catalog form.

example_catalog_2 <- with(
  example_variants,
  mutations_to_catalog(chr, pos, ref, alt)
)

pander(head(example_catalog_2))
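SNV catalogs are conventionally indexed by the substitution together with its flanking bases, with purine-reference mutations folded onto the opposite strand. The sketch below illustrates that convention in base R. The helper names and the `A[C>A]T` label format are assumptions, and this is not necessarily how mutations_to_catalog is implemented. Looking up the flanking bases in a reference genome is omitted here.

```r
# Complement each base; works on whole strings or single characters.
complement <- function(x) chartr('ACGT', 'TGCA', x)

# Reverse-complement each string in a character vector.
reverse_complement <- function(x) {
  vapply(
    strsplit(x, ''),
    function(bases) paste(rev(complement(bases)), collapse = ''),
    character(1)
  )
}

# Label an SNV by its trinucleotide context (hypothetical helper).
# `context` is the reference trinucleotide centred on the mutated base.
to_mutation_type <- function(context, alt) {
  purine_ref <- substr(context, 2, 2) %in% c('A', 'G')
  # Fold purine-reference mutations onto the opposite strand.
  context[purine_ref] <- reverse_complement(context[purine_ref])
  alt[purine_ref] <- complement(alt[purine_ref])
  paste0(
    substr(context, 1, 1), '[', substr(context, 2, 2), '>', alt, ']',
    substr(context, 3, 3)
  )
}

toy_types <- to_mutation_type(
  context = c('AGT', 'TCA', 'ACT'),
  alt = c('T', 'A', 'A')
)

# Tabulate the labels into the two-column catalog form.
toy_catalog <- as.data.frame(
  table(mutation_type = toy_types),
  responseName = 'count',
  stringsAsFactors = FALSE
)
```

Here the G>T mutation in context AGT and the C>A mutation in context ACT receive the same label, because they describe the same event read from opposite strands.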

Running MCMC Analysis and Exploring Posterior Distributions

Once a mutation catalog is available, obtaining the contributions of each mutation signature (also known as the exposures) is straightforward. The following call runs the Bayesian model which analyzes mutation signatures and stores the output in a list.

patient_signature_exposures <- get_exposures(
  mutation_catalog = example_catalog,
  n_cores = 1
)

A few functions are available to explore the solutions obtained. The easiest way is to plot the posterior distributions for each mutation signature, as follows.

patient_exposure_posterior_plot <- plot_exposure_posteriors(patient_signature_exposures)
print(patient_exposure_posterior_plot)

Note that all exposure plotting functions automatically shorten signature names by removing the word "Signature". For comparison, we also provide a convenience function which computes a naive non-negative least squares (NNLS) best fit solution.

patient_exposure_nnls_plot <- plot_nnls_solution(patient_signature_exposures)
print(patient_exposure_nnls_plot)

Note that the NNLS point estimates mirror SignIT's solutions but carry no measure of uncertainty, whereas the SignIT posteriors show that uncertainty varies greatly between signatures. Additionally, NNLS manifests mutation signature bleed as inappropriate elevation of signatures 16, 22, and 25.
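To make the NNLS idea concrete, here is a minimal sketch using the nnls package on toy data. The two-signature reference matrix and the exposures are invented for illustration. How plot_nnls_solution prepares its inputs internally is not shown in this vignette, so treat this only as the general technique.

```r
library(nnls)

# Toy reference: 4 mutation types (rows) by 2 signatures (columns).
# Each column is a probability distribution over mutation types.
toy_signatures <- cbind(
  sig_a = c(0.7, 0.1, 0.1, 0.1),
  sig_b = c(0.1, 0.1, 0.2, 0.6)
)

# Noise-free catalog generated from known exposures of 100 and 50 mutations.
true_exposures <- c(100, 50)
toy_counts <- as.vector(toy_signatures %*% true_exposures)

# NNLS minimizes ||signatures %*% x - counts||^2 subject to x >= 0.
fit <- nnls(toy_signatures, toy_counts)
nnls_exposures <- setNames(fit$x, colnames(toy_signatures))
```

The result is a single vector of exposures with no accompanying distribution, which is why NNLS cannot express how well or poorly each signature is constrained by the data.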

Visualizing Mutation Signature Bleed

Signal bleed can often occur between signatures with similar mutational spectra. The result is that these signatures exhibit anticorrelated posterior sampling, suggesting that credible solutions can fit either signature. To visualize this anticorrelation, we can use the following command:

plot_two_signature_hexplot(patient_signature_exposures, 'Signature 3', 'Signature 8')

We can also quantify the signature bleed across all pairs of signatures at once and plot it as a heatmap. Signature pairs with a more negative Spearman's rho (dark blue) exhibit more signature bleed.

plot_signature_pairwise_bleed(patient_signature_exposures)
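The bleed measure can be illustrated on simulated draws. The sketch below uses toy data, not SignIT output: two exposures that trade off against each other yield a strongly negative Spearman's rho, which is the pattern the heatmap highlights. The variable names are hypothetical.

```r
set.seed(1)

# Simulated posterior draws for two similar signatures (toy data):
# together they account for a joint exposure of about 0.5.
n_draws <- 1000
exposure_a <- runif(n_draws, min = 0, max = 0.5)
exposure_b <- 0.5 - exposure_a + rnorm(n_draws, sd = 0.02)

# A strongly negative Spearman's rho indicates bleed between the pair.
bleed_rho <- cor(exposure_a, exposure_b, method = 'spearman')
```

Because the rank correlation depends only on the ordering of the draws, it is insensitive to the scale of the exposures being compared.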

Lastly, SignIT provides a convenient visualization which overlays mutation signature bleed information on top of signature posteriors. The min_bleed parameter takes on values from 0 to 1. The higher this threshold is set, the cleaner the graph will look, but the more information is lost. We typically recommend values around 0.2 to 0.3.

plot_exposure_posteriors_bleed(patient_signature_exposures, min_bleed = 0.25)

For a quantitative summary of signature exposures, use the following command:

summary_table <- get_exposure_summary_table(patient_signature_exposures)
pander(summary_table)
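For intuition about what a posterior summary contains, the sketch below computes means, medians, and 95% credible intervals from toy posterior draws in base R. The draws and column names are invented, and the columns actually returned by get_exposure_summary_table may differ.

```r
set.seed(42)

# Toy posterior draws (rows) for two signatures (columns); not SignIT output.
toy_draws <- cbind(
  sig_a = rbeta(2000, 20, 80),
  sig_b = rbeta(2000, 5, 95)
)

# Posterior mean, median, and 95% credible interval for each signature.
toy_summary <- data.frame(
  signature = colnames(toy_draws),
  mean = colMeans(toy_draws),
  median = apply(toy_draws, 2, median),
  lower_95 = apply(toy_draws, 2, quantile, probs = 0.025),
  upper_95 = apply(toy_draws, 2, quantile, probs = 0.975),
  row.names = NULL
)
```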

Structural Variant Signatures

Mutation types are not fixed. As long as you provide a reference mutation signature matrix which matches the mutation types of your mutation catalog, you can decipher mutation signatures. We will briefly demonstrate an example using structural variant signatures.

The structural variant reference signatures from Nik-Zainal et al. (2016) can be accessed as follows:

sv_reference <- get_reference_signatures(signature_set = 'nikzainal_sv')
pander(head(sv_reference))

We then use a mutation catalog with matching mutation types:

data(example_sv_catalog)
pander(example_sv_catalog)

Next, we perform SignIT analysis as before.

sv_exposures <- get_exposures(example_sv_catalog, reference_signatures = sv_reference, n_cores = 1)

We can use the same plotting functions:

plot_exposure_posteriors_bleed(sv_exposures, signature_trim = 'Rearrangement Signature')

And the same summarization function:

pander(get_exposure_summary_table(sv_exposures))

Temporally Dissecting Mutation Signatures

We next explore how to perform temporal dissection of mutation signatures. The input data here is a data.frame or tibble of mutations, with some additional metadata. The following glimpse shows the necessary formatting of the input. Feel free to load data(example_variants) to see the complete input data for yourself.

glimpse(example_variants)

Signature timing is performed using get_population_signatures. Note that this process can take quite some time depending on your machine. Here we use method = 'vb', which fits the model by variational Bayes and is much less computationally expensive than MCMC sampling.

Note that if the parameter n_populations is not provided, an automatic model selection process attempts to estimate the optimal number of populations.

A preferred approach (if computational resources are available) is to perform complete population signature inference for five models, with 1 to 5 populations. Then compute the WAIC (widely applicable information criterion) of each model with compute_population_signatures_waic(population_signatures) and choose the model which minimizes it. WAIC is a preferred model selection metric for Bayesian inference.

population_signatures <- get_population_signatures(
  example_variants,
  n_populations = 2,
  method = 'vb'
)
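The WAIC-based model selection described above can be illustrated independently of SignIT. The sketch below implements the standard WAIC formula on a matrix of pointwise log-likelihoods and compares two toy models. The function `waic` and all data here are assumptions for illustration; this is not claimed to be how compute_population_signatures_waic works internally.

```r
# WAIC from a matrix of pointwise log-likelihoods:
# rows are posterior draws, columns are observations.
waic <- function(log_lik) {
  # Log pointwise predictive density, computed stably via log-sum-exp.
  col_max <- apply(log_lik, 2, max)
  lppd <- sum(col_max + log(colMeans(exp(sweep(log_lik, 2, col_max)))))
  # Effective number of parameters: summed posterior variance per observation.
  p_waic <- sum(apply(log_lik, 2, var))
  -2 * (lppd - p_waic)
}

set.seed(7)
y <- rnorm(50)                                 # toy observations
mu_good <- rnorm(500, mean(y), 1 / sqrt(50))   # draws centred on the data
mu_bad <- rnorm(500, 3, 1 / sqrt(50))          # draws centred far from the data

# Pointwise log-likelihood matrices (draws x observations).
ll_good <- outer(mu_good, y, function(mu, obs) dnorm(obs, mu, 1, log = TRUE))
ll_bad <- outer(mu_bad, y, function(mu, obs) dnorm(obs, mu, 1, log = TRUE))

# Lower WAIC indicates better expected out-of-sample fit.
waic_values <- c(good = waic(ll_good), bad = waic(ll_bad))
best_model <- names(which.min(waic_values))
```

The same pattern applies when comparing fits with different numbers of populations: compute one WAIC value per fitted model and keep the model with the smallest value.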

Now, let's visualize the results.

plot_population_signatures(population_signatures)

This analysis reveals two tumour subpopulations. The red, higher-prevalence subpopulation is inferred to be "earlier". The Proportion refers to the fraction of mutations contributed by each subpopulation.

We see here that Signature 3 exhibits early bias while signatures 16, 17, and 30 have late bias.

For quantitative analysis workflows, use the following to provide a summary of these results:

pander(summarise_population_signatures(population_signatures))

Session Info

The output below lists the packages used in this analysis and their version numbers.

print(sessionInfo())


eyzhao/SignIT documentation built on Dec. 6, 2019, 11:45 a.m.