In caravagnalab/mobster: MOBSTER - Model-based tumour subclonal deconvolution

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(mobster)
library(tidyr)
library(dplyr)

Computing dN/dS values

mobster interfaces with the dndscv R package to compute dN/dS values from its output clusters. The method implemented in dndscv is described in Martincorena et al., "Universal patterns of selection in cancer and somatic tissues", Cell 171.5 (2017): 1029-1041 (PMID 29056346).

Requirements. To compute dN/dS values, the mutation data must store the genomic coordinates of each mutation:

chromosome chr,
position from,
reference and alternative alleles ref and alt.
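A quick way to confirm that a mutation table carries these columns before running the analysis is sketched below. The helper `check_dnds_columns` and the toy table are hypothetical and not part of mobster; the column names are the ones selected in the pooling example at the end of this vignette.

```r
# Hypothetical helper (not part of mobster): stop with an informative
# message if any required column is missing from a mutation table.
check_dnds_columns = function(x, required = c("chr", "from", "ref", "alt")) {
  missing_cols = setdiff(required, colnames(x))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy mutation table with made-up coordinates
toy_mutations = data.frame(
  chr = c("chr1", "chr2"),
  from = c(12345, 67890),
  ref = c("A", "G"),
  alt = c("T", "C"),
  stringsAsFactors = FALSE
)

# Passes silently because all required columns are present
check_dnds_columns(toy_mutations)
```

Running the same check on a table without `ref` and `alt` stops with an error that names the missing columns.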

It is also important to know which reference genome was used to align the sequencing data; dndscv uses this information to annotate the input mutations.

We show this analysis with the fits for one of the lung samples available in the package.

fit = mobster::LUFF76_lung_sample

# Print and plot the model
print(fit$best)
plot(fit$best)

We compute the values using the clustering assignments from the best fit.

clusters = Clusters(fit$best)
print(clusters)

The available clusters are C1 and Tail; C1 is the clonal cluster. We compute dN/dS with the default parameters.

# Run by cluster and default gene list
dnds_stats = dnds(
  clusters,
  gene_list = NULL
)

The statistics can be computed for a custom grouping of the clusters. Here it makes little difference because we have only the clonal cluster and the tail; but with an additional subclone C2 we could pool the mutations of the clones as follows.

# Not run here
dnds_stats = dnds(
  clusters,
  mapping = c(`C1` = 'Non-tail', `C2` = 'Non-tail', `Tail` = 'Tail'),
  gene_list = NULL
)
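The regrouping idea behind `mapping` can be illustrated with plain R: a named character vector acts as a lookup table from cluster labels to group labels. This is only a sketch on made-up labels and does not reproduce mobster's internal code.

```r
# Made-up cluster assignments for six mutations
cluster_labels = c("C1", "C2", "Tail", "C1", "Tail", "C2")

# Same form as the mapping passed to dnds() above
mapping = c(`C1` = "Non-tail", `C2` = "Non-tail", `Tail` = "Tail")

# Index the named vector by label to obtain the pooled group of each mutation
groups = unname(mapping[cluster_labels])

# Number of mutations per pooled group
table(groups)
```

Clusters that share a group label are analysed together, so a subclone with few mutations can borrow strength from the clonal cluster.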

In the above analyses we ran dndscv with the default gene list (gene_list = NULL). Note that errors raised by dndscv are intercepted by mobster; some of these errors might originate from a dataset with too few substitutions to compute dN/dS.

The call returns:

the table computed by dndscv, where column dnds_group labels the group;
a ggplot plot of the point estimates and the confidence intervals.

# Summary statistics
print(dnds_stats$dnds_summary)

# Table of observation counts
print(dnds_stats$dndscv_table)

# Plot
print(dnds_stats$plot)

The default plot contains results obtained from all substitution models available in dndscv. Specific models can be requested using the parameters of the dnds function.

Using custom gene lists

A custom list of genes can be supplied in the call to dnds through the argument gene_list; the package provides four lists of interest for this type of computation:

a list of driver genes compiled in Martincorena et al. Cell 171.5 (2017): 1029-1041;
a list of driver genes compiled in Tarabichi et al. Nature Genetics 50.12 (2018): 1630;
a list of essential genes compiled in Wang et al. Science 350.6264 (2015): 1096-1101;
a list of essential genes compiled in Blomen et al. Science 350.6264 (2015): 1092-1096.

These lists are available to load.

# Load the list
data('cancer_genes_dnds', package = 'mobster')

# Each sublist is a list 
print(lapply(cancer_genes_dnds, head))

A custom gene list can be used as follows.

# Not run here
dnds_stats = dnds(
  clusters,
  mapping = c(`C1` = 'Non-tail', `C2` = 'Non-tail', `C3` = 'Non-tail', `Tail` = 'Tail'),
  gene_list = cancer_genes_dnds$Martincorena_drivers
)

Pooling data from multiple patients

The input format of the dnds function allows pooling data from several fits at once. We pool data from the two datasets available in the package.

# 2 lung samples
data('LU4_lung_sample', package = 'mobster')
data('LUFF76_lung_sample', package = 'mobster')

We pool the data selecting the required columns.

dnds_multi = dnds(
  rbind(
    Clusters(LU4_lung_sample$best) %>% select(chr, from, ref, alt, cluster) %>% mutate(sample = 'LU4'),
    Clusters(LUFF76_lung_sample$best) %>% select(chr, from, ref, alt, cluster) %>% mutate(sample = 'LUFF76')
  ),
  mapping = c(`C1` = 'Non-tail',  # Pool together all clonal mutations
              `Tail` = 'Tail'     # Pool together all tail mutations
  )
)

print(dnds_multi$plot)
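The pooling step only stacks per-sample tables after tagging each row with its sample of origin. A base-R sketch on made-up data follows; the tables `s1` and `s2` are hypothetical stand-ins for the output of `Clusters`, and the sample names are invented.

```r
# Two made-up per-sample mutation tables with cluster assignments
s1 = data.frame(chr = "chr1", from = c(100, 200), ref = "A", alt = "T",
                cluster = c("C1", "Tail"), stringsAsFactors = FALSE)
s2 = data.frame(chr = "chr2", from = 300, ref = "G", alt = "C",
                cluster = "C1", stringsAsFactors = FALSE)

# Tag each table with its sample of origin
s1$sample = "S1"
s2$sample = "S2"

# Stack the tables; both must have the same columns
pooled = rbind(s1, s2)

# Mutations per cluster and sample
table(pooled$cluster, pooled$sample)
```

Keeping the `sample` column makes it possible to trace each pooled mutation back to the fit it came from.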