knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(mobster)
library(tidyr)
library(dplyr)
mobster interfaces with the dndscv R package to compute dN/dS values from its output clusters. The method implemented in dndscv is described in Martincorena et al., "Universal patterns of selection in cancer and somatic tissues", Cell 171.5 (2017): 1029-1041 (PMID 29056346).
Requirements. To compute dN/dS values, the mutation data must store genomic coordinates in the columns chr, from, alt and ref. It is also important to know which reference genome was used for the alignment; dndscv uses this information to annotate the input mutations.
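Before calling dnds on a custom table, it can help to confirm that the expected columns are present. The snippet below is a minimal sketch in base R: the helper missing_dnds_columns is hypothetical (it is not part of mobster), the column names are the ones selected in the pooled example later in this vignette, and the toy table holds made-up values.

```r
# Column names as selected in the pooled example of this vignette.
required_cols <- c("chr", "from", "ref", "alt", "cluster")

# Hypothetical helper (not part of mobster): names of required columns
# that are absent from a mutation table.
missing_dnds_columns <- function(mutations, required = required_cols) {
  setdiff(required, colnames(mutations))
}

# Toy mutation table with made-up values, for illustration only.
toy <- data.frame(
  chr  = c("chr1", "chr2"),
  from = c(12345L, 67890L),
  ref  = c("A", "C"),
  alt  = c("G", "T"),
  stringsAsFactors = FALSE
)

# The toy table lacks the cluster assignment.
print(missing_dnds_columns(toy))

# After adding it, nothing is missing.
toy$cluster <- c("C1", "Tail")
print(missing_dnds_columns(toy))
```

An empty result means the table carries every column checked here; it says nothing about whether the values themselves are valid for dndscv.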
We show this analysis with the fits for one of the lung samples available in the package.
fit = mobster::LUFF76_lung_sample

# Print and plot the model
print(fit$best)
plot(fit$best)
We compute the values using the clustering assignments from the best fit.
clusters = Clusters(fit$best)
print(clusters)
The available clusters are C1 and Tail; C1 is the clonal cluster. We compute dN/dS with the default parameters.
# Run by cluster and default gene list
dnds_stats = dnds(
  clusters,
  gene_list = NULL
)
The statistics can be computed for a custom grouping of the clusters. Here it makes little difference because we have only the clonal cluster and the tail; but if we had a subclone C2, we could pool the mutations in the two clones using:
# Not run here
dnds_stats = dnds(
  clusters,
  mapping = c(`C1` = 'Non-tail', `C2` = 'Non-tail', `Tail` = 'Tail'),
  gene_list = NULL
)
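The grouping idea can be illustrated in base R without running dndscv. The sketch below uses made-up cluster labels and a named vector with the same shape as the mapping argument above; it shows one plausible way such a lookup assigns a group to each mutation and is not taken from the mobster source code.

```r
# Toy cluster labels, one per mutation (made-up data).
cluster <- c("C1", "C2", "Tail", "C1", "Tail")

# Same shape as the mapping argument shown above.
mapping <- c(`C1` = "Non-tail", `C2` = "Non-tail", `Tail` = "Tail")

# Indexing a named vector by name looks up the group of each mutation.
dnds_group <- unname(mapping[cluster])

# Number of mutations pooled into each group.
print(table(dnds_group))
```

With this mapping, mutations from C1 and C2 end up in a single "Non-tail" group, so the statistics are computed on the pooled set rather than per clone.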
In the above analysis we ran dndscv using the default gene list (gene_list = NULL). Note that errors raised by dndscv are intercepted by mobster; some of these errors might originate from a dataset with too few substitutions to compute dN/dS.
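For readers unfamiliar with error interception in R, the general pattern is tryCatch. The sketch below is a generic illustration only: fragile_fit is a hypothetical stand-in for a computation that fails on small inputs, the threshold of 10 is arbitrary, and none of this reproduces how mobster actually wraps dndscv.

```r
# Hypothetical stand-in for a computation that fails on small inputs.
# The threshold of 10 is arbitrary and chosen only for illustration.
fragile_fit <- function(n_substitutions) {
  if (n_substitutions < 10) {
    stop("too few substitutions")
  }
  n_substitutions / 10
}

# Wrapper that intercepts the error, reports it, and returns NULL
# so that the remaining groups can still be processed.
safe_fit <- function(n_substitutions) {
  tryCatch(
    fragile_fit(n_substitutions),
    error = function(e) {
      message("Skipping group: ", conditionMessage(e))
      NULL
    }
  )
}

# One group per element; the first one is too small and is skipped.
results <- lapply(c(small = 3, large = 50), safe_fit)
print(results)
```

Returning NULL for a failed group keeps the overall call alive, at the cost of having to check which groups actually produced an estimate.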
The call returns the summary statistics (dnds_summary); the table of observation counts from dndscv (dndscv_table), where column dnds_group labels the group; and a ggplot plot of the point estimates and the confidence interval (plot).

# Summary statistics
print(dnds_stats$dnds_summary)

# Table of observation counts
print(dnds_stats$dndscv_table)

# Plot
print(dnds_stats$plot)
The default plot contains results obtained from all substitution models available in dndscv. Specific models can be requested through the parameters of the dnds function.
A custom list of genes can be supplied in the call to dnds as the argument gene_list; the package provides 4 lists of interest for this type of computation, which are available to load:
Martincorena et al. Cell 171.5 (2017): 1029-1041;
Tarabichi et al. Nature Genetics 50.12 (2018): 1630;
Wang et al. Science 350.6264 (2015): 1096-1101;
Blomen et al. Science 350.6264 (2015): 1092-1096.
# Load the list
data('cancer_genes_dnds', package = 'mobster')

# Show the first entries of each gene list
print(lapply(cancer_genes_dnds, head))
A custom gene list can be used as follows.
# Not run here
dnds_stats = dnds(
  clusters,
  mapping = c(`C1` = 'Non-tail', `C2` = 'Non-tail', `C3` = 'Non-tail', `Tail` = 'Tail'),
  gene_list = cancer_genes_dnds$Martincorena_drivers
)
The input format of the dnds function allows pooling data from several fits at once. We pool data from the 2 datasets available in the package.
# 2 lung samples
data('LU4_lung_sample', package = 'mobster')
data('LUFF76_lung_sample', package = 'mobster')
We pool the data selecting the required columns.
dnds_multi = dnds(
  rbind(
    Clusters(LU4_lung_sample$best) %>%
      select(chr, from, ref, alt, cluster) %>%
      mutate(sample = 'LU4'),
    Clusters(LUFF76_lung_sample$best) %>%
      select(chr, from, ref, alt, cluster) %>%
      mutate(sample = 'LUFF76')
  ),
  mapping = c(
    `C1` = 'Non-tail', # Pool together all clonal mutations
    `Tail` = 'Tail'    # Pool together all tail mutations
  )
)

print(dnds_multi$plot)
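Selecting a common set of columns before rbind matters because two fits may carry different extra columns, and rbind on data frames requires matching column names. The sketch below reproduces the pooling step in base R on two made-up tables; the helper tag_sample and the extra columns depth and purity are hypothetical and serve only to show the mismatch being removed.

```r
# Columns kept for pooling, as in the call above.
keep <- c("chr", "from", "ref", "alt", "cluster")

# Two toy tables with made-up values and different extra columns.
lu4 <- data.frame(
  chr = "chr1", from = 100L, ref = "A", alt = "T",
  cluster = "C1", depth = 80L
)
luff76 <- data.frame(
  chr = "chr2", from = 200L, ref = "G", alt = "C",
  cluster = "Tail", purity = 0.7
)

# Hypothetical helper: keep the shared columns and tag the sample of origin.
tag_sample <- function(df, name) {
  out <- df[, keep, drop = FALSE]
  out$sample <- name
  out
}

# Row-bind the two tables now that their columns agree.
pooled <- rbind(tag_sample(lu4, "LU4"), tag_sample(luff76, "LUFF76"))
print(pooled)
```

The sample column records where each mutation came from, so the pooled table can still be split by sample after the groups have been merged.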