Getting Started with mutSignatures

set.seed(1234567)
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, fig.align = "center", results = "asis")

my.t0 <- Sys.time()

library(dplyr)
library(reshape2)
library(kableExtra)
library(ggplot2)
library(gridExtra)
library(mutSignatures)

# load data
data("mutSigData")

# Extra cols
head.muty <- c("T[C>A]G", "T[C>T]A", "G[C>T]A", "C[C>T]T", "T[C>G]T", "T[C>G]C", 
               "T[C>G]A", "T[C>T]T", "T[C>G]T", "T[C>T]G", "T[C>A]G", "T[C>T]A", 
               "A[T>C]T", "A[T>G]G", "C[T>C]T", "T[C>T]A", "T[C>T]A", "G[T>A]C")

Cancer cells accumulate DNA mutations as result of DNA damage and DNA repair processes. mutSignatures is a computational framework aimed at deciphering DNA mutational signatures operating in cancer.

Tutorials and How-tos

More vignettes discussing how to extract and analyze mutational signatures from different sources using mutSignatures can be found at the following URLs:

Installation

The package is available from CRAN. The official version of mutSignatures (version 2.1.1) can be installed from any CRAN mirror.

install.packages("mutSignatures")

The latest (most recently updated, beta) version of the package is available on GitHub. It is possible to install mutSignatures using devtools.

devtools::install_github("dami82/mutSignatures", force = TRUE, 
                         build_opts = NULL, build_vignettes = TRUE)

Note on software Dependencies

One of the pre-processing steps in the mutSignatures pipeline relies on a list of Bioconductor (https://www.bioconductor.org/) libraries. These libraries are optional, since they are only required for the attachContext() step (these libs are not automatically downloaded when installing the package). The bioconductor libraries are: i) IRanges; ii) GenomicRanges; iii) BSgenome; iv) GenomeInfoDb. You can install them as shown below.

# Get BiocManager
if (!"BiocManager" %in% rownames(utils::installed.packages()))
  install.packages("BiocManager")

# Get Bioc libs
BiocManager::install(pkgs = c("IRanges", "GenomicRanges", "BSgenome", "GenomeInfoDb"))

To complete the attachContext() step, a BSgenome package including full genome sequences of the organism of interest has to be installed as well. For example, human hg19 genome sequences can be installed as shown below.

BiocManager::install(pkgs = "BSgenome.Hsapiens.UCSC.hg19")

Pipeline

The typical mutSignatures analytic pipeline includes three steps:

Alternative pipeline

If the set of mutational signatures is already known, it is possible to use mutSignatures to estimate the contribution of each mutational pattern in a collection of samples. In this case, the pipeline includes the following steps:

Load library, set-up environment

For running the demos described in this vignette, the following R libraries will be used: dplyr, reshape2, ggplot2, kableExtra, BSgenome.Hsapiens.UCSC.hg19, and mutSignatures. Before proceeding, we assign the hg19 BSgenome object to a variable named hg19, and we load the mutSigData dataset, which is provided together with the mutSignatures package.

# Required libs
library(dplyr)
library(reshape2)
library(kableExtra)
library(ggplot2)
library(gridExtra)
library(BSgenome.Hsapiens.UCSC.hg19)

# Load mutSignatures
library(mutSignatures)

# prep hg19
hg19 <- BSgenome.Hsapiens.UCSC.hg19

# load data
data("mutSigData")

Example 1 - De novo extraction of Mutational Signatures from BLCA samples

The first demo shows how to extract mutational signatures from a dataset including DNA mutations from bladder cancer samples.

Data Import and pre-processing

Here, the input has a VCF-like structure, and is decorated with an extra column (namely, SAMPLEID) that includes a unique identifier for the biological samples.

# Import data (VCF-like format)
x <- mutSigData$inputC

# Filter non SNV
x <- filterSNV(dataSet = x,  seq_colNames = c("REF", "ALT"))

# Visualize head
head(x) %>% kable() %>% kable_styling(bootstrap_options = "striped")
# Attach context
x <- attachContext(mutData = x,
                   chr_colName = "CHROM",
                   start_colName = "POS",
                   end_colName = "POS",
                   nucl_contextN = 3,
                   BSGenomeDb = hg19)
x <- mutSignatures::mutSigData$inputC.ctx
# Visualize head
head(x) %>% kable() %>% kable_styling(bootstrap_options = "striped")
# Remove mismatches
x <- removeMismatchMut(mutData = x,                  # input data.frame
                       refMut_colName = "REF",       # column name for ref base
                       context_colName = "context",  # column name for context
                       refMut_format = "N")    
# Compute mutType
x <- attachMutType(mutData = x,                      # as above
                   ref_colName = "REF",              # column name for ref base
                   var_colName = "ALT",              # column name for mut base
                   context_colName = "context") 
x <- x[1:18, ]
x$mutType <- head.muty
# Visualize head
head(x) %>% kable() %>% kable_styling(bootstrap_options = "striped")
# Count
blca.counts <- countMutTypes(mutTable = x,
                             mutType_colName = "mutType",
                             sample_colName = "SAMPLEID")
# Count
blca.counts <- mutSignatures::as.mutation.counts(mutSigData$blcaMUTS)
# Mutation Counts
print(blca.counts)

Signature Extraction

After multi-nucleotide mutation count data have been prepared, it is possible to proceed with the de-novo signature extraction. Settings guiding the analysis are defined as a list of parameters that can be tuned via the setMutClusterParams() function. Next, the NMF analysis is executed by the decipherMutationalProcesses() function. At the end of the analysis, a silhouette plot is returned. This plot can be very helpful to understand if the obtained signatures are relatively weak or robust.

# how many signatures should we extract? 
num.sign <- 4

# Define parameters for the non-negative matrix factorization procedure.
# you should parallelize if possible
blca.params <- 
  mutSignatures::setMutClusterParams( 
    num_processesToExtract = num.sign,    # num signatures to extract
    num_totIterations = 20,               # bootstrapping: usually 500-1000
    num_parallelCores = 4)                # total num of cores to use (parallelization)
# Extract new signatures - may take a while
blca.analysis <- 
  decipherMutationalProcesses(input = blca.counts,
                              params = blca.params)
blca.analysis <- list(Results = mutSigData$inputS$Results)
tmp <- mutSigData$inputS$silhouetteTMP

xrange <- c(min(tmp[,3]), max(tmp[,3]))
xrange[1] <- ifelse(xrange[1] > 0, 0, (-1.15) * abs(xrange[1]))
xrange[2] <- 1.15
graphics::barplot(tmp[nrow(tmp):1, 3], 
                  col = base::as.factor(tmp[nrow(tmp):1,1]),
                  xlim = xrange,
                  horiz = TRUE, xlab = "Silhouette Value", 
                  ylab = "", 
                  main = "Silhouette Plot", 
                  border = base::as.factor(tmp[nrow(tmp):1,1]))
graphics::abline(v=0)
graphics::title(ylab="Iter. Results (by Group)", line=1, cex.lab=1, font = 2)

Downstream analyses and visualization

Profiles of de-novo extracted mutational signatures can be visualized (by barplots). Also it is possible to plot the estimated exposures of each sample to the identified mutational patterns. It is also possible to compare de-novo extracted signatures with known signatures (such as signatures from previous analyses, or COSMIC signatures). To cast mutSignature objects as data.frames, you can use the mutSignatures::as.data.frame() method.

Note: you can use the mutSignatures::getCosmicSignatures() function to download a copy of the COSMIC signatures.

# Retrieve signatures (results)
blca.sig <- blca.analysis$Results$signatures

# Retrieve exposures (results)
blca.exp <- blca.analysis$Results$exposures

# Plot signature 1 (standard barplot, you can pass extra args such as ylim)
msigPlot(blca.sig, signature = 1, ylim = c(0, 0.10))
# Plot exposures (ggplot2 object, you can customize as any other ggplot2 object)
msigPlot(blca.exp, main = "BLCA samples") + 
  scale_fill_manual(values = c("#1f78b4", "#cab2d6", "#ff7f00", "#a6cee3"))
# Export Signatures as data.frame
xprt <- coerceObj(x = blca.sig, to = "data.frame") 
head(xprt) %>% kable() %>% kable_styling(bootstrap_options = "striped")
# Get signatures from data (imported as data.frame) 
# and then convert it to mutSignatures object
cosmixSigs <- mutSigData$blcaSIGS %>% 
  dplyr::select(starts_with("COSMIC")) %>% 
  as.mutation.signatures()

blcaKnwnSigs <- mutSigData$blcaSIGS %>% 
  dplyr::select(starts_with("BLCA")) %>% 
  as.mutation.signatures()

# Compare de-novo signatures with selected COSMIC signatures
msig1 <- matchSignatures(mutSign = blca.sig, reference = cosmixSigs, 
                         threshold = 0.45, plot = TRUE) 
msig2 <- matchSignatures(mutSign = blca.sig, reference = blcaKnwnSigs, 
                         threshold = 0.45, plot = TRUE)
# Visualize match
# signature 1 is similar to COSMIC ; 
# signatures 2 and 3 are similar to COSMIC
# Here, we should probably extract only 2 mutational signatures
hm1 <- msig1$plot + ggtitle("Match to COSMIC signs.")
hm2 <- msig2$plot + ggtitle("Match to known BLCA signs.")

# Show
grid.arrange(hm1, hm2, ncol = 2)

Example 2 - Estimate the contribution of a panel of known mutational signatures

The second demo illustrates a case where the signatures are known. We will use the COSMIC signatures 1, 2, 5, and 13 in the analysis, as well as a set of custom signatures previously identified in bladder cancer tumor samples. We will estimate their contributions to a collection of blca mutations that are included in the mutSigData dataset.

Data Import and pre-processing

Data import and pre-processing can be conducted as shown above. Note that if a data.frame with counts of mutation types across samples is available, it is sufficient to use the as.mutation.counts() method to convert it to the desired object.

# Retrieve a mutation.counts data.frame
x <- mutSigData$blcaMUTS

# Visualize header
x[1:10, 1:5] %>% kable() %>% kable_styling(bootstrap_options = "striped")
# Convert it
xx <- as.mutation.counts(x)
# Print to screen
print(xx)

Estimate Exposures to known Mutational Signatures

After preparing the mutational signatures as a mutationSignatures-class object, it is sufficient to run the resolveMutSignatures() function. It is not recommended to use a large number of signatures when running this analysis (for example, refrain from using all COSMIC signatures). Likewise, you don't want to run this step using a very low number of signatures. The algorithm is way more accurate if ONLY AND ALL the relevant signatures are used (i.e., the signatures we are reasonably expecting to be operative in the tumor samples being analyzed). This analysis is typically very fast!

# Obtain 4 COSMIC signatures
cosmx <- mutSigData$blcaSIGS %>% dplyr::select(starts_with("COSMIC"))
cosmx <- as.mutation.signatures(cosmx)

# Obtain 4 BLCA signatures
blcmx <- mutSigData$blcaSIGS %>% dplyr::select(starts_with("BLCA"))
blcmx <- as.mutation.signatures(blcmx)
# Visualize cosmx
print(cosmx)
# Visualize cosmx
print(blcmx)
# Run analysis
blca.expo1 <- resolveMutSignatures(mutCountData = xx, 
                                   signFreqData = cosmx)

blca.expo2 <- resolveMutSignatures(mutCountData = xx, 
                                   signFreqData = blcmx)

Downstream analysis

As discussed above, exposures to mutational signatures can be plotted (barplots), and results can be exported using the mutSignatures::data.frame method.

# Retrieve exposures (results)
blca.exp.1x <- blca.expo1$results$count.result
blca.exp.2x <- blca.expo2$results$count.result

# Plot exposures
bp1 <- msigPlot(blca.exp.1x, main = "BLCA | COSMIC sigs.") + 
  scale_fill_manual(values = c("#fdbf6f", "#e31a1c", "#fb9a99", "#1f78b4"))

bp2 <- msigPlot(blca.exp.2x, main = "BLCA | pre-blca sigs.") + 
  scale_fill_manual(values = c("#fdbf6f", "#e31a1c", "#fb9a99", "#1f78b4"))

# Visualize
grid.arrange(bp1, bp2, ncol = 2)

# Compare sigs

# Export Exposures as data.frame
xprt <- as.data.frame(blca.exp.1x, transpose = TRUE)
head(xprt) %>% round() %>% kable() %>% 
  kable_styling(bootstrap_options = "striped")

More info and examples

More info and other examples can be found at the following URLs:

SessionInfo

sessionInfo()

Thanks for your interest in our software. If you run into any issue, bug, or you have a question about mutSignatures, feel free to email damiano.fantini@gmail.com. At this time, mutSignatures is mainly supported by DF in his free time (so, please, be patient upon your inquires). C-2020 (November-01, 2020).



Try the mutSignatures package in your browser

Any scripts or data that you put into this service are public.

mutSignatures documentation built on Nov. 9, 2020, 9:06 a.m.