knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(mSigHdp) library(ICAMS) library(cosmicsig)
mSigHdp uses hierarchical Dirichlet process mixture modeling to discover mutational signatures. For real data this is very compute intensive, so here we will show only a toy example with tumors and the typical classification of single-base-substitution (SBS) mutations in 96 types. We generate the synthetic spectra based on two mutational signatures, SBS1 and SBS22. We get the signatures from CRAN package cosmicsig, which provides signatures from https://cancer.sanger.ac.uk/signatures/. We set a specific seed so that the synthetic spectra are the same every time we run the vignette.
sigs <- cosmicsig::COSMIC_v3.2$signature$GRCh37$SBS96[ , c("SBS1", "SBS22")] set.seed(2022)
We generate 10 random exposures and 2 highly-enriched exposures for each signature (so we can discover them quickly), and make them into an exposure matrix.
n.tumor <- 10 exposures1 <- c(rnbinom(n.tumor, mu = 2000, size = 5), 10000, 100) exposures2 <- c(rnbinom(n.tumor, mu = 10000, size = 1000), 100, 5000) exposures <- rbind(exposures1, exposures2)
Then we generate spectra by multiplying the signatures times the exposures and rounding, because we need integer mutation counts. We convert the matrix into an ICAMS catalog so we can use custom plotting later.
toy_data <- round(sigs %*% exposures) colnames(toy_data) <- paste("T", 1:(n.tumor + 2), sep="") toy_data <- ICAMS::as.catalog(toy_data, catalog.type = "counts")
Here are the first few and last few rows of the 12 spectra. Each column is one of the 12 synthetic tumors. Each row is a mutation type, for example in in row 5 the row name CCAA indicates a mutation from CCA to CAA.
knitr::kable(toy_data[1:5, ]) knitr::kable(toy_data[91:96, ])
Most people will want to use the RunHdxParallel
function.
Most of the parameters in the example here are not suitable for real data.
Please see our paper to be posted on bioRxiv soon for suggestions
for actual values to use.
results <- mSigHdp::RunHdpxParallel( input.catalog = toy_data, out.dir = "vignette_output", num.child.process = 4, CPU.cores = 2, seedNumber = 123, K.guess = 5, burnin = 1000, burnin.multiplier = 2, post.n = 5, post.space = 10, multi.types = FALSE, overwrite = TRUE, gamma.alpha = 1, gamma.beta = 20, high.confidence.prop = 0.9, checkpoint = TRUE, verbose = FALSE)
The results are in both the variable results
and the
directory vignette_output
,
which also contains plots of the output and diagnostic plots.
It is convenient to automatically save the results to files
since the program will likely have been running for days, and you will
likely want to save the outputs for later
examination in any case.
The return value of RunHdpxParallel
will be included in the
directory in binary form as hdp.retval.Rdata
.
The directory vignette_output
contains a plot of the extracted (discovered) signatures
in file extracted.signatures.pdf
.
knitr::include_graphics("vignette_output/extracted.signatures.pdf")
The signatures in numerical form are in file extracted.signatures.csv
.
extracted.sigs <- ICAMS::ReadCatalog("vignette_output/extracted.signatures.csv") knitr::kable(extracted.sigs[1:10, ])
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.