Synthetic (Mutational) Signature Generation (SynSigGen)
Create catalogs of synthetic mutational spectra for assessing the performance of mutational signature analysis programs. ‘SynSigGen’ stands for Generation of Synthetic Signatures and Spectra.
Before installation, the prerequisite Bioconductor packages need to be installed:
install.packages("BiocManager")
BiocManager::install(
c("Biostrings", "BSgenome", "GenomeInfoDb", "GenomicRanges")
)
Install from GitHub with the R command line:
install.packages("remotes")
remotes::install_github(repo = "steverozen/SynSigGen", ref = "1.1.1-branch")
Use the functions below to generate the 11 spectra data sets used in the paper “The repertoire of mutational signatures in human cancer” (https://doi.org/10.1038/s41586-020-1943-3), published in Nature. The original data sets are available at Synapse.
# Users should specify regress.dir = NULL unless they want to
# compare the output with the original data sets.
#
# File-comparison tools (e.g., Beyond Compare, Meld) are recommended
# over specifying regress.dir, because the latter might raise an
# error even when the query and original data sets are identical.
#
# Users should set top.level.dir to the destination folder
# for the data sets; otherwise, default paths will be used.
PancAdenoCA1000()
ManyTypes2700()
RCCOvary1000()
Create.3.5.40.Abstract()
BladderSkin1000()
Create.2.7a.7b.Abstract()
CreateRandomSyn()
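The calls above use default arguments. As a minimal sketch of the advice in the comments, the following passes regress.dir = NULL and an explicit top.level.dir to one wrapper. It assumes, based on those comments, that PancAdenoCA1000() accepts both arguments; the destination path is only a placeholder.

```r
# Placeholder destination: tempdir() is deleted when the R session ends,
# so replace it with a persistent folder to keep the generated data sets.
dest <- file.path(tempdir(), "PancAdenoCA1000")

if (requireNamespace("SynSigGen", quietly = TRUE)) {
  # regress.dir = NULL: skip the comparison with the original data sets.
  SynSigGen::PancAdenoCA1000(regress.dir = NULL, top.level.dir = dest)
} else {
  message("SynSigGen is not installed; see the installation steps above.")
}
```

The same pattern would apply to the other wrappers, each with its own destination folder.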
Descriptions of the 11 data sets are given in the section “Description of each suite of synthetic data sets” of Supplementary Note 2 of the paper.
To generate the 20 spectra data sets in which the mutation loads of SBS1 and SBS5, and the correlation between them, are varied, use the function
CreateSBS1SBS5CorrelatedSyntheticData()
The corresponding paper was published in Scientific Reports, and the original data sets are available at Zenodo.
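To illustrate what “varying the mutation load of, and correlation between, two signatures” can mean, here is a hypothetical sketch in base R. It is NOT the algorithm used by CreateSBS1SBS5CorrelatedSyntheticData(); the helper SimulateCorrelatedExposures() and its parameters are invented for illustration. It assumes that the log10 mutation counts of SBS1 and SBS5 follow a bivariate normal distribution, so that the means control the mutation loads and rho controls the correlation.

```r
# Hypothetical helper, NOT SynSigGen's algorithm: simulate integer
# exposures of SBS1 and SBS5 whose log10 mutation counts are bivariate
# normal with means `mean.log10`, SDs `sd.log10` and correlation `rho`.
SimulateCorrelatedExposures <- function(n.samples, mean.log10, sd.log10,
                                        rho, seed = 1) {
  set.seed(seed)
  cov.xy <- rho * sd.log10[1] * sd.log10[2]
  sigma <- matrix(c(sd.log10[1]^2, cov.xy,
                    cov.xy, sd.log10[2]^2), nrow = 2)
  # chol() returns upper-triangular U with t(U) %*% U == sigma, so the
  # rows of z %*% U have covariance sigma when z is standard normal.
  upper <- chol(sigma)
  z <- matrix(rnorm(2 * n.samples), ncol = 2)
  log10.counts <- z %*% upper +
    matrix(mean.log10, nrow = n.samples, ncol = 2, byrow = TRUE)
  # Back-transform to mutation counts and round to integers.
  exposures <- round(10^log10.counts)
  dimnames(exposures) <- list(paste0("sample", seq_len(n.samples)),
                              c("SBS1", "SBS5"))
  t(exposures)  # signatures x samples
}

# Example: higher SBS5 load than SBS1, correlation 0.7 on the log10 scale.
exp.mat <- SimulateCorrelatedExposures(n.samples = 5000,
                                       mean.log10 = c(2, 2.5),
                                       sd.log10 = c(0.3, 0.3),
                                       rho = 0.7)
cor(log10(exp.mat["SBS1", ]), log10(exp.mat["SBS5", ]))
```

Sweeping mean.log10 and rho over a grid of values would give a family of data sets in the spirit of the 20 described above.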
To generate the 3 spectra data sets on single-base-substitution (SBS) mutation channels and the 3 spectra data sets on indel channels, see the GitHub repository Liu_et_al_Sup_Files. The data set generation code in that repository requires SynSigGen >= 1.1.1.
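Before running that code, it may help to confirm that the installed version meets the >= 1.1.1 requirement. A minimal base-R sketch follows; the helper name HasMinVersion() is hypothetical and not part of SynSigGen.

```r
# Hypothetical helper (not part of SynSigGen): TRUE if `pkg` is installed
# and its version is at least `min.version`, FALSE otherwise.
HasMinVersion <- function(pkg = "SynSigGen", min.version = "1.1.1") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  # packageVersion() returns a package_version object, which compares
  # correctly against a version string such as "1.1.1".
  utils::packageVersion(pkg) >= min.version
}

if (!HasMinVersion()) {
  message("SynSigGen >= 1.1.1 is required; see the installation steps above.")
}
```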
The wrapper functions used to generate the data sets in the Nature paper are in the R files with the suffix “_Nat”.
The wrapper functions used to generate the 20 SBS1/SBS5-correlated data sets are in the file “CreateSynSBS1SBS5Correlated.R”.
These wrapper functions are intended primarily for regenerating the legacy data sets, because they do not round the exposures to integers. By contrast, since version 1.0.10, GenerateSyntheticExposures() rounds the exposures to integers by default.
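In the standard model used in mutational signature analysis, a catalog of spectra is the product of a signature matrix (channels x signatures) and an exposure matrix (signatures x samples). The toy sketch below uses made-up numbers and four hypothetical channels instead of the real 96 SBS channels. It shows why unrounded exposures can imply a non-integer total number of mutations per sample, whereas rounded exposures cannot. It does not show how SynSigGen itself turns exposures into spectra.

```r
# Toy signatures: 4 hypothetical channels x 2 signatures; each column sums to 1.
sigs <- matrix(c(0.4, 0.3, 0.2, 0.1,
                 0.25, 0.25, 0.25, 0.25),
               nrow = 4,
               dimnames = list(paste0("ch", 1:4), c("SBS1", "SBS5")))

# Toy exposures (mutations attributed to each signature), signatures x
# samples, deliberately non-integer like exposures that are not rounded.
exposures <- matrix(c(10.4, 20.6,
                      33.3, 7.2),
                    nrow = 2,
                    dimnames = list(c("SBS1", "SBS5"), c("s1", "s2")))
rounded <- round(exposures)

# Expected spectra: channels x samples.
spectra.unrounded <- sigs %*% exposures
spectra.rounded <- sigs %*% rounded

colSums(spectra.unrounded)  # total mutations: 31 and 40.5
colSums(spectra.rounded)    # total mutations: 31 and 40
```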
https://github.com/steverozen/SynSigGen/blob/1.1.1-branch/data-raw/SynSigGen_1.1.1.pdf