Run mutational signature analysis software packages and benchmark the performance of these packages.
SynSigRun is a package to 1. wrap R-based signature analysis packages in functions that are convenient for non-expert users, by setting default argument values and performing all necessary steps inside the function bodies; 2. reproduce the benchmarking analyses of signature analysis packages in papers by the Rozen Lab.
Typically, a benchmarking analysis that evaluates the accuracy of signature extraction and/or exposure inference involves the 3 steps below:
1. Generation of synthetic tumor spectra based on signatures and synthetic tumor exposures, using wrapper functions in SynSigGen.
2. Running computational approaches (which can be R, Python, Julia, or C++ packages) on the generated data sets. It involves two steps:
   a. For R-based computational approaches that can perform signature extraction (selecting K heuristically or semi-automatically) and/or exposure inference (attribution), we wrote wrapper functions in the R/ folder of this package so that non-expert users can run these approaches with a simple function call.
3. Evaluation of the accuracy of signature extraction and/or exposure inference. Many of the evaluation functions are in the package SynSigEval.
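The three steps above can be sketched end to end in base R. This is only a toy illustration under stated assumptions: it does not call SynSigGen, SynSigRun, or SynSigEval, the signatures and exposures are random placeholders, and `toy_nmf` is a hypothetical stand-in (plain NMF with multiplicative updates) for whichever computational approach is being benchmarked.

```r
set.seed(1)

# Step 1 (generation): synthetic spectra = signatures %*% exposures + Poisson noise.
# The sizes and distributions below are arbitrary choices for illustration.
n_channels <- 96
n_sigs <- 2
n_tumors <- 50
true_sigs <- matrix(rgamma(n_channels * n_sigs, shape = 0.3), nrow = n_channels)
true_sigs <- sweep(true_sigs, 2, colSums(true_sigs), "/")  # each column sums to 1
true_exp <- matrix(rlnorm(n_sigs * n_tumors, meanlog = 6, sdlog = 0.5),
                   nrow = n_sigs)
expected <- true_sigs %*% true_exp
spectra <- matrix(rpois(length(expected), lambda = expected), nrow = n_channels)

# Step 2 (running an approach): hypothetical stand-in extractor.
# NMF with multiplicative updates minimising the Frobenius norm, K fixed by the caller.
toy_nmf <- function(V, k, n_iter = 500, eps = 1e-9) {
  W <- matrix(runif(nrow(V) * k), nrow = nrow(V))
  H <- matrix(runif(k * ncol(V)), nrow = k)
  for (i in seq_len(n_iter)) {
    H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
    W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  # Rescale so that signatures sum to 1 and exposures carry the mutation counts.
  scale <- colSums(W)
  list(signatures = sweep(W, 2, scale, "/"),
       exposures = sweep(H, 1, scale, "*"))
}
fit <- toy_nmf(spectra, k = n_sigs)

# Step 3 (evaluation): cosine similarity of each true signature to its
# best-matching extracted signature.
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
sim <- outer(seq_len(n_sigs), seq_len(n_sigs),
             Vectorize(function(i, j) cosine(true_sigs[, i], fit$signatures[, j])))
best_match <- apply(sim, 1, max)
print(round(best_match, 3))
```

In a real benchmark, step 2 would be repeated for several approaches and random seeds, and the evaluation would typically also cover the inferred exposures.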
Install the development version of SynSigRun from GitHub with the R command line:
install.packages("devtools")
devtools::install_github("WuyangFF95/SynSigRun", ref = "1.0.0-branch")
The Nature paper “The repertoire of mutational signatures in human cancer” (Alexandrov et al., 2020) includes a benchmarking analysis of SigProfiler (the ancestor of SigProfilerExtractor) and SignatureAnalyzer. That analysis used some functions and top-level code from this package. Some of the code is in data-raw/Alexandrov_2020.
The Scientific Reports paper “Accuracy of mutational signature software on correlated signatures” benchmarks the signature extraction accuracy of 18 methods on 20 synthetic datasets with correlated exposures to the SBS1 and SBS5 signatures.
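To make "correlated exposures" concrete, here is a minimal sketch of one way to draw exposures to two signatures with a chosen R-squared on the log10 scale. It is not the SynSigGen generation code: `correlated_log_exposures`, its parameters, and the default mean and standard deviation are all hypothetical, and the column names SBS1 and SBS5 are only labels.

```r
set.seed(42)

# Hypothetical helper: log10 exposures of two signatures whose sample
# R-squared equals `r_squared` exactly (up to floating-point error).
correlated_log_exposures <- function(n, slope, r_squared,
                                     mean_x = 2.5, sd_x = 0.3, intercept = 0) {
  stopifnot(slope != 0, r_squared > 0, r_squared <= 1)
  x <- rnorm(n, mean = mean_x, sd = sd_x)
  # Noise made exactly uncorrelated with x by regressing it on x
  # and keeping the residuals.
  e_raw <- rnorm(n)
  noise <- stats::residuals(stats::lm(e_raw ~ x))
  # Choose the noise variance so that
  # R^2 = slope^2 var(x) / (slope^2 var(x) + var(noise)).
  target_var <- slope^2 * stats::var(x) * (1 - r_squared) / r_squared
  noise <- noise * sqrt(target_var / stats::var(noise))
  y <- intercept + slope * x + noise
  cbind(SBS1 = x, SBS5 = y)
}

log_exp <- correlated_log_exposures(n = 500, slope = 1, r_squared = 0.3)
exposures <- round(10^log_exp)  # mutation counts per tumor
print(stats::cor(log_exp[, "SBS1"], log_exp[, "SBS5"])^2)
```

Fixing the sample R-squared exactly, rather than only in expectation, is a design choice of this sketch. It makes datasets with different target correlations directly comparable.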
To reproduce this benchmarking, go to data-raw/Wu_2022/1_scripts.for.SBS1SBS5 to generate the main figure and the full data of this analysis. The sub-folders hold scripts for:
1_data_generation - calling the SynSigGen generation script to generate the 20 SBS1-SBS5 datasets at data-raw/ or in other repositories.
2_running_approaches - running computational approaches directly or using SynSigRun wrapper functions. The results are generated as a 5-level folder structure:
Level 1: Datasets (e.g. S.0.1.Rsq.0.1);
Level 2: De novo extraction without providing K (ExtrAttr), or extraction with the ground-truth number of signatures, K = 2, provided to the computational approaches (ExtrAttrExact);
Level 3: Results of computational approaches (e.g. hdp.results);
Level 4: Results of runs with seeds (e.g. seed.1, run.1).
3_evaluation - evaluating the performance of signature extraction by calling evaluation functions in SynSigEval.
The paper introducing the new computational approach mSigHdp, “mSigHdp: hierarchical Dirichlet processes in mutational signature extraction”, Liu et al. (2022) (manuscript in revision), includes a benchmarking study on real-tumor-based synthetic spectra with SBS or indel mutations. The benchmarking code of this study calls the wrapper functions in SynSigRun to run the computational approaches signeR and SignatureAnalyzer.
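The results layout described earlier for 2_running_approaches can be sketched as a small path-building helper. `result_dir` is a hypothetical function, not part of SynSigRun. It uses only the levels that the list names (dataset, ExtrAttr or ExtrAttrExact, approach results, seed), and the temporary root folder is chosen purely for illustration.

```r
# Hypothetical helper that assembles a results path from the listed levels.
result_dir <- function(root, dataset, mode = c("ExtrAttr", "ExtrAttrExact"),
                       approach, seed) {
  mode <- match.arg(mode)
  file.path(root,
            dataset,                        # Level 1, e.g. S.0.1.Rsq.0.1
            mode,                           # Level 2, K unknown vs. K = 2 provided
            paste0(approach, ".results"),   # Level 3, e.g. hdp.results
            paste0("seed.", seed))          # Level 4, e.g. seed.1
}

root <- file.path(tempdir(), "toy_benchmark")
out <- result_dir(root, dataset = "S.0.1.Rsq.0.1", mode = "ExtrAttr",
                  approach = "hdp", seed = 1)
dir.create(out, recursive = TRUE, showWarnings = FALSE)

# Enumerate every combination of dataset, mode and seed for one approach.
grid <- expand.grid(dataset = c("S.0.1.Rsq.0.1"),
                    mode = c("ExtrAttr", "ExtrAttrExact"),
                    seed = 1:3,
                    stringsAsFactors = FALSE)
all_dirs <- mapply(result_dir, dataset = grid$dataset, mode = grid$mode,
                   seed = grid$seed,
                   MoreArgs = list(root = root, approach = "hdp"))
print(unname(all_dirs))
```

Keeping the path logic in one function means the evaluation scripts can locate every run without hard-coding folder names.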
https://github.com/WuyangFF95/SynSigRun/blob/master/data-raw/SynSigRun_1.0.0.pdf