Run mutational signature analysis software packages and benchmark the performance of these packages.
This package:
1. Wraps R-based signature analysis packages in functions that are handy for non-expert users, by building default argument values and all necessary steps into the function bodies.
2. Reproduces the benchmarking analyses of signature analysis packages in papers by the Rozen Lab.
Typically, a benchmarking analysis to evaluate the accuracy of signature extraction and/or exposure inference involves the 3 steps below:
SynSigGen.
Usually, synthetic tumor exposures are drawn from a distribution that mimics the exposure distribution of a real tumor type.
Running computational approaches (each can be an R, Python, Julia, or C++ package) on the generated data sets. It involves two steps:
For R-based computational approaches that can do signature extraction (heuristically or semi-automatically selecting K) and/or exposure inference (attribution), we wrote wrapper functions in the R/ folder of this package, so that non-expert users can run these approaches with a simple function call.
SynSigEval.

Install the development version of SynSigRun from GitHub with the R command line:
install.packages("devtools")
devtools::install_github("WuyangFF95/SynSigRun", ref = "1.0.0-branch")
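
Once installation has finished, one way to see which functions the package exports is to query its namespace. The sketch below is illustrative only: `list_exports` is a hypothetical helper that is not part of SynSigRun, and it makes no assumption about how the exported functions are named.

```r
# Hypothetical helper (not part of SynSigRun): list the exported objects of
# an installed package whose names match a regular expression.
list_exports <- function(pkg, pattern = "") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is not installed.")
  }
  exports <- getNamespaceExports(pkg)
  # An empty pattern matches every name.
  sort(grep(pattern, exports, value = TRUE))
}

# Only query SynSigRun if it is actually installed.
if (requireNamespace("SynSigRun", quietly = TRUE)) {
  print(list_exports("SynSigRun"))
}
```

The help page of any listed function (for example via `?` followed by its name) then shows its arguments and default values.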
The Nature paper "The repertoire of mutational signatures in human cancer" includes a benchmarking analysis of SigProfiler (the ancestor of SigProfilerExtractor) and SignatureAnalyzer.
That analysis used some functions and top-level code from this package. Some of the code is in data-raw/Alexandrov_2020.
The Scientific Reports paper "Accuracy of mutational signature software on correlated signatures" benchmarks the signature extraction accuracy of 18 methods on 20 synthetic datasets with correlated exposures to signatures SBS1 and SBS5.
To reproduce this benchmarking, users can go to data-raw/Wu_2022/1_scripts.for.SBS1SBS5 to generate the main figure and the full data of this analysis. The sub-folders hold scripts for:
1_data_generation - Calls the SynSigGen generation script to generate 20 SBS1-SBS5 datasets at data-raw/ or other repositories.
2_running_approaches - Runs computational approaches directly or through SynSigRun wrapper functions. The results are generated as a 5-level folder structure:
Level 1: Datasets (e.g. S.0.1.Rsq.0.1);
Level 2: De-novo extraction without specifying K = 2 (ExtrAttr), or extraction with the number of ground-truth signatures, K = 2, provided to the computational approaches (ExtrAttrExact);
Level 3: Results of computational approaches (e.g. hdp.results);
Level 4: Results of runs with seeds (e.g. seed.1, run.1).
3_evaluation - Evaluates the performance of signature extraction by calling evaluation functions in SynSigEval.

The paper for the new computational approach mSigHdp, "mSigHdp: hierarchical Dirichlet processes in mutational signature extraction", Liu et al. (2022) (manuscript in revision), includes a benchmarking study on real-tumor-based synthetic spectra with SBS or indel mutations.
The benchmarking code of this study calls the wrapper functions in SynSigRun to run the computational approaches signeR and SignatureAnalyzer.
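
For the SBS1-SBS5 results layout described earlier (dataset, then ExtrAttr or ExtrAttrExact, then approach, then seed), the path to a single run can be assembled with base R. This is a minimal sketch: the helper `result_dir`, its argument names, and the `results` root folder are illustrative assumptions, not part of SynSigRun, and it assumes the seed-style naming (seed.1) rather than run.1.

```r
# Hypothetical helper (not part of SynSigRun): build the path to one run's
# results from the folder levels described in this README.
result_dir <- function(dataset, approach, seed,
                       exact_k = FALSE, root = "results") {
  # ExtrAttrExact: ground-truth K is provided; ExtrAttr: K is not provided.
  mode <- if (exact_k) "ExtrAttrExact" else "ExtrAttr"
  file.path(root, dataset, mode,
            paste0(approach, ".results"),
            paste0("seed.", seed))
}

result_dir("S.0.1.Rsq.0.1", "hdp", seed = 1)
# "results/S.0.1.Rsq.0.1/ExtrAttr/hdp.results/seed.1"
```

Passing `exact_k = TRUE` switches the second level to ExtrAttrExact, which is convenient when collecting the results of both extraction modes in a loop.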
https://github.com/WuyangFF95/SynSigRun/blob/master/data-raw/SynSigRun_1.0.0.pdf