SynSigGen | R Documentation |
Create catalogs of synthetic mutational spectra for assessing the performance of mutational-signature analysis programs.
The main focus is generating synthetic catalogs of mutational spectra (mutations in tumors) based on known mutational signature profiles and software-inferred exposures (the software's estimates of the number of mutations induced by each mutational signature in each tumor) in the PCAWG7 data. We broadly call this kind of synthetic data "reality-based" synthetic data. The package also has a set of functions that generate random mutational signature profiles and then create synthetic mutational spectra based on these random signature profiles. We call this kind of synthetic data "random" synthetic data, while pointing out that much depends on the distributions from which the random signature profiles and attributions are generated.
The typical workflow for generating synthetic mutational spectra is as follows.
Input (based on SignatureAnalyzer or SigProfiler analysis of PCAWG tumors):
E, matrix of software-inferred exposures of mutational signatures (signatures x samples)
S, mutational signature profiles (mutation types x signatures)
Obtain distribution parameters from software-inferred exposures
P <- GetSynSigParamsFromExposures(E, ...)
Generate exposures for synthetic mutational spectra based on P
synthetic.exposures <- GenerateSyntheticExposures(P, ...)
Generate synthetic mutational spectra by multiplying S and synthetic.exposures, and round the product to the nearest unit:
synthetic.spectra <- CreateAndWriteCatalog(S, synthetic.exposures, ...)
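The three steps above can be sketched in base R on toy data. This is a minimal illustration and not the package's implementation: the helper names summarize.exposures and draw.exposures, the toy matrices, and the choice of a Bernoulli presence draw followed by a log10-normal exposure draw are all assumptions made for this example.

```r
# Toy stand-ins for the inputs; all names and values here are hypothetical.
set.seed(1)

# S: mutation types x signatures; each column is a profile that sums to 1.
S <- matrix(c(0.7, 0.2, 0.1,
              0.1, 0.3, 0.6),
            nrow = 3,
            dimnames = list(c("type1", "type2", "type3"),
                            c("SigA", "SigB")))

# E: signatures x samples; software-inferred exposures (mutation counts).
E <- matrix(c(100,   0, 250, 400,
               50, 300,   0, 120),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("SigA", "SigB"), paste0("tumor", 1:4)))

# Step 1 (assumed model): per signature, the proportion of samples with
# non-zero exposure, plus mean and sd of log10 exposure in those samples.
summarize.exposures <- function(E) {
  t(apply(E, 1, function(x) {
    pos <- log10(x[x > 0])
    c(prop = mean(x > 0), mean.log10 = mean(pos), sd.log10 = sd(pos))
  }))
}
P <- summarize.exposures(E)

# Step 2 (assumed model): a Bernoulli draw decides whether a signature is
# present in a sample; if present, its exposure is log10-normal.
draw.exposures <- function(P, n.samples) {
  out <- t(apply(P, 1, function(p) {
    present <- rbinom(n.samples, size = 1, prob = p["prop"])
    present * 10^rnorm(n.samples, mean = p["mean.log10"], sd = p["sd.log10"])
  }))
  colnames(out) <- paste0("synthetic", seq_len(n.samples))
  out
}
synthetic.exposures <- draw.exposures(P, n.samples = 5)

# Step 3: multiply profiles by exposures and round to whole mutations.
synthetic.spectra <- round(S %*% synthetic.exposures)
```

The result has one row per mutation type and one column per synthetic sample, which matches the (mutation types x signatures) by (signatures x samples) shapes given for S and E.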
The top-level function for generating "random" synthetic mutational spectra is CreateRandomSyn. It adopts the following steps to generate catalogs of "random" synthetic mutational spectra.
Create random mutational signature profiles:
S <- CreateRandomMutSigProfiles(...)
Generate distribution parameters for exposures of random signatures:
P <- CreateMeanAndStdevForSigs(sig.names = colnames(S),...)
Create exposures for mutational signatures based on P and other parameters:
synthetic.exposures <- CreateRandomExposures(sigs = S, per.sig.mean.and.sd = P)
Generate synthetic mutational spectra by multiplying S and synthetic.exposures, and round the product to the nearest unit:
synthetic.spectra <- NewCreateAndWriteCatalog(S, synthetic.exposures, ...)
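As a rough illustration of the "random" workflow above, the following base R sketch builds random profiles, random exposure parameters, exposures, and spectra. It does not call the package functions. The exponential draws for profiles, the uniform ranges for the parameters, and the log10-normal exposures are assumptions chosen for this example; as noted earlier, the results depend heavily on these distributional choices.

```r
set.seed(2)
n.types   <- 96   # e.g. single-base substitutions in trinucleotide context
n.sigs    <- 3
n.samples <- 10

# Random signature profiles: positive draws, with each column normalized
# to sum to 1. (Assumption: exponential draws, chosen only for illustration.)
raw <- matrix(rexp(n.types * n.sigs), nrow = n.types)
S <- sweep(raw, 2, colSums(raw), "/")
colnames(S) <- paste0("RandSig", seq_len(n.sigs))

# Per-signature mean and sd of log10 exposure (hypothetical ranges).
P <- rbind(mean.log10 = runif(n.sigs, 2, 3),
           sd.log10   = runif(n.sigs, 0.2, 0.5))
colnames(P) <- colnames(S)

# Exposures (signatures x samples), drawn as log10-normal per signature.
synthetic.exposures <- t(sapply(colnames(S), function(sig) {
  10^rnorm(n.samples, mean = P["mean.log10", sig], sd = P["sd.log10", sig])
}))

# Spectra (mutation types x samples): multiply, then round to whole counts.
synthetic.spectra <- round(S %*% synthetic.exposures)
```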
CreateSBS1SBS5CorrelatedSyntheticData is the top-level function for generating 20 data sets that have only 2 active signatures (SBS1 and SBS5) with positively correlated exposures.
This function was used to generate the synthetic mutational spectra for the paper "Performance of Mutational Signature Software on Correlated Signatures".
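To show what positively correlated exposures for two signatures can look like, here is a minimal base R sketch. It is not the method used by CreateSBS1SBS5CorrelatedSyntheticData: the correlation value, the means and standard deviations, and the construction from two independent normal draws on the log10 scale are all assumptions made for illustration.

```r
set.seed(3)
n.samples <- 500
rho <- 0.6   # target correlation on the log10 scale (hypothetical value)

# Two independent standard normal draws; combining them as below yields a
# pair of variables whose correlation is rho in expectation.
z1 <- rnorm(n.samples)
z2 <- rnorm(n.samples)

# log10 exposures; the means (2.5, 2.0) and sd (0.3) are hypothetical.
log10.sbs5 <- 2.5 + 0.3 * z1
log10.sbs1 <- 2.0 + 0.3 * (rho * z1 + sqrt(1 - rho^2) * z2)

# Exposure matrix (signatures x samples) with only two active signatures.
exposures <- rbind(SBS1 = 10^log10.sbs1, SBS5 = 10^log10.sbs5)

# Sample correlation of the log10 exposures; close to rho for large n.
observed.cor <- cor(log10.sbs1, log10.sbs5)
```

Varying rho across data sets would be one way to produce a series of data sets with different correlation strengths.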
The paper "The repertoire of mutational signatures in human cancer" (https://doi.org/10.1038/s41586-020-1943-3) evaluates the performance of two computational approaches (SigProfiler and SignatureAnalyzer) on 11 synthetic data sets (Synapse ID: syn18497223).
Function PancAdenoCA1000 creates a data set of 1000 pancreatic adenocarcinoma spectra (syn18500212).
Script
creates 2,700 synthetic spectra (syn18500213). This data set consists of 9 cancer types each with 300 synthetic tumors:
bladder transitional cell carcinoma,
oesophageal adenocarcinoma,
breast adenocarcinoma,
lung squamous cell carcinoma,
renal cell carcinoma,
ovarian adenocarcinoma,
osteosarcoma,
cervical adenocarcinoma and
stomach adenocarcinoma.
Function RCCOvary1000 creates a spectra data set consisting of 500 synthetic kidney (RCC) spectra with high prevalence and mutation load from the SBS5 and SBS40 signatures, and 500 synthetic ovarian adenocarcinoma spectra with high prevalence and mutation load from SBS3.
Notes:
Mutation loads from other mutational signatures (besides SBS3, SBS5, and SBS40) also exist in the spectra data set created by function RCCOvary1000.
SBS3, SBS5, and SBS40 are flat signatures. This data set challenges computational approaches to accurately separate these 3 mutational signatures, because a mixture of SBS5 and SBS40 can resemble SBS3.
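The difficulty with flat signatures can be illustrated with cosine similarity, a common measure of how alike two signature profiles are. The sketch below uses randomly generated flat toy profiles, not the real SBS3, SBS5, or SBS40 profiles; make.flat.profile and the perturbation range are hypothetical choices for this example.

```r
# Cosine similarity between two non-negative profiles (1 means identical shape).
cosine.similarity <- function(a, b) {
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

set.seed(4)
n.types <- 96

# A toy "flat" profile: roughly uniform across mutation types with modest
# random perturbation, normalized to sum to 1. (Hypothetical stand-in.)
make.flat.profile <- function(n) {
  x <- 1 + runif(n, min = -0.5, max = 0.5)
  x / sum(x)
}

flat.a <- make.flat.profile(n.types)
flat.b <- make.flat.profile(n.types)
flat.c <- make.flat.profile(n.types)

# An equal mixture of two flat profiles, compared with a third flat profile.
mixture <- 0.5 * flat.a + 0.5 * flat.b
sim.mixture.vs.c <- cosine.similarity(mixture, flat.c)
```

Because flat profiles spread their weight across most mutation types, a mixture of two of them tends to have high cosine similarity to a third, which is why separating such signatures is hard.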
Function Create.3.5.40.Abstract creates 1000 synthetic spectra, all constructed entirely from SBS3, SBS5, and SBS40, using mutational loads modelled on kidney-RCC (SBS5 and SBS40) and ovarian adenocarcinoma (SBS3). Most synthetic spectra have contributions from all three signatures.