View source: R/CreateSynSBS1SB5Correlated.R

CreateSBS1SBS5CorrelatedSyntheticDataOneDataset    R Documentation

Wrapper function for generating SBS1-SBS5-correlated synthetic data

Description

This function uses SigProfiler SBS96 mutational signatures to generate synthetic tumor spectra whose mutation burdens come only from SBS1 and SBS5, with the mutation burdens of the two signatures correlated to a degree set by parameter.df.

Usage

CreateSBS1SBS5CorrelatedSyntheticDataOneDataset(
  dir.name = "./S.0.5.Rsq.0.3",
  dataset.name = NULL,
  overwrite = FALSE,
  seed = 1,
  parameter.df = SynSigGen::SBS1SBS5parameter["S.0.5.Rsq.0.3", ],
  add.info = TRUE,
  verbose = FALSE
)

Arguments

dir.name

Folder to place the generated tumor spectra and other output files. Default: ./S.0.5.Rsq.0.3

dataset.name

By convention, dataset.name encodes the parameters used for the synthetic data. If NULL, it is set to the last path component of dir.name. (Default: NULL)

overwrite

Whether to overwrite existing output. (Default: FALSE)

seed

The seed used to initialize the pseudo-random number generator (RNG). This makes the generation of the correlated datasets repeatable. (Default: 1)

parameter.df

A named data.frame with 1 row and 14 columns, containing the following items:

  1. main.signature The name of the main signature whose exposure can vary freely. (Default: SBS5)

  2. correlated.signature The name of the correlated signature, whose exposure is influenced by and co-varies with the exposure of main.signature. (Default: SBS1)

  3. name.prefix Default: TwoCorreSigsGen

  4. sample.number The number of synthetic tumors you want to generate. Default: 500

  5. main.mean.log The mean of log(count(SBS5),base = 10) Default: 2.5

  6. main.stdev.log The standard deviation of log(count(SBS5),base = 10) Default: 0.3

  7. correlated.stdev.log The ADDED standard deviation of log(count(SBS1),base = 10). It is an ADDED stdev because, given how the counts are generated, log10(count(SBS1)) already has an inherent stdev = slope * main.stdev.log. Default: 0.4

  8. slope.linear The ratio for: (Correlated exposure) / (Main exposure) IN LINEAR SPACE! Default: 0.5

  9. main.signature.lower.thres This program will force the exposure count of main.signature to be greater than this threshold. Default: 100

  10. correlated.signature.lower.thres This program will force the exposure count of correlated.signature to be greater than this threshold. Default: 1

  11. pearson.r.2.lower.thres Lower boundary of Pearson's R^2 (Default: 0.29)

  12. pearson.r.2.higher.thres Upper boundary of Pearson's R^2 (Default: 0.31)

  13. min.main.to.correlated.ratio.linear Lower bound on the ratio count(SBS5) / count(SBS1) in LINEAR SPACE! (Default: 1/3)

  14. max.main.to.correlated.ratio.linear Upper bound on the ratio count(SBS5) / count(SBS1) in LINEAR SPACE! (Default: Inf)
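
To make the shape of parameter.df concrete, the 14 items above can be laid out as a one-row data.frame. This is only an illustrative sketch built from the defaults listed above: the object name parameter.sketch is hypothetical, the column types are assumed, and the result may not match SynSigGen::SBS1SBS5parameter["S.0.5.Rsq.0.3", ] exactly.

```r
# Illustrative reconstruction of a 1-row, 14-column parameter data.frame.
# Column names and values are taken from the argument list above; the
# column types and the object name are assumptions for this sketch.
parameter.sketch <- data.frame(
  main.signature                      = "SBS5",
  correlated.signature                = "SBS1",
  name.prefix                         = "TwoCorreSigsGen",
  sample.number                       = 500,
  main.mean.log                       = 2.5,
  main.stdev.log                      = 0.3,
  correlated.stdev.log                = 0.4,
  slope.linear                        = 0.5,
  main.signature.lower.thres          = 100,
  correlated.signature.lower.thres    = 1,
  pearson.r.2.lower.thres             = 0.29,
  pearson.r.2.higher.thres            = 0.31,
  min.main.to.correlated.ratio.linear = 1 / 3,
  max.main.to.correlated.ratio.linear = Inf,
  row.names        = "S.0.5.Rsq.0.3",
  stringsAsFactors = FALSE
)

# Inspect the structure: one named row, fourteen named columns.
str(parameter.sketch)
```

In practice it is probably safer to start from a row of SynSigGen::SBS1SBS5parameter, as the Usage section does, and modify individual columns from there.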

add.info

Whether to generate additional information. (Default: TRUE)

verbose

If TRUE, cat progress messages. Set it to FALSE when you want to make a diff using CreateSBS1SBS5CorrelatedSyntheticData (i.e., when its parameter regressdir is not NULL), because additional information may differ across operating systems or R sessions and can prevent the dataset from passing the NewDiff4SynDatasets check. (Default: FALSE)

Warning

The exposure generation function repeatedly generates exposure counts from the mean and stdev parameters until the dataset's Pearson's R^2 falls between the lower and upper boundaries (pearson.r.2.lower.thres and pearson.r.2.higher.thres). The default values listed under parameter.df are a set of parameters that have been tested successfully. If you intend to lower the Pearson's R^2, remember to increase main.stdev.log and correlated.stdev.log; otherwise, the exposure generation will keep generating and discarding datasets.
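
The generate-and-discard loop described in this warning can be sketched in base R as follows. This is a hypothetical illustration, not SynSigGen's implementation: the function name sketch_correlated_exposures, the max.attempts cap, the exact generative model (correlated exposure = slope.linear * main exposure, plus added log10 noise), and the choice to compute Pearson's R^2 on the log10 counts are all assumptions. The thresholds on counts and count ratios are omitted for brevity.

```r
# Hypothetical rejection-sampling sketch (not the package's actual code).
sketch_correlated_exposures <- function(n = 500,
                                        main.mean.log = 2.5,
                                        main.stdev.log = 0.3,
                                        correlated.stdev.log = 0.4,
                                        slope.linear = 0.5,
                                        r2.lower = 0.29,
                                        r2.upper = 0.31,
                                        max.attempts = 5000) {
  for (attempt in seq_len(max.attempts)) {
    # Main signature: log10(count) drawn from a normal distribution.
    main.log <- rnorm(n, mean = main.mean.log, sd = main.stdev.log)
    # Correlated signature: scaled main exposure plus ADDED log10 noise.
    corr.log <- log10(slope.linear) + main.log +
      rnorm(n, mean = 0, sd = correlated.stdev.log)
    # Assumption: R^2 is evaluated on the log10 counts.
    r2 <- cor(main.log, corr.log)^2
    if (r2 >= r2.lower && r2 <= r2.upper) {
      return(list(main = round(10^main.log),
                  correlated = round(10^corr.log),
                  r2 = r2,
                  attempts = attempt))
    }
    # Otherwise discard this dataset and try again.
  }
  NULL  # Gave up: no dataset met the R^2 window within max.attempts.
}

set.seed(1)
res <- sketch_correlated_exposures()
```

Under these assumptions, res$attempts reports how many datasets were generated before one was accepted. A narrow R^2 window that is far from what the stdev parameters naturally produce makes acceptance rare, which is why the sketch caps the number of attempts rather than looping indefinitely.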

Details

To customize the Pearson's R^2 of the dataset, change the standard deviations of the two signatures, i.e., main.stdev.log and correlated.stdev.log.

This function will generate files listed below:

ground.truth.syn.catalog.csv: Generated tumor spectra in ICAMS SBS96 CSV format.

ground.truth.syn.exposures.csv: Mutation burdens of SBS1 and SBS5 in generated tumor spectra in ICAMS CSV format.

ground.truth.syn.sigs.csv: Ground-truth SBS1 and SBS5 signatures in ICAMS SBS96 CSV format.

parameters.txt: Parameters used to generate the exposures and tumor spectra.

scatterplot.pdf: Scatterplot illustrating the correlation between the exposures of the two signatures in the generated spectra.

seedInUse.txt, RNGInUse.txt: Seed and random number generator used in generation. (For better reproducibility)

sessionInfo.txt: information related to R versions, platforms, loaded or imported packages, etc. (For better reproducibility)
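
As a usage sketch, the function can be called with the arguments shown under Usage, and the output folder can then be checked for the files listed above. The names out.dir, expected.files, and present are hypothetical. The call is guarded so the snippet also runs when SynSigGen is not installed, and it assumes the function accepts a not-yet-existing folder under tempdir().

```r
# Write to a temporary folder so the working directory is left untouched.
out.dir <- file.path(tempdir(), "S.0.5.Rsq.0.3")

# File names as listed in the Details section.
expected.files <- c(
  "ground.truth.syn.catalog.csv",
  "ground.truth.syn.exposures.csv",
  "ground.truth.syn.sigs.csv",
  "parameters.txt",
  "scatterplot.pdf",
  "seedInUse.txt",
  "RNGInUse.txt",
  "sessionInfo.txt"
)

# Only attempt the generation if the package is available.
if (requireNamespace("SynSigGen", quietly = TRUE)) {
  SynSigGen::CreateSBS1SBS5CorrelatedSyntheticDataOneDataset(
    dir.name  = out.dir,
    overwrite = TRUE,
    seed      = 1,
    add.info  = TRUE,
    verbose   = FALSE
  )
}

# Report which of the documented output files are present.
present <- file.exists(file.path(out.dir, expected.files))
names(present) <- expected.files
print(present)
```

If the package is not installed, every entry of present is simply FALSE.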


steverozen/SynSigGen documentation built on April 1, 2022, 8:54 p.m.