GenerateSyntheticTumors: Generate synthetic tumors based on real exposures in one or...

GenerateSyntheticTumors    R Documentation

Generate synthetic tumors based on real exposures in one or more cancer types

Description

Generate synthetic tumors based on real exposures in one or more cancer types

Usage

GenerateSyntheticTumors(
  seed,
  dir,
  cancer.types,
  samples.per.cancer.type,
  input.sigs,
  real.exposures,
  distribution = NULL,
  sample.prefix.name = "SP.Syn.",
  tumor.marker.name = NULL,
  overwrite = TRUE,
  verbose = 0,
  sig.params = NULL
)

Arguments

seed

A random seed to use.

dir

The directory in which to put the output; will be created if necessary.

cancer.types

A vector of character strings denoting different cancer types. This function searches real.exposures for exposures from tumors matching these strings. See PCAWG7::CancerTypes() for examples.
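
As a rough illustration of the matching step, the sketch below selects the columns of a toy exposure matrix whose names begin with a cancer-type string. It assumes sample names of the form "<cancer type>::<sample ID>"; the toy matrix and the helper SelectExposuresForType are hypothetical, and this is not necessarily how GenerateSyntheticTumors performs the search internally.

```r
# Toy exposure matrix: rows are signatures, columns are samples.
# Column names are ASSUMED to look like "<cancer type>::<sample ID>".
toy.exposures <- matrix(
  c(10, 0, 5, 20,
     0, 7, 3,  0),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("DBS1", "DBS2"),
    c("Bladder-TCC::S1", "Bladder-TCC::S2",
      "Liver-HCC::S3", "Liver-HCC::S4")))

# Hypothetical helper: keep the columns whose names start with the cancer type.
SelectExposuresForType <- function(exposures, cancer.type) {
  keep <- startsWith(colnames(exposures), paste0(cancer.type, "::"))
  exposures[, keep, drop = FALSE]
}

bladder <- SelectExposuresForType(toy.exposures, "Bladder-TCC")
print(colnames(bladder))
```

A cancer type with no matching columns yields a matrix with zero columns, which a caller would need to handle explicitly.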

samples.per.cancer.type

Number of synthetic tumors to create for each cancer type. If a single number, the same number of synthetic tumors is generated for every cancer type in cancer.types. If a vector of numbers, each cancer type receives the number at the corresponding position in the vector; the length and order of samples.per.cancer.type must then match those of cancer.types.
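
The scalar and vector forms can be pictured with the following sketch. The helper ExpandSamplesPerType is hypothetical and only mirrors the rule described above; it is not part of SynSigGen.

```r
# Hypothetical helper: expand samples.per.cancer.type to one count per type.
ExpandSamplesPerType <- function(cancer.types, samples.per.cancer.type) {
  n.types <- length(cancer.types)
  if (length(samples.per.cancer.type) == 1) {
    # A single number applies to every cancer type.
    counts <- rep(samples.per.cancer.type, n.types)
  } else {
    # A vector must supply exactly one count per cancer type, in order.
    stopifnot(length(samples.per.cancer.type) == n.types)
    counts <- samples.per.cancer.type
  }
  names(counts) <- cancer.types
  counts
}

types <- c("Bladder-TCC", "Liver-HCC", "Skin-Melanoma")
same.count <- ExpandSamplesPerType(types, 30)
per.type   <- ExpandSamplesPerType(types, c(30, 10, 50))
print(same.count)
print(per.type)
```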

input.sigs

A matrix of signatures.

real.exposures

A matrix of real exposures.

distribution

Probability distribution used to generate synthetic exposures for active mutational signatures. Can be "neg.binom", which selects the negative binomial distribution. If NULL (default), this function uses the log-normal distribution with base 10.
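
To give a feel for the two distribution families, the base R sketch below draws exposure-like counts from each. All parameter values are made up for illustration, and the code is not the sampling routine used inside GenerateSyntheticTumors.

```r
set.seed(191906)  # any seed works; this reuses the one from the Examples

n <- 1000

# Log-normal, base 10: draw log10(exposure) from a normal distribution,
# then transform back. The mean and sd are illustrative values only.
log10.mean <- 2.0
log10.sd   <- 0.3
lognormal.exposures <- round(10^rnorm(n, mean = log10.mean, sd = log10.sd))

# Negative binomial: parameterised here by a mean (mu) and a dispersion
# parameter (size). Smaller size means more overdispersion.
negbinom.exposures <- rnbinom(n, mu = 100, size = 5)

print(summary(lognormal.exposures))
print(summary(negbinom.exposures))
```

Both draws produce non-negative whole numbers, which is what a mutation count attributed to a signature should look like.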

sample.prefix.name

Prefix name to add to the synthetic tumors.

tumor.marker.name

Tumor marker name to add to the synthetic tumors. E.g. "MSI-H", "POLE".

overwrite

If TRUE, overwrite existing directories and files.

verbose

If > 0, print various messages using cat().

sig.params

Empirical signature parameters generated from real exposures irrespective of cancer type. If only one tumor in a cancer type in real.exposures has a given signature, the distribution cannot be fitted to that single data point; the function instead uses the empirical parameter size from sig.params. Users can generate their own signature parameters with SynSigGen:::GetSynSigParamsFromExposuresOld. If NULL (default), this function uses the PCAWG7 empirical signature parameters. See signature.params for more details.
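
The single-data-point problem can be seen in a small base R sketch. It uses the standard deviation of log10 exposures as a stand-in for a spread parameter (the text above refers to the size parameter); the helper EstimateLog10SD and the fallback value 0.4 are hypothetical and do not come from SynSigGen.

```r
# Hypothetical helper: estimate the spread of log10 exposures for one
# signature, falling back to a supplied value when no fit is possible.
EstimateLog10SD <- function(exposures, fallback.sd) {
  nonzero <- exposures[exposures > 0]
  if (length(nonzero) < 2) {
    # sd() of a single value is NA, so no spread can be estimated.
    return(fallback.sd)
  }
  sd(log10(nonzero))
}

single.point.sd <- sd(log10(150))  # NA: one value says nothing about spread

one.tumor   <- c(0, 0, 150, 0)      # signature present in a single tumor
many.tumors <- c(120, 90, 300, 45)  # signature present in several tumors

sd.one  <- EstimateLog10SD(one.tumor,   fallback.sd = 0.4)  # uses fallback
sd.many <- EstimateLog10SD(many.tumors, fallback.sd = 0.4)  # fitted value
print(c(sd.one = sd.one, sd.many = sd.many))
```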

Value

A list of three elements that comprise the synthetic data:

  1. ground.truth.catalog: Spectra catalog with rows denoting mutation types and columns denoting sample names.

  2. ground.truth.signatures: Signatures active in ground.truth.catalog.

  3. ground.truth.exposures: Exposures of ground.truth.signatures in ground.truth.catalog.
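
How the three elements fit together dimensionally can be sketched with toy matrices. The sketch assumes the common convention that signatures have mutation types in rows and signature names in columns, and that exposures have signature names in rows and samples in columns. All values are made up, and the catalog here is simply the rounded product of signatures and exposures; the real function may add sampling noise, so exact equality with that product should not be expected from its output.

```r
# Toy signatures: each column is one signature and sums to 1.
sigs <- matrix(c(0.7, 0.3, 0.2, 0.8), nrow = 2,
               dimnames = list(c("mut.type.1", "mut.type.2"),
                               c("SigA", "SigB")))

# Toy exposures: each column is one synthetic sample.
exposures <- matrix(c(100, 0, 50, 50), nrow = 2,
                    dimnames = list(c("SigA", "SigB"),
                                    c("SP.Syn.1", "SP.Syn.2")))

# ASSUMPTION: noise-free catalog = signatures %*% exposures, rounded.
catalog <- round(sigs %*% exposures)

synthetic <- list(ground.truth.catalog    = catalog,
                  ground.truth.signatures = sigs,
                  ground.truth.exposures  = exposures)

# Names line up across the three elements.
stopifnot(identical(rownames(synthetic$ground.truth.catalog),
                    rownames(synthetic$ground.truth.signatures)))
stopifnot(identical(colnames(synthetic$ground.truth.catalog),
                    colnames(synthetic$ground.truth.exposures)))
stopifnot(identical(colnames(synthetic$ground.truth.signatures),
                    rownames(synthetic$ground.truth.exposures)))
print(synthetic$ground.truth.catalog)
```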

Examples


# Generate synthetic tumors for DBS78
input.sigs.DBS78 <- cosmicsig::COSMIC_v3.2$signature$GRCh37$DBS78
real.exposures.DBS78 <- PCAWG7::exposure$PCAWG$DBS78
cancer.types <- PCAWG7::CancerTypes()[1:5]
DBS78.synthetic.tumors <-
  GenerateSyntheticTumors(seed = 191906,
                          dir = file.path(tempdir(), "DBS78.synthetic.tumors"),
                          cancer.types = cancer.types,
                          samples.per.cancer.type = 30,
                          input.sigs = input.sigs.DBS78,
                          real.exposures = real.exposures.DBS78,
                          sample.prefix.name = "SP.Syn."
  )

# Generate synthetic tumors for Indel (ID) using negative binomial distribution
input.sigs.ID <- cosmicsig::COSMIC_v3.2$signature$GRCh37$ID
real.exposures.ID <- PCAWG7::exposure$PCAWG$ID
cancer.types <- PCAWG7::CancerTypes()[1:5]
ID.synthetic.tumors <-
  GenerateSyntheticTumors(seed = 191906,
                          dir = file.path(tempdir(), "ID.synthetic.tumors"),
                          cancer.types = cancer.types,
                          samples.per.cancer.type = 30,
                          input.sigs = input.sigs.ID,
                          real.exposures = real.exposures.ID,
                          distribution = "neg.binom",
                          sample.prefix.name = "SP.Syn."
  )

# Plot the synthetic catalog and exposures
ICAMS::PlotCatalogToPdf(catalog = DBS78.synthetic.tumors$ground.truth.catalog,
                        file = file.path(tempdir(), "DBS78.synthetic.catalog.pdf"))
mSigAct::PlotExposureToPdf(exposure = DBS78.synthetic.tumors$ground.truth.exposures,
                           file = file.path(tempdir(), "DBS78.synthetic.exposures.pdf"),
                           cex.xaxis = 0.7)

steverozen/SynSigGen documentation built on April 1, 2022, 8:54 p.m.