
View source: R/utility_functions.R

TransformCatalog    R Documentation

Transform between counts and density spectrum catalogs and counts and density signature catalogs

Description

Transform between counts and density spectrum catalogs, and between counts and density signature catalogs.

Usage

TransformCatalog(
  catalog,
  target.ref.genome = NULL,
  target.region = NULL,
  target.catalog.type = NULL,
  target.abundance = NULL
)

Arguments

catalog

An SBS or DBS catalog as described in ICAMS; must not be an ID (small insertion and deletion) catalog.

target.ref.genome

A ref.genome argument as described in ICAMS. If NULL, then defaults to the ref.genome attribute of catalog.

target.region

A region argument; see as.catalog and ICAMS. If NULL, then defaults to the region attribute of catalog.

target.catalog.type

A character string acting as a catalog type identifier, one of "counts", "density", "counts.signature", "density.signature"; see as.catalog. If NULL, then defaults to the catalog.type attribute of catalog.

target.abundance

A vector of counts, one for each source K-mer for mutations (e.g. for strand-agnostic single nucleotide substitutions in trinucleotide, i.e. 3-mer, context, one count each for ACA, ACC, ACG, ... TTT). See all.abundance. If NULL, the function tries to infer target.abundance from the class of catalog and the values of target.ref.genome, target.region, and target.catalog.type. If target.abundance can be inferred and differs from a supplied non-NULL value of target.abundance, an error is raised.
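Because target.abundance is indexed by source K-mer while a catalog has one row per mutation type, each row has to be matched to the abundance of its source K-mer. The base-R sketch below illustrates that lookup and the inferred-versus-supplied check described above. The row labels of the form "ACA>AAA", the toy abundance numbers, and the helper resolve_abundance() are hypothetical and are not part of ICAMS, whose catalogs use their own row-name conventions.

```r
# Toy per-3-mer abundances (illustrative numbers, not real genome counts)
abundance <- c(ACA = 120, ACC = 100, ACG = 20)

# Hypothetical mutation-row labels: source 3-mer, ">", mutated 3-mer
rows <- c("ACA>AAA", "ACA>AGA", "ACC>ATC", "ACG>ATG")

# The source K-mer is the first three characters of each label
source.kmer <- substr(rows, 1, 3)
row.abundance <- unname(abundance[source.kmer])

# Hypothetical helper mirroring the documented rule: use the inferred
# abundance unless a supplied, non-NULL value disagrees with it.
# Assumption: if nothing can be inferred and nothing is supplied, stop.
resolve_abundance <- function(inferred, supplied = NULL) {
  if (is.null(supplied)) {
    if (is.null(inferred)) stop("target.abundance cannot be inferred; please supply it")
    return(inferred)
  }
  if (!is.null(inferred) && !isTRUE(all.equal(inferred, supplied))) {
    stop("supplied target.abundance differs from the inferred abundance")
  }
  supplied
}
```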

Details

Only the following transformations are handled; items 9 and 12 are illegal and generate an error:

  1. counts -> counts (deprecated and generates a warning; we strongly suggest working with densities when comparing spectra or signatures generated from data with different underlying abundances)

  2. counts -> density

  3. counts -> (counts.signature, density.signature)

  4. density -> counts (the semantics are to infer the genome-wide or exome-wide counts based on the densities)

  5. density -> density (a null operation, generates a warning)

  6. density -> (counts.signature, density.signature)

  7. counts.signature -> counts.signature (used to transform between the source abundance and target.abundance)

  8. counts.signature -> density.signature

  9. counts.signature -> (counts, density) (generates an error)

  10. density.signature -> density.signature (a null operation, generates a warning)

  11. density.signature -> counts.signature

  12. density.signature -> (counts, density) (generates an error)
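The list above can be summarized as a small lookup. The function transform_status() below is a hypothetical base-R helper, not an ICAMS function; it only encodes which transformations succeed, warn, or fail according to this list.

```r
# Hypothetical helper: returns "ok", "warning" or "error" for a
# from -> to catalog-type transformation, following the list above
transform_status <- function(from, to) {
  types <- c("counts", "density", "counts.signature", "density.signature")
  stopifnot(from %in% types, to %in% types)
  is.sig <- function(x) x %in% c("counts.signature", "density.signature")
  # Items 9 and 12: a signature cannot be turned back into a spectrum
  if (is.sig(from) && !is.sig(to)) return("error")
  # Items 5 and 10: null operations
  if (from == to && from %in% c("density", "density.signature")) return("warning")
  # Item 1: deprecated
  if (from == "counts" && to == "counts") return("warning")
  "ok"
}
```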

Value

A catalog as defined in ICAMS.

Rationale

The TransformCatalog function transforms catalogs of mutational spectra or signatures to account for differing abundances of the source sequence of the mutations in the genome.

For example, mutations from ACG are much rarer in the human genome than mutations from ACC simply because CG dinucleotides are rare in the genome. Consequently, there are two possible representations of mutational spectra or signatures. One representation is based on mutation counts as observed in a given genome or exome. This approach is widely used, for example at https://cancer.sanger.ac.uk/cosmic/signatures, which presents signatures based on observed mutation counts in the human genome. We call these "counts-based spectra" or "counts-based signatures".

Alternatively, mutational spectra or signatures can be represented as mutations per source sequence, for example the number of ACT > AGT mutations divided by the number of ACT 3-mers in a genome. We call these "density-based spectra" or "density-based signatures".

This function can transform spectra based on observed genome-wide counts into "density"-based catalogs, in which mutations are expressed as mutations per source sequence. For example, a density-based catalog represents the proportion of ACCs mutated to ATCs, the proportion of ACGs mutated to ATGs, etc. This differs from counts-based mutational spectrum catalogs, which contain the number of ACC > ATC mutations, the number of ACG > ATG mutations, etc.
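A toy base-R sketch of the arithmetic behind these representations follows, using invented counts and abundances. It treats density as counts divided by the abundance of the source 3-mer and a signature as a representation rescaled to sum to 1. It omits any additional scaling ICAMS may apply, so it illustrates the idea rather than reproducing TransformCatalog's output.

```r
# Toy counts for two mutation types (invented numbers)
counts    <- c("ACC>ATC" = 50, "ACG>ATG" = 10)
# Toy abundance of each row's source 3-mer (ACC and ACG, respectively)
abundance <- c("ACC>ATC" = 100, "ACG>ATG" = 20)

# Density: mutations per occurrence of the source 3-mer
density <- counts / abundance

# Signatures: each representation rescaled so that it sums to 1
counts.signature  <- counts / sum(counts)
density.signature <- density / sum(density)

# counts.signature -> density.signature: divide by abundance, then renormalize
per.site <- counts.signature / abundance
from.counts.sig <- per.site / sum(per.site)
```

Here ACG > ATG has five times fewer counts than ACC > ATC but the same density, because ACG is five times rarer in this toy genome.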

This function can also transform observed-count-based spectra or signatures from genome-based to exome-based counts, or between species, since the abundances of source sequences differ between genome and exome and between species.
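Under the same simplified model, moving counts from one region to another (for example, from genome to exome) amounts to rescaling each count by the ratio of target to source abundance of its source K-mer, which leaves the density unchanged. The sketch below uses invented abundances and is an assumption about the underlying idea, not a description of how ICAMS implements it.

```r
# Toy abundances of two source 3-mers in each region (illustrative only)
genome.abundance <- c(ACA = 1000, ACG = 100)
exome.abundance  <- c(ACA = 20,   ACG = 5)

# Toy counts observed genome-wide for mutations from ACA and ACG
genome.counts <- c(ACA = 200, ACG = 10)

# Rescale each count by target / source abundance of its source K-mer
exome.counts <- genome.counts * exome.abundance / genome.abundance

# Densities agree before and after the rescaling
genome.density <- genome.counts / genome.abundance
exome.density  <- exome.counts / exome.abundance
```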

Examples

library(ICAMS)

# Example SBS96 counts catalog shipped with ICAMS
file <- system.file("extdata",
                    "strelka.regress.cat.sbs.96.csv",
                    package = "ICAMS")
# ref.genome = "hg19" resolves to the BSgenome.Hsapiens.1000genomes.hs37d5 package
if (requireNamespace("BSgenome.Hsapiens.1000genomes.hs37d5", quietly = TRUE)) {
  catSBS96.counts <- ReadCatalog(file, ref.genome = "hg19",
                                 region = "genome",
                                 catalog.type = "counts")
  # Convert genome-wide counts to a density catalog
  catSBS96.density <- TransformCatalog(catSBS96.counts,
                                       target.ref.genome = "hg19",
                                       target.region = "genome",
                                       target.catalog.type = "density")
}

ICAMS documentation built on June 22, 2024, 6:47 p.m.