ICAMS R Documentation

ICAMS: In-depth Characterization and Analysis of Mutational Signatures

Description

Analysis and visualization of experimentally elucidated mutational signatures – the kind of analysis and visualization in Boot et al., "In-depth characterization of the cisplatin mutational signature in human cell lines and in esophageal and liver tumors", Genome Research 2018 https://doi.org/10.1101/gr.230219.117 and "Characterization of colibactin-associated mutational signature in an Asian oral squamous cell carcinoma and in other mucosal tumor types", Genome Research 2020, https://doi.org/10.1101/gr.255620.119. "ICAMS" stands for In-depth Characterization and Analysis of Mutational Signatures. "ICAMS" has functions to read in variant call files (VCFs), to collate the corresponding catalogs of mutational spectra, and to analyze and plot catalogs of mutational spectra and signatures.

Details

"ICAMS" can read in VCFs generated by Strelka, Mutect or other variant callers, and collate the mutations into "catalogs" of mutational spectra. "ICAMS" can create and plot catalogs of mutational spectra or signatures for single base substitutions (SBS), doublet base substitutions (DBS), and small insertions and deletions (ID). It can also read and write these catalogs.

Catalogs

A key data type in "ICAMS" is a "catalog" of mutation counts, of mutation densities (see below), or of mutational signatures.

Catalogs are S3 objects of class matrix and one of several additional classes that specify the types of the mutations represented in the catalog. The additional class is one of

  • SBS96Catalog (strand-agnostic single base substitutions in trinucleotide context)

  • SBS192Catalog (transcription-stranded single-base substitutions in trinucleotide context)

  • SBS1536Catalog

  • DBS78Catalog

  • DBS144Catalog

  • DBS136Catalog

  • IndelCatalog

  • ID166Catalog (genic-intergenic indel catalog)

as.catalog is the main constructor.

Conceptually, a catalog also has one of the following types, indicated by the attribute catalog.type:

  1. Matrix of mutation counts (one column per sample), representing (counts-based) mutational spectra (catalog.type = "counts").

  2. Matrix of mutation densities, i.e. mutations per occurrence of the source sequence (one column per sample), representing (density-based) mutational spectra (catalog.type = "density").

  3. Matrix of mutational signatures, which are similar to spectra. However, where spectra consist of counts or densities of mutations in each mutation class (e.g. ACA > AAA, ACA > AGA, ACA > ATA, ACC > AAC, ...), signatures consist of the proportions of mutations in each class (with all the proportions summing to 1). A mutational signature can be based on either:

    • mutation counts (a "counts-based mutational signature", catalog.type = "counts.signature"), or

    • mutation densities (a "density-based mutational signature", catalog.type = "density.signature").
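The relationship between a counts-based spectrum and a counts-based signature can be sketched in base R, without "ICAMS". The matrix below, including its row names, sample names, and numbers, is invented for illustration and is not ICAMS output.

```r
# Toy counts-based spectra: rows are mutation classes, columns are samples.
# Only 4 of the 96 SBS classes are shown; all numbers are invented.
counts <- matrix(
  c(10, 5, 25, 60,
     2, 2,  4, 12),
  ncol = 2,
  dimnames = list(c("ACA>AAA", "ACA>AGA", "ACA>ATA", "ACC>AAC"),
                  c("sample1", "sample2"))
)

# A counts-based signature holds, for each sample, the proportion of
# mutations in each class, so every column sums to 1.
signature <- sweep(counts, 2, colSums(counts), "/")

print(signature)
print(colSums(signature))
```

A density-based signature is obtained the same way, starting from densities instead of counts.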

Catalogs also have the attribute abundance, which contains the counts of different source sequences for mutations. For example, for SBSs in trinucleotide context, the abundances would be the counts of each trinucleotide in the human genome, exome, or in the transcribed region of the genome. See TransformCatalog for more information. Abundances logically depend on the species in question and on the part of the genome being analyzed.

In "ICAMS" abundances can sometimes be inferred from the catalog class attribute and the function arguments region, ref.genome, and catalog.type. Otherwise abundances can be provided as an abundance argument. See all.abundance for examples.

Possible values for region are the strings genome, transcript, exome, and unknown; transcript includes entire transcribed regions, i.e. the introns as well as the exons.

If you need to create a catalog from a source other than this package (that is, other than with ReadCatalog, VCFsToCatalogs, VCFsToZipFile, etc.), then use as.catalog.
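A minimal sketch of wrapping an externally produced count matrix with as.catalog follows. It assumes that the standard SBS96 row names are available as ICAMS::catalog.row.order$SBS96 and that as.catalog accepts the region and catalog.type arguments described above; check the as.catalog help page for the exact signature. The counts are random placeholders, and the ICAMS calls run only if the package is installed.

```r
# Placeholder counts for 96 SBS classes in 2 samples (random, not real data).
set.seed(1)
counts <- matrix(rpois(96 * 2, lambda = 20), nrow = 96,
                 dimnames = list(NULL, c("sample1", "sample2")))

if (requireNamespace("ICAMS", quietly = TRUE)) {
  # Assumption: row names must follow the package's standard SBS96 order.
  rownames(counts) <- ICAMS::catalog.row.order$SBS96
  catalog <- ICAMS::as.catalog(counts,
                               region = "unknown",
                               catalog.type = "counts")
  print(class(catalog))
  print(attr(catalog, "catalog.type"))
}
```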

Subscripting catalogs

To subscript (select) specific columns from a catalog, call library(ICAMS) beforehand so that the ICAMS catalog attributes are preserved. Otherwise the ICAMS functions that write or plot catalogs may not work properly.
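The base R behavior behind this advice can be illustrated without "ICAMS". In the sketch below, "ToyCatalog" is a hypothetical class with no subsetting method of its own, standing in for a catalog class when the package is not loaded. That "ICAMS" supplies class-aware subsetting once loaded is an inference from the note above, not something shown here.

```r
# A plain matrix carrying an extra class and attribute, standing in for a
# catalog. "ToyCatalog" is hypothetical and has no `[` method defined.
toy <- structure(
  matrix(1:6, nrow = 3,
         dimnames = list(c("r1", "r2", "r3"), c("s1", "s2"))),
  class = c("ToyCatalog", "matrix"),
  catalog.type = "counts"
)

# Without a class-specific `[` method, R's default subsetting keeps only
# dim and dimnames; the class and the catalog.type attribute are dropped.
sub <- toy[, "s1", drop = FALSE]

print(class(sub))
print(attr(sub, "catalog.type"))
```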

Creating catalogs from variant call files (VCF files)

* VCFsToCatalogs creates 3 SBS catalogs (96, 192, 1536), 3 DBS catalogs (78, 136, 144) and 1 ID (small insertions and deletions) catalog from the VCFs.

Plotting catalogs

* PlotCatalog function plots the mutational spectrum for one sample or plots one mutational signature.

* PlotCatalogToPdf function plots catalogs of mutational spectra or of mutational signatures to a PDF file.

Wrapper function to create catalogs from VCFs and plot the catalogs to PDF files

* VCFsToCatalogsAndPlotToPdf creates all types of SBS, DBS and ID catalogs from VCFs and plots the catalogs.

Wrapper function to create a zip file which contains catalogs and plot PDFs from VCF files

* VCFsToZipFile creates a zip file which contains SBS, DBS and ID catalogs and plot PDFs from VCF files.

The ref.genome (reference genome) argument

Many functions take the argument ref.genome.

To create a mutational spectrum catalog from a VCF file, "ICAMS" needs the reference genome sequence that matches the VCF file. The ref.genome argument provides this.

ref.genome must be one of

  1. A variable from the Bioconductor BSgenome package that contains a particular reference genome, for example BSgenome.Hsapiens.1000genomes.hs37d5.

  2. The strings "hg38" or "GRCh38", which specify BSgenome.Hsapiens.UCSC.hg38.

  3. The strings "hg19" or "GRCh37", which specify BSgenome.Hsapiens.1000genomes.hs37d5.

  4. The strings "mm10" or "GRCm38", which specify BSgenome.Mmusculus.UCSC.mm10.

All needed reference genomes must be installed separately by the user. Further instructions are at
https://bioconductor.org/packages/release/bioc/html/BSgenome.html.

Use of "ICAMS" with reference genomes other than the 2 human genomes and 1 mouse genome specified above is restricted to catalog.type of counts or counts.signature unless the user also creates the necessary abundance vectors. See all.abundance.

Use available.genomes() to get the list of available genomes.
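The string aliases listed above can be summarized as a lookup table. The helper below is hypothetical and is not an ICAMS function; it only restates the mapping from this section and checks whether the corresponding genome package is installed.

```r
# Hypothetical helper mirroring the alias list above; not part of ICAMS.
ResolveRefGenomeName <- function(ref.genome) {
  aliases <- c(
    hg38   = "BSgenome.Hsapiens.UCSC.hg38",
    GRCh38 = "BSgenome.Hsapiens.UCSC.hg38",
    hg19   = "BSgenome.Hsapiens.1000genomes.hs37d5",
    GRCh37 = "BSgenome.Hsapiens.1000genomes.hs37d5",
    mm10   = "BSgenome.Mmusculus.UCSC.mm10",
    GRCm38 = "BSgenome.Mmusculus.UCSC.mm10"
  )
  if (!ref.genome %in% names(aliases)) {
    stop("Unrecognized ref.genome string: ", ref.genome)
  }
  unname(aliases[ref.genome])
}

pkg <- ResolveRefGenomeName("GRCh37")
print(pkg)
# The genome package must be installed separately before use.
print(requireNamespace(pkg, quietly = TRUE))
```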

Writing catalogs to files

* WriteCatalog function writes a catalog to a file.

Reading catalogs

* ReadCatalog function reads a file that contains a catalog in standardized format.
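A write-then-read round trip might look like the sketch below. It assumes that WriteCatalog takes a catalog and a file path, and that ReadCatalog accepts region and catalog.type arguments; consult the help pages of both functions for the exact signatures. The counts are random placeholders, and the ICAMS calls run only if the package is installed.

```r
# NA means the round trip was not attempted (ICAMS not installed).
round.trip.ok <- NA

if (requireNamespace("ICAMS", quietly = TRUE)) {
  set.seed(1)
  # Placeholder SBS96 counts; row names are assumed to come from
  # ICAMS::catalog.row.order$SBS96.
  counts <- matrix(rpois(96 * 2, lambda = 20), nrow = 96,
                   dimnames = list(ICAMS::catalog.row.order$SBS96,
                                   c("sample1", "sample2")))
  catalog <- ICAMS::as.catalog(counts, region = "unknown",
                               catalog.type = "counts")

  out.file <- tempfile(fileext = ".csv")
  ICAMS::WriteCatalog(catalog, file = out.file)
  catalog2 <- ICAMS::ReadCatalog(out.file, region = "unknown",
                                 catalog.type = "counts")

  # Compare the underlying numbers, ignoring classes and attributes.
  round.trip.ok <- all(unclass(catalog2) == unclass(catalog))
}

print(round.trip.ok)
```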

Transforming catalogs

TransformCatalog function transforms catalogs of mutational spectra or signatures to account for differing abundances of the source sequence of the mutations in the genome.

For example, mutations from ACG are much rarer in the human genome than mutations from ACC simply because CG dinucleotides are rare in the genome. Consequently, there are two possible representations of mutational spectra or signatures. One representation is based on mutation counts as observed in a given genome or exome, and this approach is widely used, as, for example, at https://cancer.sanger.ac.uk/signatures/, which presents signatures based on observed mutation counts in the human genome. We call these "counts-based spectra" or "counts-based signatures".

Alternatively, mutational spectra or signatures can be represented as mutations per source sequence, for example the number of ACT > AGT mutations occurring at all ACT 3-mers in a genome. We call these "density-based spectra" or "density-based signatures".

This function can also transform spectra based on observed genome-wide counts to "density"-based catalogs. In density-based catalogs mutations are expressed as mutations per source sequence. For example, a density-based catalog represents the proportion of ACCs mutated to ATCs, the proportion of ACGs mutated to ATGs, etc. This is different from counts-based mutational spectra catalogs, which contain the number of ACC > ATC mutations, the number of ACG > ATG mutations, etc.

This function can also transform observed-count based spectra or signatures from genome to exome based counts, or between different species (since the abundances of source sequences vary between genome and exome and between species).
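The arithmetic behind a counts-to-density transformation can be sketched in base R. This is an illustration of the idea only, not the TransformCatalog implementation, and both the mutation counts and the trinucleotide abundances are invented.

```r
# Counts of two SBS classes in one sample (invented numbers).
counts <- c("ACC>ATC" = 300, "ACG>ATG" = 100)

# Invented genome-wide abundances of the source trinucleotides.
# ACG is given a lower abundance because CG dinucleotides are rare.
abundance <- c(ACC = 3e7, ACG = 1e7)

# Density: mutations per occurrence of the source trinucleotide.
source.seq <- substr(names(counts), 1, 3)
density <- counts / abundance[source.seq]
names(density) <- names(counts)

# Signatures are the corresponding proportions, each summing to 1.
counts.signature  <- counts / sum(counts)
density.signature <- density / sum(density)

print(density)
print(counts.signature)
print(density.signature)
```

With these numbers the two classes differ threefold in the counts-based signature but are equal in the density-based signature, because the difference in counts is fully explained by the difference in abundance.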

Collapsing catalogs

CollapseCatalog function

  1. Takes a mutational spectrum or signature catalog that is based on a fine-grained set of features (for example, single-nucleotide substitutions in the context of the preceding and following 2 bases).

  2. Collapses it to a catalog based on a coarser-grained set of features (for example, single-nucleotide substitutions in the context of the immediately preceding and following bases).
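The two steps above can be sketched in base R on a toy matrix. This is not the ICAMS implementation; it assumes a row naming scheme in which characters 1 to 5 give the pentanucleotide context and character 6 gives the mutated-to base, and the counts are invented.

```r
# Toy fine-grained rows: 5-base context plus the mutated-to base.
# Assumed naming: characters 1-5 are the context, character 6 the new base.
fine <- matrix(
  c(3, 4, 5, 1,
    0, 2, 6, 7),
  ncol = 2,
  dimnames = list(c("AACAAT", "CACAGT", "TACATT", "GGCTAA"),
                  c("sample1", "sample2"))
)

# Coarse row name: the central trinucleotide (characters 2-4) plus the new base.
coarse.names <- paste0(substr(rownames(fine), 2, 4),
                       substr(rownames(fine), 6, 6))

# Sum the rows that share a coarse name.
coarse <- rowsum(fine, group = coarse.names)
print(coarse)
```

The first three rows all have ACA as the central trinucleotide and T as the new base, so they collapse into a single ACAT row.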

Data

  1. CatalogRowOrder Standard order of rownames in a catalog. The rownames encode the type of each mutation. For example, for SBS96 catalogs, the rowname ACAT represents a mutation from ACA > ATA.

  2. TranscriptRanges Transcript ranges and strand information for a particular reference genome.

  3. all.abundance The counts of different source sequences for mutations.

  4. GeneExpressionData Example gene expression data from two cell lines.
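How a row name of the kind stored in CatalogRowOrder can encode a mutation is sketched below for strand-agnostic SBS in trinucleotide context. Both helpers are hypothetical and are not ICAMS functions. The sketch assumes the common convention that the central reference base is written as a pyrimidine (C or T), as in the ACA > AAA examples earlier, and that the row name is the trinucleotide followed by the mutated-to base.

```r
# Hypothetical helper: reverse complement of DNA strings.
RevComp <- function(x) {
  comp <- chartr("ACGT", "TGCA", x)
  vapply(strsplit(comp, ""),
         function(b) paste(rev(b), collapse = ""),
         character(1))
}

# Hypothetical helper: row name from a reference trinucleotide and the
# alternate (mutated-to) base.
SBS96RowName <- function(context, alt) {
  # Assumed strand-agnostic convention: if the central reference base is a
  # purine, describe the mutation on the opposite strand instead.
  purine <- substr(context, 2, 2) %in% c("A", "G")
  context <- ifelse(purine, RevComp(context), context)
  alt <- ifelse(purine, RevComp(alt), alt)
  paste0(context, alt)
}

print(SBS96RowName("ACA", "T"))  # pyrimidine centre, kept as is
print(SBS96RowName("TGT", "A"))  # purine centre, reverse complemented
```

Both calls describe the same mutation seen from opposite strands, so they yield the same row name.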


ICAMS documentation built on June 15, 2025, 1:08 a.m.