ICAMS | R Documentation |
Analysis and visualization of experimentally elucidated mutational signatures: the kind of analysis and visualization in Boot et al., "In-depth characterization of the cisplatin mutational signature in human cell lines and in esophageal and liver tumors", Genome Research 2018, https://doi.org/10.1101/gr.230219.117, and "Characterization of colibactin-associated mutational signature in an Asian oral squamous cell carcinoma and in other mucosal tumor types", Genome Research 2020, https://doi.org/10.1101/gr.255620.119.
"ICAMS" stands for In-depth Characterization and Analysis of Mutational Signatures. "ICAMS" has functions to read in variant call files (VCFs), to collate the corresponding catalogs of mutational spectra, and to analyze and plot catalogs of mutational spectra and signatures. It handles both "counts-based" and "density-based" catalogs of mutational spectra or signatures.
"ICAMS" can read in VCFs generated by Strelka or Mutect, and collate the mutations into "catalogs" of mutational spectra. "ICAMS" can create and plot catalogs of mutational spectra or signatures for single base substitutions (SBS), double base substitutions (DBS), and small insertions and deletions (ID). It can also read and write these catalogs.
A key data type in "ICAMS" is a "catalog" of mutation counts, of mutation densities, or of mutational signatures.
Catalogs are S3 objects of class matrix and one of several additional classes that specify the types of the mutations represented in the catalog. The additional class is one of:
  SBS96Catalog (strand-agnostic single base substitutions in trinucleotide context)
  SBS192Catalog (transcription-stranded single base substitutions in trinucleotide context)
  SBS1536Catalog
  DBS78Catalog
  DBS144Catalog
  DBS136Catalog
  IndelCatalog
as.catalog is the main constructor.
Conceptually, a catalog also has one of the following types, indicated by the attribute catalog.type:
  Matrix of mutation counts (one column per sample), representing (counts-based) mutational spectra (catalog.type = "counts").
  Matrix of mutation densities, i.e. mutations per occurrence of the source sequence (one column per sample), representing (density-based) mutational spectra (catalog.type = "density").
  Matrix of mutational signatures, which are similar to spectra. However, where spectra consist of counts or densities of mutations in each mutation class (e.g. ACA > AAA, ACA > AGA, ACA > ATA, ACC > AAC, ...), signatures consist of the proportions of mutations in each class (with all the proportions summing to 1). A mutational signature can be based on either:
    mutation counts (a "counts-based mutational signature", catalog.type = "counts.signature"), or
    mutation densities (a "density-based mutational signature", catalog.type = "density.signature").
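The relationship between a counts-based spectrum and a counts-based signature can be sketched in base R. This is a minimal illustration, not ICAMS code: the toy matrix holds only four mutation classes, the counts are invented, and SpectraToProportions is a hypothetical helper name.

```r
# Toy counts-based spectra: rows are mutation classes, columns are samples.
# The four row names are a small illustrative subset, not a full catalog.
counts <- matrix(
  c(10, 30, 40, 20,
     5,  5,  0, 10),
  nrow = 4,
  dimnames = list(c("ACAA", "ACAG", "ACAT", "ACCA"),
                  c("sample1", "sample2"))
)

# Hypothetical helper: divide each column by its total so that the
# proportions in every column sum to 1.
SpectraToProportions <- function(spectra) {
  sweep(spectra, 2, colSums(spectra), "/")
}

signatures <- SpectraToProportions(counts)
print(signatures)
```

Each column of signatures now sums to 1, which is the defining property of a signature described above.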
Catalogs also have the attribute abundance, which contains the counts of different source sequences for mutations. For example, for SBSs in trinucleotide context, the abundances would be the counts of each trinucleotide in the human genome, exome, or in the transcribed region of the genome. See TransformCatalog for more information. Abundances logically depend on the species in question and on the part of the genome being analyzed.
In "ICAMS", abundances can sometimes be inferred from the catalog class attribute and the function arguments region, ref.genome, and catalog.type. Otherwise, abundances can be provided as an abundance argument. See all.abundance for examples.
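To make the role of abundance concrete, the following base R sketch divides toy mutation counts by the abundance of their source trinucleotides to obtain densities. It is an illustration under stated assumptions, not ICAMS code: the abundance values are made up, and the row names are assumed to be the source trinucleotide followed by the mutant base.

```r
# Toy counts-based spectrum for one sample (row names: source trinucleotide
# followed by the mutant base, e.g. "ACGT" for ACG > ATG).
counts <- matrix(
  c(50, 5),
  nrow = 2,
  dimnames = list(c("ACCT", "ACGT"), "sample1")
)

# Made-up abundances: occurrences of each source trinucleotide in the
# region being analyzed. These are NOT real genome counts.
abundance <- c(ACC = 1000, ACG = 200)

# The source sequence is the first three characters of the row name.
source.seq <- substr(rownames(counts), 1, 3)

# Density: mutations per occurrence of the source sequence.
density <- counts / abundance[source.seq]
print(density)
```

With these invented numbers, ACG > ATG has ten times fewer mutations than ACC > ATC but only half the density, because ACG is assumed to be five times rarer.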
Possible values for region are the strings genome, transcript, exome, and unknown; transcript includes entire transcribed regions, i.e. the introns as well as the exons.
If you need to create a catalog from a source other than this package (i.e. other than with ReadCatalog, StrelkaSBSVCFFilesToCatalog, MutectVCFFilesToCatalog, etc.), then use as.catalog.
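As a hedged sketch of that last case, the code below builds a plain 96-row matrix and, only if ICAMS is installed, passes it to as.catalog. The counts are invented, the rows are assumed to already be in the standard SBS96 order, and the argument names used in the call (region, catalog.type, infer.rownames) are assumptions to be checked against the as.catalog help page.

```r
# Toy input: 96 rows (one per SBS96 mutation class) and two samples.
# The counts are invented; the rows are assumed to already be in the
# standard SBS96 order.
counts <- matrix(1, nrow = 96, ncol = 2,
                 dimnames = list(NULL, c("sample1", "sample2")))

if (requireNamespace("ICAMS", quietly = TRUE)) {
  # Assumed call: infer.rownames = TRUE is taken to add the standard
  # row names implied by the number of rows.
  catalog <- ICAMS::as.catalog(counts,
                               region = "unknown",
                               catalog.type = "counts",
                               infer.rownames = TRUE)
  print(class(catalog))
  print(attr(catalog, "catalog.type"))
}
```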
VCFsToCatalogs creates 3 SBS catalogs (96, 192, 1536), 3 DBS catalogs (78, 136, 144) and an ID (small insertion and deletion) catalog from the VCFs. It is the more general function, and its functionality overlaps with that of the three functions below. For example, it is the same as MutectVCFFilesToCatalog when variant.caller = "mutect".
StrelkaSBSVCFFilesToCatalog creates 3 SBS catalogs (96, 192, 1536) and 3 DBS catalogs (78, 136, 144) from the Strelka SBS VCFs.
StrelkaIDVCFFilesToCatalog creates an ID (small insertion and deletion) catalog from the Strelka ID VCFs.
MutectVCFFilesToCatalog creates 3 SBS catalogs (96, 192, 1536), 3 DBS catalogs (78, 136, 144) and an ID (small insertion and deletion) catalog from the Mutect VCFs.
The PlotCatalog functions plot mutational spectra for one sample or plot one mutational signature.
The PlotCatalogToPdf functions plot catalogs of mutational spectra or of mutational signatures to a PDF file.
VCFsToCatalogsAndPlotToPdf creates all types of SBS, DBS and ID catalogs from VCFs and plots the catalogs. It is the more general function, and its functionality overlaps with that of the three functions below. For example, it is the same as MutectVCFFilesToCatalogAndPlotToPdf when variant.caller = "mutect".
StrelkaSBSVCFFilesToCatalogAndPlotToPdf creates all types of SBS and DBS catalogs from Strelka SBS VCFs and plots the catalogs.
StrelkaIDVCFFilesToCatalogAndPlotToPdf creates an ID (small insertion and deletion) catalog from Strelka ID VCFs and plots it.
MutectVCFFilesToCatalogAndPlotToPdf creates all types of SBS, DBS and ID catalogs from Mutect VCFs and plots the catalogs.
VCFsToZipFile creates a zip file that contains SBS, DBS and ID catalogs and plot PDFs from VCF files. It is the more general function, and its functionality overlaps with that of the three functions below. For example, it is the same as MutectVCFFilesToZipFile when variant.caller = "mutect".
StrelkaSBSVCFFilesToZipFile creates a zip file that contains SBS and DBS catalogs and plot PDFs from Strelka SBS VCF files.
StrelkaIDVCFFilesToZipFile creates a zip file that contains an ID (small insertion and deletion) catalog and plot PDF from Strelka ID VCF files.
MutectVCFFilesToZipFile creates a zip file that contains SBS, DBS and ID catalogs and plot PDFs from Mutect VCF files.
ref.genome (reference genome) argument
Many functions take the argument ref.genome. To create a mutational spectrum catalog from a VCF file, ICAMS needs the reference genome sequence that matches the VCF file. The ref.genome argument provides this.
ref.genome must be one of:
  A variable from the Bioconductor BSgenome package that contains a particular reference genome, for example BSgenome.Hsapiens.1000genomes.hs37d5.
  The strings "hg38" or "GRCh38", which specify BSgenome.Hsapiens.UCSC.hg38.
  The strings "hg19" or "GRCh37", which specify BSgenome.Hsapiens.1000genomes.hs37d5.
  The strings "mm10" or "GRCm38", which specify BSgenome.Mmusculus.UCSC.mm10.
All needed reference genomes must be installed separately by the user. Further instructions are at https://bioconductor.org/packages/release/bioc/html/BSgenome.html.
Use of ICAMS with reference genomes other than the 2 human genomes and 1 mouse genome specified above is restricted to catalog.type of counts or counts.signature unless the user also creates the necessary abundance vectors. See all.abundance.
Use available.genomes() to get the list of available genomes.
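Because the reference genome packages are installed separately, it can help to check which ones are present before calling a function that takes ref.genome. The base R sketch below restates the string-to-package mapping listed above as a lookup table. IsRefGenomeInstalled is a hypothetical helper, not an ICAMS function.

```r
# Lookup table taken from the list above: each accepted string maps to the
# name of the BSgenome data package it specifies.
ref.genome.packages <- c(
  hg38   = "BSgenome.Hsapiens.UCSC.hg38",
  GRCh38 = "BSgenome.Hsapiens.UCSC.hg38",
  hg19   = "BSgenome.Hsapiens.1000genomes.hs37d5",
  GRCh37 = "BSgenome.Hsapiens.1000genomes.hs37d5",
  mm10   = "BSgenome.Mmusculus.UCSC.mm10",
  GRCm38 = "BSgenome.Mmusculus.UCSC.mm10"
)

# Hypothetical helper: report whether the package behind a ref.genome
# string is installed. system.file() returns "" for a missing package
# and does not load the package.
IsRefGenomeInstalled <- function(ref.genome) {
  pkg <- ref.genome.packages[[ref.genome]]
  nzchar(system.file(package = pkg))
}

print(sapply(names(ref.genome.packages), IsRefGenomeInstalled))
```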
The WriteCatalog functions write a catalog to a file.
The ReadCatalog functions read a file that contains a catalog in standardized format.
The TransformCatalog function transforms catalogs of mutational spectra or signatures to account for differing abundances of the source sequence of the mutations in the genome.
For example, mutations from ACG are much rarer in the human genome than mutations from ACC simply because CG dinucleotides are rare in the genome. Consequently, there are two possible representations of mutational spectra or signatures. One representation is based on mutation counts as observed in a given genome or exome, and this approach is widely used, as, for example, at https://cancer.sanger.ac.uk/signatures/, which presents signatures based on observed mutation counts in the human genome. We call these "counts-based spectra" or "counts-based signatures".
Alternatively, mutational spectra or signatures can be represented as mutations per source sequence, for example the number of ACT > AGT mutations occurring at all ACT 3-mers in a genome. We call these "density-based spectra" or "density-based signatures".
This function can also transform spectra based on observed genome-wide counts to "density"-based catalogs. In density-based catalogs, mutations are expressed as mutations per source sequence. For example, a density-based catalog represents the proportion of ACCs mutated to ATCs, the proportion of ACGs mutated to ATGs, etc. This is different from counts-based mutational spectra catalogs, which contain the number of ACC > ATC mutations, the number of ACG > ATG mutations, etc.
This function can also transform observed-count based spectra or signatures from genome to exome based counts, or between different species (since the abundances of source sequences vary between genome and exome and between species).
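The idea behind a genome-to-exome transformation can be sketched in base R as a rescaling by the ratio of source-sequence abundances. This is an illustration of the arithmetic only, under stated assumptions: the abundances are made up, RescaleCounts is a hypothetical helper, and TransformCatalog itself may handle details (for example rounding or re-normalizing signatures) differently.

```r
# Toy genome-based counts for one sample (row names: source trinucleotide
# followed by the mutant base).
genome.counts <- matrix(
  c(60, 20),
  nrow = 2,
  dimnames = list(c("ACCT", "ACGT"), "sample1")
)

# Made-up abundances of the source trinucleotides (NOT real values).
genome.abundance <- c(ACC = 3000, ACG = 500)
exome.abundance  <- c(ACC = 60,   ACG = 25)

# Hypothetical helper: rescale counts from one abundance to another by
# multiplying each row by target abundance / source abundance.
RescaleCounts <- function(counts, source.abundance, target.abundance) {
  src <- substr(rownames(counts), 1, 3)
  counts * (target.abundance[src] / source.abundance[src])
}

exome.counts <- RescaleCounts(genome.counts, genome.abundance, exome.abundance)
print(exome.counts)
```

The rescaled values are the counts one would expect in the exome if the per-source-sequence mutation density were the same as in the genome.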
The CollapseCatalog functions:
  Take a mutational spectrum or signature catalog that is based on a fine-grained set of features (for example, single-nucleotide substitutions in the context of the preceding and following 2 bases).
  Collapse it to a catalog based on a coarser-grained set of features (for example, single-nucleotide substitutions in the context of the immediately preceding and following bases).
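Collapsing amounts to summing the rows of the fine-grained catalog that map to the same coarse-grained mutation class. The base R sketch below shows this for a toy pentanucleotide-context matrix. The row-name convention (5-base source sequence followed by the mutant base) is an assumption made for illustration, and CollapseToTrinucleotide is a hypothetical helper, not one of the CollapseCatalog functions.

```r
# Toy fine-grained catalog. Row-name convention assumed for illustration:
# 5-base source sequence followed by the mutant base, so "AACAGT" means
# the central C of AACAG mutated to T.
fine <- matrix(
  c(3, 4, 5, 6),
  nrow = 4,
  dimnames = list(c("AACAGT", "TACAGT", "ACCAGT", "GCCATT"), "sample1")
)

# Hypothetical helper: drop the outermost flanking bases and sum the rows
# that share the same trinucleotide context and mutant base.
CollapseToTrinucleotide <- function(catalog) {
  rn <- rownames(catalog)
  coarse.name <- paste0(substr(rn, 2, 4), substr(rn, 6, 6))
  rowsum(catalog, group = coarse.name)
}

coarse <- CollapseToTrinucleotide(fine)
print(coarse)
```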
CatalogRowOrder: Standard order of rownames in a catalog. The rownames encode the type of each mutation. For example, for SBS96 catalogs, the rowname AGAT represents a mutation from AGA > ATA.
TranscriptRanges: Transcript ranges and strand information for a particular reference genome.
GeneExpressionData: Example gene expression data from two cell lines.
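The SBS96 rowname encoding described under CatalogRowOrder can be decoded with a few lines of base R. DecodeSBS96Rowname is a hypothetical helper written for this illustration, not part of ICAMS.

```r
# Hypothetical helper: decode an SBS96-style row name (source trinucleotide
# followed by the mutant base) into a "REF > ALT" description.
DecodeSBS96Rowname <- function(rowname) {
  ref <- substr(rowname, 1, 3)
  alt <- paste0(substr(rowname, 1, 1),  # preceding base is unchanged
                substr(rowname, 4, 4),  # mutant base replaces the center
                substr(rowname, 3, 3))  # following base is unchanged
  paste(ref, ">", alt)
}

decoded <- DecodeSBS96Rowname(c("AGAT", "ACGT"))
print(decoded)
```

For the rowname AGAT this yields "AGA > ATA", matching the example given above.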