R/ICAMS.R

#' ICAMS: In-depth Characterization and Analysis of Mutational Signatures
#'
#' Analysis and visualization of experimentally elucidated mutational signatures
#' -- the kind of analysis and visualization in Boot et al., "In-depth
#' characterization of the cisplatin mutational signature in human cell lines
#' and in esophageal and liver tumors", \cr
#' \emph{Genome Research} 2018 https://doi.org/10.1101/gr.230219.117 and
#' "Characterization of colibactin-associated mutational signature in an
#' Asian oral squamous cell carcinoma and in other mucosal tumor types",
#' \emph{Genome Research} 2020, https://doi.org/10.1101/gr.255620.119.
#' "ICAMS" stands for In-depth Characterization and
#' Analysis of Mutational Signatures. "ICAMS" has functions to read in variant
#' call format files (VCFs), to collate the corresponding catalogs of
#' mutational spectra, and to analyze and plot catalogs of mutational spectra
#' and signatures. It handles both "counts-based" and "density-based" catalogs
#' of mutational spectra or signatures.
#'
#' "ICAMS" can read in VCFs generated by Strelka or Mutect, and collate the
#' mutations into "catalogs" of mutational spectra. "ICAMS" can create and plot
#' catalogs of mutational spectra or signatures for single base substitutions
#' (SBS), double base substitutions (DBS), and small insertions and deletions
#' (ID). It can also read and write these catalogs.
#'
#' @section Catalogs:
#' A key data type in "ICAMS" is a "catalog" of mutation counts, of
#' mutation densities, or of mutational signatures.
#'
#' Catalogs are S3 objects of class \code{matrix} and one of
#' several additional classes that specify the types of the mutations
#' represented in the catalog. The
#' additional class is one of
#' \itemize{
#' \item \code{SBS96Catalog} (strand-agnostic single base substitutions in
#' trinucleotide context)
#' \item \code{SBS192Catalog} (transcription-stranded single-base substitutions
#'  in trinucleotide context)
#' \item \code{SBS1536Catalog}
#' \item \code{DBS78Catalog}
#' \item \code{DBS144Catalog}
#' \item \code{DBS136Catalog}
#' \item \code{IndelCatalog}
#' }
#' \code{\link{as.catalog}}
#' is the main constructor.
#'
#' Conceptually, a catalog also has one of the following types,
#' indicated by the attribute \code{catalog.type}:
#'
#' \enumerate{
#'
#' \item Matrix of mutation counts (one column per sample), representing
#' (counts-based) mutational spectra (\code{catalog.type = "counts"}).
#'
#' \item Matrix of mutation densities, i.e. mutations per occurrence
#'  of the source sequence (one column per sample), representing
#'  (density-based) mutational spectra (\code{catalog.type = "density"}).
#'
#' \item Matrix of mutational signatures, which
#' are similar to spectra. However, where spectra consist of
#' counts or densities of mutations in each mutation class
#' (e.g. ACA > AAA, ACA > AGA, ACA > ATA, ACC > AAC, ...),
#' signatures consist of
#' the proportions of mutations in each class (with all the
#' proportions summing to 1). A mutational signature can be based
#' on either:
#' \itemize{
#'   \item mutation counts (a "counts-based mutational signature",
#'   \code{catalog.type = "counts.signature"}), or
#'   \item mutation densities (a "density-based mutational signature",
#'   \code{catalog.type = "density.signature"}).
#' }
#' }
#'
#' Catalogs also have the attribute \code{abundance}, which contains the
#' counts of different source sequences for mutations. For example,
#' for SBSs in trinucleotide context, the abundances would be the counts
#' of each trinucleotide in the human genome, in the exome, or in the transcribed
#' region of the genome. See \code{\link{TransformCatalog}}
#' for more information. Abundances logically depend on the species in
#' question and on the part of the genome being analyzed.
#'
#' In "ICAMS",
#' abundances can sometimes be inferred from the
#' \code{catalog} class attribute and the
#' function arguments \code{region}, \code{ref.genome},
#' and \code{catalog.type}.
#' Otherwise, abundances can be provided as an \code{abundance} argument.
#' See \code{\link{all.abundance}} for examples.
#'
#'
#' Possible values for
#' \code{region} are the strings \code{genome}, \code{transcript},
#' \code{exome}, and \code{unknown}; \code{transcript} includes entire
#' transcribed regions, i.e. the introns as well as the exons.
#'
#' If you need to create a catalog from a source other than
#' this package (i.e. other than with
#' \code{\link{ReadCatalog}},
#' \code{\link{StrelkaSBSVCFFilesToCatalog}},
#' \code{\link{MutectVCFFilesToCatalog}}, etc.), then use
#' \code{\link{as.catalog}}.
#'
#' @section Creating catalogs from variant call files (VCF files):
#' \enumerate{
#' \item \code{\link{VCFsToCatalogs}} creates 3 SBS catalogs (96, 192, 1536), 3
#' DBS catalogs (78, 136, 144) and an ID (small insertion and deletion) catalog
#' from the VCFs. It is more general than the three functions below and
#' overlaps with them in functionality. For example, it is the same as
#' \code{\link{MutectVCFFilesToCatalog}} when \code{variant.caller = "mutect"}.
#'
#' \item \code{\link{StrelkaSBSVCFFilesToCatalog}} creates 3 SBS catalogs (96,
#' 192, 1536) and 3 DBS catalogs (78, 136, 144) from the Strelka SBS VCFs.
#'
#' \item \code{\link{StrelkaIDVCFFilesToCatalog}} creates an ID
#' (small insertion and deletion) catalog
#' from the Strelka ID VCFs.
#'
#' \item \code{\link{MutectVCFFilesToCatalog}} creates 3 SBS catalogs (96, 192,
#' 1536), 3 DBS catalogs (78, 136, 144) and an ID (small insertion and deletion)
#' catalog from the Mutect VCFs.
#' }
#'
#' @section Plotting catalogs:
#' The \code{\link{PlotCatalog}} functions plot mutational spectra
#' for \strong{one} sample or plot \strong{one} mutational signature.
#'
#' The \code{\link{PlotCatalogToPdf}}
#' functions plot catalogs of mutational spectra or
#' of mutational signatures to a PDF file.
#'
#' @section Wrapper functions to create catalogs from VCFs and plot the catalogs to PDF files:
#' \enumerate{
#' \item \code{\link{VCFsToCatalogsAndPlotToPdf}} creates all types of SBS, DBS
#' and ID catalogs from VCFs and plots the catalogs. It is more general than
#' the three functions below and overlaps with them in functionality. For
#' example, it is the same as \code{\link{MutectVCFFilesToCatalogAndPlotToPdf}}
#' when \code{variant.caller = "mutect"}.
#'
#' \item \code{\link{StrelkaSBSVCFFilesToCatalogAndPlotToPdf}} creates all
#' types of SBS and DBS catalogs from Strelka SBS VCFs and plots the catalogs.
#'
#' \item \code{\link{StrelkaIDVCFFilesToCatalogAndPlotToPdf}} creates an ID
#' (small insertion and deletion) catalog from Strelka ID VCFs and plots it.
#'
#' \item \code{\link{MutectVCFFilesToCatalogAndPlotToPdf}} creates all types of
#' SBS, DBS and ID catalogs from Mutect VCFs and plots the catalogs. }
#'
#' @section Wrapper functions to create a zip file which contains catalogs and plot PDFs from VCF files:
#' \enumerate{
#' \item \code{\link{VCFsToZipFile}} creates a zip file which contains SBS, DBS
#' and ID catalogs and plot PDFs from VCF files. It is more general than the
#' three functions below and overlaps with them in functionality. For example,
#' it is the same as \code{\link{MutectVCFFilesToZipFile}} when
#' \code{variant.caller = "mutect"}.
#'
#' \item \code{\link{StrelkaSBSVCFFilesToZipFile}} creates a zip file which
#' contains SBS and DBS catalogs and plot PDFs from Strelka SBS VCF files.
#'
#' \item \code{\link{StrelkaIDVCFFilesToZipFile}} creates a zip file which
#' contains an ID (small insertion and deletion) catalog and its plot PDF from
#' Strelka ID VCF files.
#'
#' \item \code{\link{MutectVCFFilesToZipFile}} creates a zip file which contains
#' SBS, DBS and ID catalogs and plot PDFs from Mutect VCF files. }
#'
#' @section The \code{ref.genome} (reference genome) argument:
#'
#' Many functions take the argument \code{ref.genome}.
#'
#' To create a mutational
#' spectrum catalog from a VCF file, ICAMS needs the reference genome sequence
#' that matches the VCF file. The \code{ref.genome} argument
#' provides this.
#'
#' \code{ref.genome} must be one of
#' \enumerate{
#'   \item A variable from the Bioconductor \code{\link{BSgenome}} package
#'   that contains a particular reference genome, for example
#'   \code{BSgenome.Hsapiens.1000genomes.hs37d5}.
#'
#'  \item The strings \code{"hg38"} or \code{"GRCh38"}, which specify
#'  \code{BSgenome.Hsapiens.UCSC.hg38}.
#'  \item The strings \code{"hg19"} or \code{"GRCh37"},
#'  which specify
#'  \code{BSgenome.Hsapiens.1000genomes.hs37d5}.
#'  \item The strings \code{"mm10"} or \code{"GRCm38"},
#'  which specify
#'  \code{BSgenome.Mmusculus.UCSC.mm10}.
#'  }
#'
#' All needed reference genomes must be installed separately by the user.
#' Further instructions are at \cr
#' https://bioconductor.org/packages/release/bioc/html/BSgenome.html. \cr
#'
#' Use of ICAMS with reference genomes other than the 2 human genomes
#' and 1 mouse genome specified above is restricted to
#' \code{catalog.type} of \code{counts} or \code{counts.signature}
#' unless the user also creates the necessary abundance vectors.
#' See \code{\link{all.abundance}}.
#'
#' Use \code{\link[BSgenome]{available.genomes}()}
#'  to get the list of available genomes.
#'
#' @section Writing catalogs to files:
#' The \code{\link{WriteCatalog}} functions
#' write a catalog to a file.
#'
#' @section Reading catalogs:
#' The \code{\link{ReadCatalog}} functions
#' read a file that contains a catalog in standardized format.
#'
#' @section Transforming catalogs:
#' The \code{\link{TransformCatalog}}
#' function transforms catalogs of mutational spectra or
#' signatures to account for differing abundances of the source
#' sequence of the mutations in the genome.
#'
#' For example, mutations from
#' ACG are much rarer in the human genome than mutations from ACC
#' simply because CG dinucleotides are rare in the genome.
#' Consequently, there are two possible representations of
#' mutational spectra or signatures. One representation is
#' based on mutation counts as observed in a given genome
#' or exome,
#' and this approach is widely used, as, for example, at
#' https://cancer.sanger.ac.uk/signatures/, which
#' presents signatures based on observed mutation counts
#' in the human genome. We call these "counts-based spectra"
#' or "counts-based signatures".
#'
#' Alternatively,
#' mutational spectra or signatures can be represented as
#' mutations per source sequence, for example
#' the number of ACT > AGT mutations per
#' ACT 3-mer in a genome. We call these "density-based
#' spectra" or "density-based signatures".
#'
#' This function can also transform spectra
#' based on observed genome-wide counts to density-based
#' catalogs. In density-based catalogs,
#' mutations are expressed as mutations per
#' occurrence of the source sequence. For example,
#' a density-based catalog represents
#' the proportion of ACCs mutated to
#' ATCs, the proportion of ACGs mutated to ATGs, etc.
#' This is
#' different from counts-based mutational spectra catalogs, which
#' contain the number of ACC > ATC mutations, the number of
#' ACG > ATG mutations, etc.
#'
#' This function can also transform counts-based
#' spectra or signatures from genome-based to exome-based counts,
#' or between different species (since the abundances of
#' source sequences vary between genome and exome and between
#' species).
#'
#' @section Collapsing catalogs:
#' The \code{\link{CollapseCatalog}} functions
#' \enumerate{
#' \item Take a mutational spectrum or signature catalog
#' that is based on a fine-grained set of features (for example, single-nucleotide
#' substitutions in the context of the preceding and following 2 bases).
#'
#' \item Collapse it to a catalog based on a coarser-grained set of features
#' (for example, single-nucleotide substitutions in the context of the
#' immediately preceding and following bases).
#' }
#'
#' @section Data:
#'  \enumerate{
#'
#' \item \code{\link{CatalogRowOrder}} Standard order of rownames in a catalog.
#' The rownames encode the type of each mutation. For example, for SBS96
#' catalogs, the rowname ACAT represents a mutation from ACA > ATA.
#'
#' \item \code{\link{TranscriptRanges}} Transcript ranges and strand information
#' for a particular reference genome.
#'
#' \item \code{\link{GeneExpressionData}} Example gene expression data from two
#' cell lines.
#'
#'  }
#' @docType package
#' @name ICAMS
NULL
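
The four catalog types described in the "Catalogs" and "Transforming catalogs" sections above can be illustrated with a minimal base R sketch. It does not call any ICAMS function. The counts, the abundances, and all variable names are invented for illustration, and the only assumption about row names is that each one is the source trinucleotide followed by the mutant base.

```r
# Toy counts-based spectra: 4 mutation classes (rows) x 2 samples (columns).
# Row names: source trinucleotide + mutant base (e.g. "ACGA" is ACG > AAG).
# All numbers are invented; they are not real mutation counts.
counts <- matrix(
  c(20, 40,
    10, 20,
     2,  4,
    15, 30),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ACAA", "ACCA", "ACGA", "ACTA"), c("s1", "s2"))
)

# Hypothetical abundances: how often each source trinucleotide occurs in the
# region analyzed (invented values, with ACG deliberately rare).
abundance <- c(ACA = 1000, ACC = 500, ACG = 100, ACT = 750)

# Density-based spectra: mutations per occurrence of the source sequence.
# The abundance vector is recycled down each column, so row i is divided
# by the abundance of its own source trinucleotide.
source.seq <- substr(rownames(counts), 1, 3)
density <- counts / abundance[source.seq]

# Signatures: each column rescaled so that its proportions sum to 1.
counts.signature  <- sweep(counts,  2, colSums(counts),  "/")
density.signature <- sweep(density, 2, colSums(density), "/")
```

With these toy numbers every class has the same density (0.02 in `s1`), so `density.signature` is flat at 0.25 per class, while `counts.signature` is skewed toward ACA simply because ACA is the most abundant source sequence.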

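The idea behind the "Collapsing catalogs" section above can be sketched in base R as summing the rows of a fine-grained catalog that map to the same coarse-grained class. This is not the `CollapseCatalog` implementation. The row-name encoding used here (the 5-base source sequence followed by the mutant base) is an assumption made for the sketch, and the counts are invented.

```r
# Toy fine-grained catalog with one sample. Each (hypothetical) row name is
# the 5-base source sequence followed by the mutant base, e.g. "AACAAA"
# stands for AACAA > AAAAA. Counts are invented.
fine <- matrix(
  c(1, 2, 3, 4, 5, 6),
  ncol = 1,
  dimnames = list(
    c("AACAAA", "CACAGA", "TACATA",  # central ACA, mutant base A
      "AACAAG", "GACACG",            # central ACA, mutant base G
      "AACCAA"),                     # central ACC, mutant base A
    "s1"
  )
)

# Coarse-grained class: central trinucleotide (characters 2 to 4) plus the
# mutant base (character 6).
rn <- rownames(fine)
coarse.class <- paste0(substr(rn, 2, 4), substr(rn, 6, 6))

# Sum the counts of all fine-grained rows that share a coarse-grained class.
coarse <- rowsum(fine, group = coarse.class)
```

Collapsing in this way preserves the total number of mutations per sample, so `sum(coarse)` equals `sum(fine)`.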

ICAMS documentation built on April 3, 2021, 5:07 p.m.