StrelkaIDVCFFilesToZipFile: [Deprecated, use VCFsToZipFile(variant.caller = "strelka")...
In ICAMS: In-Depth Characterization and Analysis of Mutational Signatures ('ICAMS')

View source: R/shiny_related_functions.R

StrelkaIDVCFFilesToZipFile

R Documentation

[Deprecated, use VCFsToZipFile(variant.caller = "strelka") instead] Create a zip file that contains the ID (small insertions and deletions) catalog and its PDF plot from Strelka ID VCF files

Description

[Deprecated, use VCFsToZipFile(variant.caller = "strelka") instead] Create an ID (small insertions and deletions) catalog from the Strelka ID VCFs in the directory specified by dir, save the catalog as a CSV file, plot it to a PDF, and generate a zip archive of all the output files.

Usage

StrelkaIDVCFFilesToZipFile(
  dir,
  zipfile,
  ref.genome,
  region = "unknown",
  names.of.VCFs = NULL,
  base.filename = "",
  flag.mismatches = 0,
  return.annotated.vcfs = FALSE,
  suppress.discarded.variants.warnings = TRUE
)

Arguments

`dir`	Pathname of the directory which contains only the Strelka ID VCF files. Each Strelka ID VCF must have a file extension ".vcf" (case insensitive) and share the same `ref.genome` and `region`.
`zipfile`	Pathname of the zip file to be created.
`ref.genome`	A `ref.genome` argument as described in `ICAMS`.
`region`	A character string designating a genomic region; see `as.catalog` and `ICAMS`.
`names.of.VCFs`	Optional. Character vector of names of the VCF files. The order of names in `names.of.VCFs` should match the order of VCFs listed in `dir`. If `NULL` (default), the names are derived from the VCF file paths: everything up to and including the last path separator (if any) is removed, and the remaining file names without their extensions (and the leading dot) are used as the names of the VCF files.
`base.filename`	Optional. The base name of the CSV and PDF files to be produced; the file names end in `catID.csv` and `catID.pdf`, respectively.
`flag.mismatches`	Deprecated. If there are ID variants whose `REF` does not match the sequence extracted from `ref.genome`, the function automatically discards these variants, and an element `discarded.variants` appears in the return value. See `AnnotateIDVCF` for more details.
`return.annotated.vcfs`	Logical. Whether to return the annotated VCFs with additional columns showing mutation class for each variant. Default is FALSE.
`suppress.discarded.variants.warnings`	Logical. Whether to suppress warning messages showing information about the discarded variants. Default is TRUE.
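
As a rough illustration of the default naming rule described for `names.of.VCFs`, the base R sketch below lists files with a case-insensitive ".vcf" extension and strips the directory and the extension. It approximates the documented behavior and is not the package's own code; the sample file names are hypothetical.

```r
# Create a scratch directory with hypothetical file names (illustration only).
demo.dir <- file.path(tempdir(), "strelka-id-demo")
dir.create(demo.dir, showWarnings = FALSE)
file.create(file.path(demo.dir, c("sampleA.vcf", "sampleB.VCF", "notes.txt")))

# Pick up files ending in ".vcf", ignoring case; "notes.txt" is skipped.
vcf.files <- list.files(demo.dir, pattern = "\\.vcf$",
                        ignore.case = TRUE, full.names = TRUE)

# Drop the directory part and the extension to get default-style names.
default.names <- tools::file_path_sans_ext(basename(vcf.files))
print(default.names)

# Clean up the scratch directory.
unlink(demo.dir, recursive = TRUE)
```

Supplying `names.of.VCFs` explicitly is mainly useful when the file names are not suitable as sample names.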

Details

This function calls StrelkaIDVCFFilesToCatalog, PlotCatalogToPdf, WriteCatalog and zip::zipr.
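
The sequence of calls named above could be sketched roughly as follows. This is a simplified approximation, not the package's implementation: `SketchIDVCFsToZip` is a hypothetical helper, the use of a scratch directory and the "." joiner after `base.filename` are assumptions, and the helper requires the ICAMS and zip packages when it is actually called.

```r
# Hypothetical helper, not part of ICAMS; it only mirrors the steps named above.
SketchIDVCFsToZip <- function(dir, zipfile, ref.genome, region = "unknown",
                              base.filename = "") {
  stopifnot(requireNamespace("ICAMS", quietly = TRUE),
            requireNamespace("zip", quietly = TRUE))

  # 1. Collect the VCF files (".vcf", case insensitive) in dir.
  files <- list.files(dir, pattern = "\\.vcf$", ignore.case = TRUE,
                      full.names = TRUE)

  # 2. Build the ID catalog from the VCFs.
  result <- ICAMS::StrelkaIDVCFFilesToCatalog(files, ref.genome = ref.genome,
                                              region = region)

  # 3. Write the catalog as CSV and plot it to PDF in a scratch directory.
  #    The "." joiner after base.filename is an assumption.
  out.dir <- tempfile("id-catalog-")
  dir.create(out.dir)
  prefix <- if (nzchar(base.filename)) paste0(base.filename, ".") else ""
  csv.file <- file.path(out.dir, paste0(prefix, "catID.csv"))
  pdf.file <- file.path(out.dir, paste0(prefix, "catID.pdf"))
  ICAMS::WriteCatalog(result$catalog, file = csv.file)
  ICAMS::PlotCatalogToPdf(result$catalog, file = pdf.file)

  # 4. Zip the two output files as top-level entries of the archive.
  zip::zipr(zipfile = zipfile, files = c(csv.file, pdf.file))
  unlink(out.dir, recursive = TRUE)
  invisible(result)
}
```

Defining the helper has no side effects; the package checks and file operations run only when it is called.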

Value

A list of elements:

catalog: The ID (small insertions and deletions) catalog with attributes added. See as.catalog for more details.
discarded.variants: Non-NULL only if there are variants that were excluded from the analysis. See the extra column discarded.reason for more details.
annotated.vcfs: Non-NULL only if return.annotated.vcfs = TRUE. A list of data frames, each containing the ID mutation rows of the original VCF with three additional columns: seq.context.width, seq.context and ID.class. The category assignment of each ID mutation in the VCF can be obtained from the ID.class column.

ID classification

See https://github.com/steverozen/ICAMS/blob/v3.0.9-branch/data-raw/PCAWG7_indel_classification_2021_09_03.xlsx for additional information on ID (small insertions and deletions) mutation classification.

See the documentation for Canonicalize1Del, which first handles deletions in homopolymers, then handles deletions in simple repeats with longer repeat units (e.g. CACACACA, see FindMaxRepeatDel), and, if the deletion is not in a simple repeat, looks for microhomology (see FindDelMH).

See the code for unexported function CanonicalizeID and the functions it calls for handling of insertions.

Note

In ID (small insertions and deletions) catalogs, deletion repeat sizes range from 0 to 5+, but for plotting and end-user documentation deletion repeat sizes range from 1 to 6+.
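
One way to read this note is as an offset of one between the two conventions. The sketch below converts catalog-style deletion repeat sizes (0 to 5, where 5 stands for 5+) into plot-style labels (1 to 6+). `CatalogToPlotDelRepeatSize` is a hypothetical helper written for illustration and is not part of ICAMS.

```r
# Hypothetical helper, not part of ICAMS.
# Input: catalog-style repeat sizes 0..5, where 5 means "5+".
# Output: plot-style labels "1".."5" and "6+".
CatalogToPlotDelRepeatSize <- function(catalog.size) {
  stopifnot(all(catalog.size %in% 0:5))
  ifelse(catalog.size == 5, "6+", as.character(catalog.size + 1))
}

plot.labels <- CatalogToPlotDelRepeatSize(0:5)
print(plot.labels)
```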

Examples

## Not run: 
dir <- system.file("extdata/Strelka-ID-vcf", package = "ICAMS")
# Run only if the hg19 reference genome package is installed.
if (requireNamespace("BSgenome.Hsapiens.1000genomes.hs37d5", quietly = TRUE)) {
  catalogs <-
    StrelkaIDVCFFilesToZipFile(dir,
                               zipfile = file.path(tempdir(), "test.zip"),
                               ref.genome = "hg19",
                               region = "genome",
                               base.filename = "Strelka-ID")
  # Remove the zip archive created by the example.
  unlink(file.path(tempdir(), "test.zip"))
}

## End(Not run)
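
Because this function is deprecated, the same example might be migrated to VCFsToZipFile along the following lines. This is a sketch rather than a verified equivalent: it assumes that VCFsToZipFile accepts the same `dir`, `zipfile`, `ref.genome`, `region` and `base.filename` arguments, and the output may differ (for instance, additional catalog types may be produced). The code does nothing unless ICAMS and the hg19 genome package are installed.

```r
# Sketch only: runs the replacement call when ICAMS and the hg19 genome
# package are both installed; otherwise it does nothing.
zipfile <- file.path(tempdir(), "test.zip")
have.pkgs <- requireNamespace("ICAMS", quietly = TRUE) &&
  requireNamespace("BSgenome.Hsapiens.1000genomes.hs37d5", quietly = TRUE)
if (have.pkgs) {
  dir <- system.file("extdata/Strelka-ID-vcf", package = "ICAMS")
  catalogs <- ICAMS::VCFsToZipFile(dir,
                                   zipfile = zipfile,
                                   ref.genome = "hg19",
                                   variant.caller = "strelka",
                                   region = "genome",
                                   base.filename = "Strelka-ID")
  # Remove the zip archive created by the sketch.
  unlink(zipfile)
}
```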

ICAMS documentation built on June 15, 2025, 1:08 a.m.