In ICAMS: In-Depth Characterization and Analysis of Mutational Signatures ('ICAMS')

SplitListOfVCFs

R Documentation

Split each VCF into SBS, DBS, and ID VCFs (plus VCF-like data frame with left-over rows)

Description

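Splits each VCF in a list into separate VCFs for single base substitutions (SBS), doublet base substitutions (DBS), and small insertions and deletions (ID), plus a VCF-like data frame holding the left-over rows.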

Usage

SplitListOfVCFs(
  list.of.vcfs,
  variant.caller,
  max.vaf.diff = 0.02,
  num.of.cores = 1,
  suppress.discarded.variants.warnings = TRUE,
  always.merge.SBS = FALSE,
  chr.names.to.process = NULL
)

Arguments

`list.of.vcfs`	List of VCFs as in-memory data frames. The VCFs should have `VAF` and `read.depth` information added. See `ReadVCFs` for more details.
`variant.caller`	Name of the variant caller that produced the VCFs; one of `"strelka"`, `"mutect"`, `"freebayes"`, or `"unknown"`. If the variant caller is `"mutect"`, adjacent SBSs are not merged into DBSs.
`max.vaf.diff`	The maximum allowed difference in VAF; the default is 0.02. If the absolute difference between the VAFs of adjacent SBSs is larger than `max.vaf.diff`, these adjacent SBSs are likely to be "merely" asynchronous single base mutations, as opposed to a simultaneous doublet mutation or a variant involving more than two consecutive bases. Use a negative value (e.g. -1) to suppress merging adjacent SBSs into DBSs.
`num.of.cores`	The number of cores to use. On Windows, only `num.of.cores = 1` is supported.
`suppress.discarded.variants.warnings`	Logical. Whether to suppress warning messages showing information about the discarded variants. Default is `TRUE`.
`always.merge.SBS`	If `TRUE`, merge adjacent SBSs into DBSs regardless of their VAFs and regardless of the value of `max.vaf.diff`. It is an error to set this to `TRUE` when `variant.caller = "mutect"`.
`chr.names.to.process`	A character vector specifying the chromosome names in the VCFs whose variants will be kept and processed; variants on other chromosomes will be discarded. If `NULL` (default), all variants are kept except those on chromosomes whose names contain any of the strings "GL", "KI", "random", "Hs", "M", "JH", "fix", or "alt".
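
Two of the rules described in the arguments above can be restated as a small base R sketch. `ShouldMergeAdjacentSBS` and `IsDroppedByDefault` are hypothetical helpers written for this illustration, not ICAMS functions. The sketch assumes that merging is considered when the VAF difference is at most `max.vaf.diff`, and that the default chromosome filter is a case-sensitive substring match. It does not model the `variant.caller = "mutect"` special cases.

```r
# Hypothetical helpers for illustration only; they are not part of ICAMS.

# Merge rule implied by `max.vaf.diff`: two adjacent SBSs are candidates for
# merging into a DBS only if their VAFs differ by at most `max.vaf.diff`.
# A negative `max.vaf.diff` can never be satisfied, so nothing is merged.
ShouldMergeAdjacentSBS <- function(vaf1, vaf2, max.vaf.diff = 0.02,
                                   always.merge.SBS = FALSE) {
  if (always.merge.SBS) {
    return(TRUE)
  }
  abs(vaf1 - vaf2) <= max.vaf.diff
}

# Default chromosome filter when `chr.names.to.process` is NULL.
# Assumption: a case-sensitive substring match on the chromosome name.
IsDroppedByDefault <- function(chr.names) {
  excluded <- c("GL", "KI", "random", "Hs", "M", "JH", "fix", "alt")
  grepl(paste(excluded, collapse = "|"), chr.names)
}

ShouldMergeAdjacentSBS(0.30, 0.31)                     # VAFs close together
ShouldMergeAdjacentSBS(0.30, 0.40)                     # VAFs far apart
ShouldMergeAdjacentSBS(0.30, 0.30, max.vaf.diff = -1)  # merging suppressed
IsDroppedByDefault(c("1", "X", "MT", "GL000207.1"))
```

Under these assumptions, a name such as "MT" is dropped by default because it contains "M", so passing an explicit `chr.names.to.process` is the way to keep such chromosomes.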

Value

A list containing the following objects:

SBS: List of VCFs with only single base substitutions.
DBS: List of VCFs with only doublet base substitutions as called by Mutect.
ID: List of VCFs with only small insertions and deletions.
discarded.variants: Non-NULL only if some variants were excluded from the analysis. The added column `discarded.reason` gives the reason each variant was excluded.
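
The shape of the returned list can be explored with a short base R sketch. `CountSplitRows` is a hypothetical helper, and `mock.split` is made-up data that only mimics the documented element names; real VCF data frames carry many more columns.

```r
# Hypothetical helper, not part of ICAMS: total number of rows per mutation
# class, summed over all samples in the returned list.
CountSplitRows <- function(split.vcfs) {
  sapply(split.vcfs[c("SBS", "DBS", "ID")], function(vcfs) {
    sum(vapply(vcfs, nrow, integer(1)))
  })
}

# Made-up stand-in for the return value of SplitListOfVCFs (one sample, "s1").
mock.split <- list(
  SBS = list(s1 = data.frame(POS = 1:3)),
  DBS = list(s1 = data.frame(POS = integer(0))),
  ID  = list(s1 = data.frame(POS = 1:2)),
  discarded.variants = NULL
)

counts <- CountSplitRows(mock.split)
print(counts)

# discarded.variants is NULL when no variants were excluded.
is.null(mock.split$discarded.variants)
```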

Examples

file <- c(system.file("extdata/Mutect-vcf",
                      "Mutect.GRCh37.s1.vcf",
                      package = "ICAMS"))
list.of.vcfs <- ReadVCFs(file, variant.caller = "mutect")
split.vcfs <- SplitListOfVCFs(list.of.vcfs, variant.caller = "mutect")
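
A slightly extended variant of the example above, as a sketch: it turns the discarded-variant warnings back on and then inspects the result. It uses only the arguments documented on this page and the example file shipped with ICAMS; the `requireNamespace` guard is added here so the snippet is skipped when ICAMS is not installed.

```r
split.vcfs <- NULL

if (requireNamespace("ICAMS", quietly = TRUE)) {
  file <- system.file("extdata/Mutect-vcf",
                      "Mutect.GRCh37.s1.vcf",
                      package = "ICAMS")
  list.of.vcfs <- ICAMS::ReadVCFs(file, variant.caller = "mutect")

  # Show warnings about discarded variants instead of suppressing them.
  split.vcfs <- ICAMS::SplitListOfVCFs(
    list.of.vcfs,
    variant.caller = "mutect",
    suppress.discarded.variants.warnings = FALSE
  )

  # Names of the returned components (SBS, DBS, ID, and possibly
  # discarded.variants).
  print(names(split.vcfs))

  # discarded.variants is present only if some variants were excluded.
  if (!is.null(split.vcfs$discarded.variants)) {
    str(split.vcfs$discarded.variants, max.level = 1)
  }
}
```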

ICAMS documentation built on June 15, 2025, 1:08 a.m.