SplitListOfVCFs    R Documentation

Split each VCF into SBS, DBS, and ID VCFs (plus VCF-like data frame with left-over rows)

Description

Split each VCF into SBS, DBS, and ID VCFs (plus VCF-like data frame with left-over rows)

Usage

SplitListOfVCFs(
  list.of.vcfs,
  variant.caller,
  max.vaf.diff = 0.02,
  num.of.cores = 1,
  suppress.discarded.variants.warnings = TRUE,
  always.merge.SBS = FALSE,
  chr.names.to.process = NULL
)

Arguments

list.of.vcfs

List of VCFs as in-memory data frames. The VCFs should have VAF and read.depth information added. See ReadVCFs for more details.

variant.caller

Name of the variant caller that produced the VCF; one of "strelka", "mutect", "freebayes" or "unknown". If variant.caller is "mutect", adjacent SBSs are not merged into DBSs.

max.vaf.diff

The maximum allowed difference in VAF between adjacent SBSs; the default is 0.02. If the absolute difference of the VAFs of adjacent SBSs is bigger than max.vaf.diff, these adjacent SBSs are likely to be "merely" asynchronous single base mutations, as opposed to a simultaneous doublet mutation or a variant involving more than two consecutive bases. Use a negative value (e.g. -1) to suppress merging adjacent SBSs into DBSs.
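
To make the criterion concrete, the following base-R sketch applies this kind of test to two hypothetical adjacent SBSs. It is an illustration only, not the ICAMS implementation; the helper name vaf.diff.allows.merge and the VAF values are made up for this example.

```r
# Hypothetical helper (not part of ICAMS): TRUE when the VAFs of two
# adjacent SBSs differ by no more than max.vaf.diff, i.e. the pair
# would remain a candidate for merging into a DBS.
vaf.diff.allows.merge <- function(vaf1, vaf2, max.vaf.diff = 0.02) {
  abs(vaf1 - vaf2) <= max.vaf.diff
}

# VAFs differ by 0.01: within the default threshold.
vaf.diff.allows.merge(0.31, 0.32)

# VAFs differ by 0.14: bigger than the threshold, so the two
# variants would be kept as separate SBSs.
vaf.diff.allows.merge(0.31, 0.45)

# A negative threshold can never be satisfied, because an absolute
# difference is never below zero; this is why -1 suppresses merging.
vaf.diff.allows.merge(0.31, 0.31, max.vaf.diff = -1)
```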

num.of.cores

The number of cores to use. Not available on Windows unless num.of.cores = 1.
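
A script that has to run on several platforms can pick the value at run time. The sketch below uses base R only; the choice of 2 cores on non-Windows systems is an arbitrary assumption for illustration.

```r
# Fall back to a single core on Windows, where more than one core is
# not available for this function; otherwise request 2 cores
# (an arbitrary value chosen for this example).
num.of.cores <- if (.Platform$OS.type == "windows") 1 else 2
```

The resulting value can then be passed as the num.of.cores argument.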

suppress.discarded.variants.warnings

Logical. Whether to suppress warning messages showing information about the discarded variants. Default is TRUE.

always.merge.SBS

If TRUE, merge adjacent SBSs into DBSs regardless of their VAFs and regardless of the value of max.vaf.diff. It is an error to set this to TRUE when variant.caller = "mutect".

chr.names.to.process

A character vector specifying the names of the chromosomes in the VCF whose variants will be kept and processed; variants on all other chromosomes will be discarded. If NULL (default), all variants will be kept except those on chromosomes with names that contain the strings "GL", "KI", "random", "Hs", "M", "JH", "fix" or "alt".
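
The effect of the default can be approximated in base R as follows. This is a sketch, not the ICAMS code: the chromosome names are hypothetical, and plain case-sensitive substring matching is an assumption.

```r
# Substrings listed in the description of the NULL default.
excluded.strings <- c("GL", "KI", "random", "Hs", "M", "JH", "fix", "alt")

# Hypothetical chromosome names, made up for illustration.
chr.names <- c("1", "chrX", "GL000207.1", "MT",
               "chr1_KI270706v1_random", "Hs37D5")

# Assumption: a name is excluded if it contains any of the strings,
# matched case-sensitively.
pattern <- paste(excluded.strings, collapse = "|")
kept <- chr.names[!grepl(pattern, chr.names)]
kept
```

Under these assumptions only "1" and "chrX" remain; every other name contains at least one of the listed strings.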

Value

A list containing the following objects:

  • SBS: List of VCFs with only single base substitutions.

  • DBS: List of VCFs with only doublet base substitutions as called by Mutect.

  • ID: List of VCFs with only small insertions and deletions.

  • discarded.variants: Non-NULL only if there are variants that were excluded from the analysis. See the added extra column discarded.reason for more details.

Examples

file <- c(system.file("extdata/Mutect-vcf",
                      "Mutect.GRCh37.s1.vcf",
                      package = "ICAMS"))
list.of.vcfs <- ReadVCFs(file, variant.caller = "mutect")
split.vcfs <- SplitListOfVCFs(list.of.vcfs, variant.caller = "mutect")
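
The returned list can then be inspected by element name. The following sketch repeats the call above and guards it with requireNamespace so that it does nothing when ICAMS is not installed; it assumes that each element of the SBS list is a data frame, as described under Value, and makes no claim about the actual row counts.

```r
if (requireNamespace("ICAMS", quietly = TRUE)) {
  file <- system.file("extdata/Mutect-vcf",
                      "Mutect.GRCh37.s1.vcf",
                      package = "ICAMS")
  list.of.vcfs <- ICAMS::ReadVCFs(file, variant.caller = "mutect")
  split.vcfs <- ICAMS::SplitListOfVCFs(list.of.vcfs,
                                       variant.caller = "mutect")

  # Names of the components of the result.
  print(names(split.vcfs))

  # Number of rows in each SBS VCF.
  print(sapply(split.vcfs$SBS, nrow))

  # TRUE when no variants were excluded from the analysis.
  print(is.null(split.vcfs$discarded.variants))
}
```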

ICAMS documentation built on June 15, 2025, 1:08 a.m.