SplitOneVCF    R Documentation

Split a VCF into SBS, DBS, and ID VCFs, plus a list of other mutations

Description

Split a VCF into single base substitution (SBS), doublet base substitution (DBS), and small insertion and deletion (ID) VCFs, plus a list of other mutations.

Usage

SplitOneVCF(
  vcf.df,
  max.vaf.diff = 0.02,
  name.of.VCF = NULL,
  always.merge.SBS = FALSE,
  chr.names.to.process = NULL
)

Arguments

vcf.df

An in-memory data.frame representing a VCF, including VAFs, which are added by ReadVCF.

max.vaf.diff

The maximum difference in VAF (default 0.02). If the absolute difference between the VAFs of adjacent SBSs is greater than max.vaf.diff, these adjacent SBSs are likely to be "merely" asynchronous single base mutations, as opposed to a simultaneous doublet mutation or a variant involving more than two consecutive bases, so they are not merged. Use a negative value (e.g. -1) to suppress merging adjacent SBSs into DBSs.

name.of.VCF

Name of the VCF file.

always.merge.SBS

If TRUE, merge adjacent SBSs into DBSs regardless of their VAFs and regardless of the value of max.vaf.diff.

chr.names.to.process

A character vector specifying the chromosome names in the VCF whose variants will be kept and processed; variants on all other chromosomes will be discarded. If NULL (the default), all variants will be kept except those on chromosomes whose names contain any of the strings "GL", "KI", "random", "Hs", "M", "JH", "fix", or "alt".
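The interaction between max.vaf.diff and always.merge.SBS can be illustrated with a small base-R sketch. This is not the ICAMS implementation: the function flag_mergeable_pairs is hypothetical, the column names CHROM, POS, and VAF are assumed, and it only looks at pairs of SBSs at consecutive positions, ignoring runs of three or more adjacent SBSs, which the documentation says are treated differently from doublets.

```r
# Hypothetical sketch, NOT the ICAMS implementation. Assumes a data.frame of
# SBSs with columns CHROM, POS and VAF. Flags which pairs of SBSs at
# consecutive positions on the same chromosome would be merged into a DBS.
# Runs of three or more adjacent SBSs are not handled specially here.
flag_mergeable_pairs <- function(sbs, max.vaf.diff = 0.02,
                                 always.merge.SBS = FALSE) {
  # Sort so that neighbouring rows are neighbouring positions
  sbs <- sbs[order(sbs$CHROM, sbs$POS), , drop = FALSE]
  n <- nrow(sbs)
  if (n < 2) {
    return(data.frame(CHROM = character(0), POS = numeric(0),
                      vaf.diff = numeric(0), merge = logical(0)))
  }
  first <- seq_len(n - 1)
  second <- first + 1
  # A pair is adjacent if it is on the same chromosome, one base apart
  adjacent <- sbs$CHROM[first] == sbs$CHROM[second] &
    sbs$POS[second] == sbs$POS[first] + 1
  pairs <- data.frame(
    CHROM = sbs$CHROM[first][adjacent],
    POS = sbs$POS[first][adjacent],
    vaf.diff = abs(sbs$VAF[first] - sbs$VAF[second])[adjacent]
  )
  if (always.merge.SBS) {
    pairs$merge <- rep(TRUE, nrow(pairs))          # VAFs are ignored
  } else if (max.vaf.diff < 0) {
    pairs$merge <- rep(FALSE, nrow(pairs))         # negative value: never merge
  } else {
    pairs$merge <- pairs$vaf.diff <= max.vaf.diff  # similar VAFs: likely one event
  }
  pairs
}

sbs <- data.frame(CHROM = c("1", "1", "1", "2"),
                  POS = c(100, 101, 500, 100),
                  VAF = c(0.30, 0.31, 0.25, 0.40))
flag_mergeable_pairs(sbs)
```

Checking always.merge.SBS before max.vaf.diff mirrors the documented rule that always.merge.SBS = TRUE applies regardless of the value of max.vaf.diff.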
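The default chromosome filtering described for chr.names.to.process can be approximated in base R as below. The helper filter_chromosomes and the CHROM column name are hypothetical, and the sketch assumes the listed strings are matched as literal, case-sensitive substrings; ICAMS may differ in those details.

```r
# Hypothetical sketch, NOT the ICAMS implementation. Assumes a CHROM column
# and literal, case-sensitive substring matching for the default exclusions.
filter_chromosomes <- function(vcf, chr.names.to.process = NULL) {
  if (!is.null(chr.names.to.process)) {
    # Keep only the explicitly requested chromosomes
    keep <- vcf$CHROM %in% chr.names.to.process
  } else {
    # Drop chromosomes whose names contain any of the documented strings
    patterns <- c("GL", "KI", "random", "Hs", "M", "JH", "fix", "alt")
    hits <- lapply(patterns, function(p) grepl(p, vcf$CHROM, fixed = TRUE))
    keep <- !Reduce(`|`, hits)
  }
  list(kept = vcf[keep, , drop = FALSE],
       discarded = vcf[!keep, , drop = FALSE])
}

vcf <- data.frame(CHROM = c("1", "chrX", "chrM", "GL000220.1",
                            "chr1_KI270706v1_random", "2"),
                  POS = 1:6)
filter_chromosomes(vcf)$kept
```

Note that substring matching on "M" also excludes any chromosome whose name merely contains that letter, which is why an explicit chr.names.to.process vector is the safer choice for unusual naming schemes.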

Value

A list containing three in-memory VCFs plus the discarded variants that were not incorporated into any of them:

* SBS: VCF with only single base substitutions.

* DBS: VCF with only doublet base substitutions.

* ID: VCF with only small insertions and deletions.

* discarded.variants: Non-NULL only if some variants were excluded from the analysis. See the added column discarded.reason for details.
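The shape of the returned list can be illustrated with a simplified base-R sketch that routes variants by REF/ALT length. This is only an illustration under stated assumptions: classify_variants is hypothetical, the REF and ALT column names are assumed, the reason text is made up, and the sketch ignores SBS-to-DBS merging, multi-allelic ALT fields, and does not check that both bases of a DBS actually differ.

```r
# Hypothetical, simplified sketch, NOT the ICAMS implementation.
# Routes each variant by the lengths of its REF and ALT alleles.
classify_variants <- function(vcf) {
  ref.len <- nchar(vcf$REF)
  alt.len <- nchar(vcf$ALT)
  type <- ifelse(ref.len == 1 & alt.len == 1, "SBS",
          ifelse(ref.len == 2 & alt.len == 2, "DBS",
          ifelse(ref.len != alt.len, "ID", "other")))
  discarded <- vcf[type == "other", , drop = FALSE]
  # Illustrative reason text only; ICAMS uses its own wording
  discarded$discarded.reason <- rep("Complex or unsupported variant",
                                    nrow(discarded))
  list(SBS = vcf[type == "SBS", , drop = FALSE],
       DBS = vcf[type == "DBS", , drop = FALSE],
       ID  = vcf[type == "ID", , drop = FALSE],
       # NULL when nothing was discarded, as documented above
       discarded.variants = if (nrow(discarded) > 0) discarded else NULL)
}

vcf <- data.frame(CHROM = "1", POS = c(10, 20, 30, 40),
                  REF = c("A", "CT", "G", "ACG"),
                  ALT = c("T", "GA", "GTT", "TTA"))
str(classify_variants(vcf))
```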


ICAMS documentation built on June 15, 2025, 1:08 a.m.