ReadAndSplitVCFs: Read and split VCF files
In ICAMS: In-Depth Characterization and Analysis of Mutational Signatures ('ICAMS')

View source: R/shiny_related_functions.R

Description

Read and split VCF files

Usage

ReadAndSplitVCFs(
  files,
  variant.caller = "unknown",
  num.of.cores = 1,
  names.of.VCFs = NULL,
  tumor.col.names = NA,
  filter.status = DefaultFilterStatus(variant.caller),
  get.vaf.function = NULL,
  ...,
  max.vaf.diff = 0.02,
  suppress.discarded.variants.warnings = TRUE,
  always.merge.SBS = FALSE,
  chr.names.to.process = NULL
)

Arguments

`files`	Character vector of file paths to the VCF files.
`variant.caller`	Name of the variant caller that produced the VCF; one of `"strelka"`, `"mutect"`, `"freebayes"` or `"unknown"`. This information is needed to calculate the VAFs (variant allele frequencies). If `variant.caller` is `"unknown"` (default) and `get.vaf.function` is `NULL`, then VAF and read depth will be `NA`. If `variant.caller` is `"mutect"`, adjacent SBSs are not merged into DBSs.
`num.of.cores`	The number of cores to use. Not available on Windows unless `num.of.cores = 1`.
`names.of.VCFs`	Optional. Character vector of names of the VCF files. The order of names in `names.of.VCFs` should match the order of VCF file paths in `files`. If `NULL` (default), the names are derived from `files`: everything up to and including the last path separator (if any) is removed, and the remaining file names without their extensions (and the leading dot) are used as the names of the VCF files.
`tumor.col.names`	Optional. Only applicable to Mutect VCFs. Vector of column names or column indices in the Mutect VCFs that contain the tumor sample information. The order of elements in `tumor.col.names` should match the order of Mutect VCFs specified in `files`. If `tumor.col.names` is `NA` (default), this function will use the 10th column in all the Mutect VCFs to calculate VAFs. See `GetMutectVAF` for more details.
`filter.status`	The character string in column `FILTER` of the VCF that indicates that a variant has passed all the variant caller's filters. Variants (lines in the VCF) for which the value in column `FILTER` does not equal `filter.status` are silently excluded from the output. The internal function `DefaultFilterStatus` tries to infer `filter.status` based on `variant.caller`. If `variant.caller` is `"unknown"`, the user must specify `filter.status` explicitly. If `filter.status = NULL`, all variants are retained. If there is no `FILTER` column in the VCF, all variants are retained with a warning.
`get.vaf.function`	Optional. Only applicable when `variant.caller` is `"unknown"`. Function to calculate VAF (variant allele frequency) and read depth information from the original VCF. See `GetMutectVAF` for an example. If `NULL` (default) and `variant.caller` is `"unknown"`, then VAF and read depth will be `NA`.
`...`	Optional arguments to `get.vaf.function`.
`max.vaf.diff`	Not applicable if `variant.caller = "mutect"`. The maximum difference of VAF allowed between adjacent SBSs; default value is 0.02. If the absolute difference of VAFs for adjacent SBSs is greater than `max.vaf.diff`, then these adjacent SBSs are likely to be "merely" asynchronous single base mutations, as opposed to a simultaneous doublet mutation or variants involving more than two consecutive bases. Use a negative value (e.g. -1) to suppress merging adjacent SBSs into DBSs.
`suppress.discarded.variants.warnings`	Logical. Whether to suppress warning messages showing information about the discarded variants. Default is TRUE.
`always.merge.SBS`	If `TRUE`, merge adjacent SBSs as DBSs regardless of VAFs, of the value of `max.vaf.diff`, and of the value of `get.vaf.function`. It is an error to set this to `TRUE` when `variant.caller = "mutect"`.
`chr.names.to.process`	A character vector specifying the chromosome names in the VCF whose variants will be kept and processed; variants on other chromosomes will be discarded. If `NULL` (default), all variants will be kept except those on chromosomes with names that contain the strings "GL", "KI", "random", "Hs", "M", "JH", "fix", "alt".
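
Several of the defaults described above can be illustrated with a few lines of base R. The sketch below is not the ICAMS implementation: the helper names (`DeriveVCFNames`, `KeepPassingVariants`, `KeepStandardChromosomes`, `SimilarVAF`), the file paths, and the toy data frame are hypothetical and exist only to make the documented rules concrete.

```r
# Illustrative helpers only; these are NOT ICAMS internals.

# names.of.VCFs = NULL: drop the directory part and the file extension.
DeriveVCFNames <- function(files) {
  tools::file_path_sans_ext(basename(files))
}

# filter.status: keep rows whose FILTER equals filter.status.
# NULL keeps everything; a missing FILTER column keeps everything
# and raises a warning.
KeepPassingVariants <- function(vcf, filter.status) {
  if (is.null(filter.status)) {
    return(vcf)
  }
  if (!("FILTER" %in% colnames(vcf))) {
    warning("No FILTER column; retaining all variants")
    return(vcf)
  }
  vcf[vcf$FILTER == filter.status, , drop = FALSE]
}

# chr.names.to.process = NULL: flag chromosomes whose names do not
# contain any of the documented strings.
KeepStandardChromosomes <- function(chrom) {
  excluded <- c("GL", "KI", "random", "Hs", "M", "JH", "fix", "alt")
  !grepl(paste(excluded, collapse = "|"), chrom)
}

# max.vaf.diff: adjacent SBSs count as one doublet candidate only if
# their VAFs differ by at most max.vaf.diff. A negative threshold can
# never be met, which is why it suppresses merging.
SimilarVAF <- function(vaf1, vaf2, max.vaf.diff = 0.02) {
  abs(vaf1 - vaf2) <= max.vaf.diff
}

# Hypothetical file paths and a made-up four-variant table.
files <- c("data/vcfs/sample1.vcf", "sample2.vcf")
vcf.names <- DeriveVCFNames(files)

vcf <- data.frame(
  CHROM  = c("1", "1", "MT", "GL000207.1"),
  POS    = c(100, 101, 200, 300),
  FILTER = c("PASS", "PASS", "PASS", "LowQual"),
  VAF    = c(0.30, 0.31, 0.50, 0.10),
  stringsAsFactors = FALSE
)

passing <- KeepPassingVariants(vcf, filter.status = "PASS")
kept <- passing[KeepStandardChromosomes(passing$CHROM), , drop = FALSE]

merge.candidate <- SimilarVAF(kept$VAF[1], kept$VAF[2])
never.merge <- SimilarVAF(kept$VAF[1], kept$VAF[2], max.vaf.diff = -1)
```

Note that because the default exclusion list contains "M", a mitochondrial chromosome named "MT" or "chrM" is dropped unless it is listed explicitly in `chr.names.to.process`.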

Value

A list containing the following objects:

SBS: List of VCFs with only single base substitutions.
DBS: List of VCFs with only doublet base substitutions.
ID: List of VCFs with only small insertions and deletions.
discarded.variants: Non-NULL only if there are variants that were excluded from the analysis. The extra column discarded.reason records why each variant was discarded.

Examples

file <- c(system.file("extdata/Mutect-vcf",
                      "Mutect.GRCh37.s1.vcf",
                      package = "ICAMS"))
list.of.vcfs <- ReadAndSplitVCFs(file, variant.caller = "mutect")
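
As a follow-up to the example above, the returned list can be inspected with base R. This is a minimal sketch that assumes the ICAMS package is installed; the sample name "sample1" is a hypothetical label chosen for illustration, and no particular output is implied.

```r
# Runs only if ICAMS is available.
if (requireNamespace("ICAMS", quietly = TRUE)) {
  file <- system.file("extdata/Mutect-vcf",
                      "Mutect.GRCh37.s1.vcf",
                      package = "ICAMS")

  # Supply an explicit sample name instead of deriving it from the path,
  # and ask for warnings about discarded variants.
  list.of.vcfs <- ICAMS::ReadAndSplitVCFs(
    file,
    variant.caller = "mutect",
    names.of.VCFs = "sample1",  # hypothetical name
    suppress.discarded.variants.warnings = FALSE
  )

  # Top-level elements: SBS, DBS, ID and, if any variants were
  # excluded, discarded.variants.
  print(names(list.of.vcfs))

  # discarded.variants is NULL when nothing was excluded.
  has.discarded <- !is.null(list.of.vcfs$discarded.variants)
}
```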

ICAMS documentation built on June 15, 2025, 1:08 a.m.