ReadAndSplitVCFs: Read and split VCF files

View source: R/shiny_related_functions.R

ReadAndSplitVCFs    R Documentation

Read and split VCF files

Description

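Read VCF files and split each one into single base substitutions (SBS), doublet base substitutions (DBS), and small insertions and deletions (ID).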

Usage

ReadAndSplitVCFs(
  files,
  variant.caller = "unknown",
  num.of.cores = 1,
  names.of.VCFs = NULL,
  tumor.col.names = NA,
  filter.status = NULL,
  get.vaf.function = NULL,
  ...,
  max.vaf.diff = 0.02,
  suppress.discarded.variants.warnings = TRUE
)

Arguments

files

Character vector of file paths to the VCF files.

variant.caller

Name of the variant caller that produced the VCF; one of "strelka", "mutect", "freebayes", or "unknown". This information is needed to calculate the VAFs (variant allele frequencies). If variant.caller is "unknown" (default) and get.vaf.function is NULL, then VAF and read depth will be NA. If variant.caller is "mutect", adjacent SBSs are not merged into DBSs.

num.of.cores

The number of cores to use. Values greater than 1 are not available on Windows.

names.of.VCFs

Character vector of names of the VCF files. The order of names in names.of.VCFs should match the order of VCF file paths in files. If NULL (default), the names are derived from the file paths: everything up to and including the last path separator (if any) is removed, and then the file extension (including its leading dot) is removed.
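The default naming rule described above can be approximated with base R. This is a minimal sketch under stated assumptions, not necessarily how ICAMS implements it; the file paths are hypothetical.

```r
# Hypothetical file paths, for illustration only.
files <- c("/data/vcfs/Mutect.GRCh37.s1.vcf", "sample2.vcf")

# basename() drops everything up to and including the last path separator;
# tools::file_path_sans_ext() drops the final extension and its leading dot.
default.names <- tools::file_path_sans_ext(basename(files))
print(default.names)
```

Passing names.of.VCFs explicitly avoids relying on this rule when several files share the same base name.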

tumor.col.names

Optional. Only applicable to Mutect VCFs. Character vector of column names in Mutect VCFs which contain the tumor sample information. The order of names in tumor.col.names should match the order of Mutect VCFs specified in files. If tumor.col.names is NA (default), this function will use the 10th column in all the Mutect VCFs to calculate VAFs. See GetMutectVAF for more details.

filter.status

The status indicating that a variant has passed all filters, for example "PASS". Variants that do not have the specified filter.status in the FILTER column of the VCF will be removed. If NULL (default), no variants will be removed from the original VCF.
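The effect of filter.status can be pictured on a toy table. The sketch below uses a hand-made data frame with hypothetical values rather than a real VCF, and it only illustrates the documented behaviour; the actual ICAMS code may differ.

```r
# Toy stand-in for a VCF body; all values are made up for illustration.
vcf <- data.frame(
  CHROM  = c("1", "1", "2"),
  POS    = c(100L, 250L, 300L),
  REF    = c("A", "C", "G"),
  ALT    = c("T", "G", "A"),
  FILTER = c("PASS", "germline_risk", "PASS"),
  stringsAsFactors = FALSE
)

filter.status <- "PASS"

# With filter.status = NULL every row is kept; otherwise only rows whose
# FILTER value equals filter.status are kept.
kept <- if (is.null(filter.status)) vcf else vcf[vcf$FILTER == filter.status, ]
print(kept)
```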

get.vaf.function

Optional. Only applicable when variant.caller is "unknown". Function to calculate VAF (variant allele frequency) and read depth information from the original VCF. See GetMutectVAF for an example. If NULL (default) and variant.caller is "unknown", then VAF and read depth will be NA.

...

Optional arguments to get.vaf.function.

max.vaf.diff

Not applicable if variant.caller = "mutect". The maximum allowed difference in VAF between adjacent SBSs; the default is 0.02. If the absolute difference between the VAFs of adjacent SBSs is greater than max.vaf.diff, then these adjacent SBSs are likely to be "merely" asynchronous single base mutations, as opposed to a simultaneous doublet mutation or a variant involving more than two consecutive bases.
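The comparison behind max.vaf.diff amounts to a thresholded absolute difference. The following is a minimal sketch with hypothetical VAF values; it shows only the comparison itself, not how ICAMS locates adjacent SBSs or builds DBSs.

```r
max.vaf.diff <- 0.02

# Hypothetical VAFs for two pairs of adjacent SBSs.
vaf.first  <- c(0.30, 0.30)
vaf.second <- c(0.31, 0.45)

# TRUE: the VAFs are close enough for the pair to be a candidate doublet.
# FALSE: the pair is more likely two independent single base mutations.
merge.candidate <- abs(vaf.first - vaf.second) <= max.vaf.diff
print(merge.candidate)
```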

suppress.discarded.variants.warnings

Logical. Whether to suppress warning messages showing information about the discarded variants. Default is TRUE.

Value

A list containing the following objects:

  • SBS: List of VCFs with only single base substitutions.

  • DBS: List of VCFs with only doublet base substitutions.

  • ID: List of VCFs with only small insertions and deletions.

  • discarded.variants: Non-NULL only if there are variants that were excluded from the analysis. See the added column discarded.reason for more details.

See Also

VCFsToCatalogs

Examples

file <- c(system.file("extdata/Mutect-vcf",
                      "Mutect.GRCh37.s1.vcf",
                      package = "ICAMS"))
list.of.vcfs <- ReadAndSplitVCFs(file, variant.caller = "mutect")

ICAMS documentation built on June 22, 2024, 6:47 p.m.