filterVcfBasic: Basic VCF filter function

View source: R/filterVcf.R

filterVcfBasic    R Documentation

Basic VCF filter function

Description

Function to remove artifacts and low confidence/quality variant calls.

Usage

filterVcfBasic(
  vcf,
  tumor.id.in.vcf = NULL,
  use.somatic.status = TRUE,
  snp.blacklist = NULL,
  af.range = c(0.03, 0.97),
  contamination.range = c(0.01, 0.075),
  min.coverage = 15,
  min.base.quality = 25,
  max.base.quality = 50,
  base.quality.offset = 1,
  min.supporting.reads = NULL,
  error = 0.001,
  target.granges = NULL,
  remove.off.target.snvs = TRUE,
  model.homozygous = FALSE,
  interval.padding = 50,
  DB.info.flag = "DB"
)

Arguments

vcf

CollapsedVCF object, read in with the readVcf function from the VariantAnnotation package.

tumor.id.in.vcf

The tumor id in the CollapsedVCF (optional).

use.somatic.status

If somatic status and germline data are available, use this information to remove non-heterozygous germline SNPs or germline SNPs with biased allelic fractions.

snp.blacklist

A file with blacklisted genomic regions. Must be parsable by import from rtracklayer, for example a BED file with file extension ‘.bed’.

af.range

Exclude variants with allelic fraction smaller or greater than the two values, respectively. The higher value removes homozygous SNPs, which potentially have allelic fractions smaller than 1 due to artifacts or contamination. If a matched normal is available, this value is ignored, because homozygosity can be confirmed in the normal.
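The range check described above can be illustrated in a few lines of base R. This is a minimal sketch with made-up allelic fractions, not the PureCN implementation; the vector names af and af.kept are hypothetical.

```r
# Hypothetical allelic fractions for five variants (illustrative values only).
af <- c(0.01, 0.12, 0.48, 0.95, 0.99)
af.range <- c(0.03, 0.97)

# Exclude variants with allelic fraction below the lower or above the
# upper bound; values equal to a bound are kept in this sketch.
keep <- af >= af.range[1] & af <= af.range[2]
af.kept <- af[keep]
```

In this sketch the first variant is dropped as likely noise and the last as a probable homozygous SNP.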

contamination.range

Count variants in germline databases with allelic fraction in the specified range. If the number of these putative contamination variants exceeds an expected value and if they are found on almost all chromosomes, the sample is flagged as potentially contaminated and extra contamination estimation steps will be performed later on.
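The counting step can be sketched as follows. The data are made up, and the names db, af, chr, n.putative and n.chromosomes are hypothetical; the thresholds PureCN applies to these counts before flagging a sample are not shown here because this page does not state them.

```r
# Hypothetical per-variant data (illustrative values only).
db  <- c(TRUE, TRUE, FALSE, TRUE, TRUE)            # present in a germline database
af  <- c(0.02, 0.05, 0.04, 0.50, 0.07)             # allelic fraction in tumor
chr <- c("chr1", "chr2", "chr2", "chr3", "chr3")   # chromosome of each variant
contamination.range <- c(0.01, 0.075)

# Putative contamination variants: germline database variants with an
# allelic fraction inside the specified range.
in.range <- db & af >= contamination.range[1] & af <= contamination.range[2]

n.putative    <- sum(in.range)                 # how many such variants
n.chromosomes <- length(unique(chr[in.range])) # on how many chromosomes
```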

min.coverage

Minimum coverage in tumor. Variants with lower coverage are ignored.

min.base.quality

Minimum base quality in tumor. Requires a BQ genotype field in the VCF. Variants with a base quality below this value are ignored.

max.base.quality

Maximum base quality in tumor. Requires a BQ genotype field in the VCF. Variants exceeding this value will have their BQ capped at this value.

base.quality.offset

Subtracts the specified value from the base quality score. Useful to add some cushion for too optimistically calibrated scores. Requires a BQ genotype field in the VCF.
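To see how the three base quality arguments interact, consider the sketch below. It assumes the offset is subtracted first, the result is then capped, and the minimum is applied last; this order is an assumption made for illustration and is not confirmed by this page. The vectors bq and bq.adjusted are hypothetical.

```r
# Hypothetical BQ values for four variants (illustrative values only).
bq <- c(20, 26, 30, 60)
min.base.quality    <- 25
max.base.quality    <- 50
base.quality.offset <- 1

# Assumed order: subtract the offset, then cap at the maximum.
bq.adjusted <- pmin(bq - base.quality.offset, max.base.quality)

# Variants whose adjusted base quality falls below the minimum are ignored.
keep <- bq.adjusted >= min.base.quality
```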

min.supporting.reads

Minimum number of reads supporting the alt allele. If NULL, the minimum is calculated from the coverage, assuming a sequencing error rate of 10^-3.

error

Estimated sequencing error rate. Used to calculate minimum number of supporting reads using calculatePowerDetectSomatic when base quality scores are not available.
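The idea of deriving a read threshold from coverage and error rate can be illustrated with a plain binomial tail bound. This is a simplified stand-in and not calculatePowerDetectSomatic; the function name min_reads_sketch and the max.fpr argument are hypothetical.

```r
# Smallest number of alt reads k such that seeing k or more error reads
# at the given coverage has probability at most max.fpr (hypothetical
# threshold). Returns NA if no k up to the coverage satisfies the bound.
min_reads_sketch <- function(coverage, error = 0.001, max.fpr = 5e-7) {
    k <- 0:coverage
    # P(X >= k) for X ~ Binomial(coverage, error)
    p.tail <- pbinom(k - 1, size = coverage, prob = error, lower.tail = FALSE)
    k[which(p.tail <= max.fpr)[1]]
}

k100 <- min_reads_sketch(100)
```

Higher coverage or a higher error rate increases the number of supporting reads required under this bound.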

target.granges

GenomicRanges object specifying the target positions. Used to remove off-target reads. If NULL, do not check whether variants are on or off-target.

remove.off.target.snvs

If TRUE, removes all SNVs outside the covered regions.

model.homozygous

If TRUE, homozygous variants are not removed. Ignored if a matched normal is provided in the VCF.

interval.padding

Include variants in the interval flanking regions of the specified size in bp. Requires target.granges.
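The effect of the padding can be sketched in base R for a single chromosome. The target intervals and variant positions are made up, and a real analysis would pass a GenomicRanges object as target.granges rather than a data frame.

```r
# Hypothetical target intervals and variant positions on one chromosome.
targets <- data.frame(start = c(1000, 5000), end = c(1200, 5300))
pos <- c(940, 960, 1100, 1251, 4950, 5400)
interval.padding <- 50

# A variant counts as on-target if it lies within any target interval
# extended by the padding on both sides.
on.target <- vapply(pos, function(p) {
    any(p >= targets$start - interval.padding &
        p <= targets$end + interval.padding)
}, logical(1))
```

Here the positions 960 and 4950 are kept only because of the 50 bp flanking regions.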

DB.info.flag

Flag in the INFO field of the VCF that marks presence in common germline databases. Defaults to DB. Note that variants carrying this flag may include somatic variants if the flag comes from an unfiltered germline database.

Value

A list with elements

vcf

The filtered CollapsedVCF object.

flag

A flag (logical(1)) indicating whether problems were identified.

flag_comment

A comment describing the flagging.

Author(s)

Markus Riester

See Also

calculatePowerDetectSomatic

Examples


# This function is typically only called by runAbsoluteCN via
# fun.filterVcf and args.filterVcf.
library(PureCN)
library(VariantAnnotation)  # provides readVcf
vcf.file <- system.file("extdata", "example.vcf.gz", package = "PureCN")
vcf <- readVcf(vcf.file, "hg19")
vcf.filtered <- filterVcfBasic(vcf)


lima1/PureCN documentation built on Nov. 22, 2024, 6:07 a.m.