filterVariantCalls: Filter variant calls
In timflutre/rutilstimflutre: Timothee Flutre's personal R code

Description

Filter out variant calls from a VCF file according to several criteria (bi-allelic status, single-nucleotide variant status, acceptable amount of missing genotypes, overall depth and allele frequency).

Usage

filterVariantCalls(
  vcf.file,
  genome = "",
  out.file,
  yieldSize = NA_integer_,
  dict.file = NULL,
  seq.id = NULL,
  seq.start = NULL,
  seq.end = NULL,
  variants.tokeep = NULL,
  is.snv = NULL,
  is.biall = NULL,
  min.var.dp = NULL,
  max.var.dp = NULL,
  min.alt.af = NULL,
  max.alt.af = NULL,
  min.spl.dp = NULL,
  min.perc.spl.dp = NULL,
  min.spl.gq = NULL,
  min.perc.spl.gq = NULL,
  max.var.nb.gt.na = NULL,
  max.var.perc.gt.na = NULL,
  verbose = 1
)
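
As a hedged illustration of a typical call, the sketch below assembles a set of filter arguments and only invokes the function if the package is installed and the input file exists. The file paths and threshold values are hypothetical placeholders, not recommendations from the package.

```r
# Hypothetical paths and thresholds, for illustration only.
args <- list(
  vcf.file = "calls.vcf.gz",           # hypothetical input file
  genome = "VITVI_12x2",
  out.file = "calls_filtered.vcf.gz",  # hypothetical output file
  is.snv = TRUE,                       # keep only SNVs
  is.biall = TRUE,                     # keep only bi-allelic variants
  min.var.dp = 10,                     # variant-level depth bounds
  max.var.dp = 1000,
  min.alt.af = 0.01,                   # variant-level allele frequency bounds
  max.alt.af = 0.99,
  max.var.nb.gt.na = 5,                # at most 5 samples with missing GT
  verbose = 1
)

# Run the filter only when the package and the input file are available.
ran <- FALSE
if (requireNamespace("rutilstimflutre", quietly = TRUE) &&
    file.exists(args$vcf.file)) {
  do.call(rutilstimflutre::filterVariantCalls, args)
  ran <- TRUE
}
```

Building the arguments as a list first makes it easy to log or reuse the same set of thresholds across several VCF files.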

Arguments

`vcf.file`	path to the VCF file (if the bgzip index doesn't exist in the same directory, it will be created)
`genome`	genome identifier (e.g. "VITVI_12x2")
`out.file`	path to the output VCF file (a bgzip index will be created in the same directory)
`yieldSize`	number of records to yield each time the file is read from (see ?TabixFile); used only if seq.id is NULL
`dict.file`	path to the SAM dict file (see https://broadinstitute.github.io/picard/command-line-overview.html#CreateSequenceDictionary) if seq.id is specified with no start/end
`seq.id`	sequence identifier to work on (e.g. `"chr2"`)
`seq.start`	start of the sequence to work on (if NULL, whole seq)
`seq.end`	end of the sequence to work on (if NULL, whole seq)
`variants.tokeep`	character vector of variant names to keep (e.g. `c("chr1:35718_C/A","chr1:61125_A/G")`)
`is.snv`	if set to TRUE (the default is NULL), filter out the variants which are not SNVs
`is.biall`	if set to TRUE (the default is NULL), filter out the variants with more than one alternative allele
`min.var.dp`	minimum variant-level DP below which variants are filtered out
`max.var.dp`	maximum variant-level DP above which variants are filtered out
`min.alt.af`	minimum variant-level AF below which variants are filtered out
`max.alt.af`	maximum variant-level AF above which variants are filtered out
`min.spl.dp`	minimum sample-level DP
`min.perc.spl.dp`	minimum percentage of samples with DP above the `min.spl.dp` threshold
`min.spl.gq`	minimum sample-level GQ
`min.perc.spl.gq`	minimum percentage of samples with GQ above the `min.spl.gq` threshold
`max.var.nb.gt.na`	maximum number of samples with missing GT
`max.var.perc.gt.na`	maximum percentage of samples with missing GT
`verbose`	verbosity level (0/1)
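
To make the sample-level and missing-genotype thresholds above more concrete, here is a minimal base-R sketch on toy matrices. It is not the package's implementation: the toy data are invented, and it assumes that percentages are on a 0-100 scale, that a sample passes when its DP is greater than or equal to the threshold, and that a missing DP counts as failing.

```r
# Toy data (invented): 4 variants (rows) x 3 samples (columns).
dp <- matrix(c(12,  8, 15,
                3,  2, 20,
               30, 25, 40,
               10, NA, 11),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("var", 1:4), paste0("spl", 1:3)))
gt <- matrix(c("0/0", "0/1", "1/1",
               "0/1", NA,    NA,
               "0/0", "0/0", "0/1",
               "1/1", NA,    "0/1"),
             nrow = 4, byrow = TRUE, dimnames = dimnames(dp))

min.spl.dp <- 10        # per-sample depth threshold
min.perc.spl.dp <- 50   # assumption: percentage on a 0-100 scale
max.var.nb.gt.na <- 1   # at most one missing genotype per variant

# Percentage of samples, per variant, whose DP reaches the threshold
# (assumption: a missing DP counts as not reaching it).
perc.dp.ok <- 100 * rowMeans(!is.na(dp) & dp >= min.spl.dp)

# Number of samples, per variant, with a missing genotype.
nb.gt.na <- rowSums(is.na(gt))

# A variant is kept only if it passes both criteria.
keep <- perc.dp.ok >= min.perc.spl.dp & nb.gt.na <= max.var.nb.gt.na
kept.variants <- names(keep)[keep]
print(kept.variants)
```

In this toy example, `var2` is dropped because only one of its three samples reaches the depth threshold and two of its genotypes are missing.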