Filter Variant Call Format (VCF) files from one file to another
## S4 method for signature 'character'
filterVcf(file, genome, destination, ..., verbose = TRUE,
index = FALSE, prefilters = FilterRules(), filters = FilterRules(),
param = ScanVcfParam())
## S4 method for signature 'TabixFile'
filterVcf(file, genome, destination, ..., verbose = TRUE,
index = FALSE, prefilters = FilterRules(), filters = FilterRules(),
param = ScanVcfParam())
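file | A character(1) file name or a TabixFile instance specifying the VCF file to be filtered.
genome | A character(1) genome identifier, e.g., "hg19".
destination | A character(1) path to which the filtered VCF file is written.
... | Additional arguments, possibly used by future methods.
verbose | A logical(1) indicating whether progress messages are printed.
index | A logical(1) indicating whether the filtered file is compressed and indexed.
prefilters | A FilterRules instance containing rules for filtering unparsed lines of the VCF file.
filters | A FilterRules instance containing rules for filtering fully parsed VCF objects.
param | A ScanVcfParam instance restricting the fields or genomic ranges read from the input.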
This function transfers the content of one VCF file to another, removing
records that fail to satisfy prefilters and filters. Filtering is done in a
memory-efficient manner, iterating over the input VCF file in chunks of
default size 100,000 (when invoked with character(1) for file) or as
specified by the yieldSize argument of TabixFile (when invoked with
TabixFile).
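As a rough sketch of the chunk-size control described above, the input can be wrapped in a TabixFile with an explicit yieldSize before it is handed to filterVcf. The value 5000 is an arbitrary illustrative choice, and the example assumes that VariantAnnotation (which attaches Rsamtools) is installed; the guard skips the code otherwise.

```r
## Sketch only: runs when VariantAnnotation is installed, otherwise does nothing.
if (requireNamespace("VariantAnnotation", quietly = TRUE)) {
    library(VariantAnnotation)  # attaches Rsamtools, which provides TabixFile

    fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")

    ## yieldSize sets how many records are read per chunk;
    ## 5000 is an illustrative value, not a recommendation.
    tbx <- TabixFile(fl, yieldSize = 5000)

    ## Keep SNVs only; filterVcf returns the destination path.
    out <- filterVcf(tbx, "hg19", tempfile(),
                     filters = FilterRules(list(isSNV = isSNV)))
}
```

A smaller yieldSize lowers peak memory use at the cost of more iterations over the file.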
There are up to two passes. In the first pass, unparsed lines are passed to
prefilters for filtering, e.g., searching for a fixed character string. In
the second pass, lines that passed prefilters are parsed into VCF instances
and made available for further filtering by filters. One or both of
prefilters and filters can be present.
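The two passes can be imitated in base R to show what each kind of rule receives. This is a conceptual sketch only, not how filterVcf is implemented: the four tab-separated lines are made-up stand-ins for VCF data lines, and the names prefilter, kept, and result are hypothetical.

```r
## Made-up stand-ins for unparsed VCF data lines (tab-separated).
lines <- c(
  "22\t100\trs1\tA\tG\t50\tPASS\tSNPSOURCE=LOWCOV,EXOME;VT=SNP",
  "22\t200\trs2\tAT\tA\t60\tPASS\tSNPSOURCE=LOWCOV;VT=INDEL",
  "22\t300\trs3\tC\tT\t70\tPASS\tSNPSOURCE=LOWCOV;VT=SNP",
  "22\t400\trs4\tG\tGA\t80\tPASS\tSNPSOURCE=LOWCOV,EXOME;VT=INDEL"
)

## Pass 1: a prefilter sees raw text and returns one logical per line.
prefilter <- function(x) grepl("LOWCOV,EXOME", x, fixed = TRUE)
kept <- lines[prefilter(lines)]

## Pass 2: only the surviving lines are parsed, then filtered on a field.
fields <- do.call(rbind, strsplit(kept, "\t", fixed = TRUE))
vt <- sub(".*VT=([^;]+).*", "\\1", fields[, 8])  # extract VT from INFO
result <- kept[vt == "SNP"]
```

Cheap string matching in the first pass means that only a fraction of the lines need the more expensive parsing step.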
Filtering works by removing the rows (variants) that do not meet a criterion. Because this is a row-based approach and samples are column-based, most genotype filters are only meaningful for single-sample files. If a single sample fails the criterion, the entire row (all samples) is removed. Genotype filtering is effective for multiple samples only when the criterion is applied across samples rather than to an individual sample (e.g., keep rows where all samples have DP > 10).
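A cross-sample criterion such as "all samples have DP > 10" reduces to one logical value per row. The sketch below uses a made-up depth matrix; the helper name allSamplesDeep is hypothetical. It assumes that, in a real filter, the matrix would come from geno(x)$DP for a VCF whose FORMAT fields include a single-valued DP.

```r
## Made-up read depths: rows are variants, columns are samples.
dp <- matrix(c(12, 15, 30,
               11,  4, 25,
               40, 22, 18),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("var1", "var2", "var3"),
                             c("s1", "s2", "s3")))

## TRUE for rows in which every sample exceeds the threshold.
## With na.rm = TRUE a missing depth counts as failing the criterion.
allSamplesDeep <- function(dp, min = 10)
    rowSums(dp > min, na.rm = TRUE) == ncol(dp)

keep <- allSamplesDeep(dp)
filtered <- dp[keep, , drop = FALSE]

## Assumed use inside a filter (not run here):
## FilterRules(list(deep = function(x) allSamplesDeep(geno(x)$DP)))
```

Here var2 is dropped because a single sample (s2) falls below the threshold, which is the row-level behaviour described above.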
The destination file path as a character(1).
Martin Morgan and Paul Shannon
fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
## -----------------------------------------------------------------------
## Filter for SNVs in a defined set of ranges:
## -----------------------------------------------------------------------
if (require(TxDb.Hsapiens.UCSC.hg19.knownGene)) {
    txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
    exons <- exons(txdb)
    exons22 <- exons[seqnames(exons) == "chr22"]
    seqlevelsStyle(exons22) <- "NCBI" ## match chrom names in VCF file
    ## Range-based filter: returns a function that keeps the records
    ## falling within 'rng'
    withinRange <- function(rng)
        function(x) x %within% rng
    ## The first filter identifies SNVs and the second applies the
    ## range restriction.
    filters <- FilterRules(list(
        isSNV = isSNV,
        withinRange = withinRange(exons22)))
    ## Apply
    ## Not run:
    filt1 <- filterVcf(fl, "hg19", tempfile(), filters=filters, verbose=TRUE)
    ## End(Not run)
}
## -----------------------------------------------------------------------
## Using a pre-filter and filter:
## -----------------------------------------------------------------------
## Low coverage exome snp filter:
lowCoverageExomeSNP = function(x) grepl("LOWCOV,EXOME", x, fixed=TRUE)
## The pre-filter identifies low coverage exome snps and the filter
## identifies variants with INFO variable VT = SNP.
pre <- FilterRules(list(lowCoverageExomeSNP = lowCoverageExomeSNP))
filt <- FilterRules(list(VTisSNP = function(x) info(x)$VT == "SNP"))
## Apply
filt2 <- filterVcf(fl, "hg19", tempfile(), prefilters=pre, filters=filt)
## Filtered results
vcf <- readVcf(filt2, "hg19")