MakeTasselVcfFilter: Filter Lines of a VCF File By Call Rate and Allele Frequency

View source: R/data_import.R

Description

This function returns a function that can be used as a prefilter by filterVcf in the VariantAnnotation package. The user can set a minimum number of individuals with reads and a minimum number of individuals with the minor allele (which may be either the reference or the alternative allele). The filter can be used to generate a smaller VCF file before reading it with VCF2RADdata.

Usage

MakeTasselVcfFilter(min.ind.with.reads = 200, min.ind.with.minor.allele = 10)

Arguments

min.ind.with.reads

An integer indicating the minimum number of individuals that must have reads in order for a marker to be retained.

min.ind.with.minor.allele

An integer indicating the minimum number of individuals that must have the minor allele in order for a marker to be retained.

Details

This function assumes the VCF file was output by the TASSEL GBSv2 pipeline. This means that each genotype field begins with the called genotype, written as two digits from zero to three separated by a forward slash, followed by a colon (for example, a field beginning with 0/1:).
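
To illustrate the genotype format described above, here is a minimal base R sketch that pulls the called genotype out of each sample field of a single VCF line. The line vcf_line, the use of ./. for a missing call, and the helper name extract_calls are all made up for illustration; this is not polyRAD's internal code.

```r
# Hypothetical tab-delimited VCF data line: nine fixed columns (through
# FORMAT), followed by one genotype field per sample.
vcf_line <- paste("1", "1000", "S1_1000", "A", "G", ".", "PASS", ".", "GT:AD:DP",
                  "0/0:5,0:5", "0/1:3,2:5", "./.:0,0:0", "1/1:0,4:4",
                  sep = "\t")

# Illustrative helper (not part of polyRAD): return the genotype call at the
# start of each sample field, or NA where no digit/digit call is present.
extract_calls <- function(line) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  samples <- fields[-(1:9)]
  # A call is two digits from 0 to 3, separated by "/", followed by ":".
  has_call <- grepl("^[0-3]/[0-3]:", samples)
  ifelse(has_call, substr(samples, 1, 3), NA_character_)
}

calls <- extract_calls(vcf_line)
print(calls)
```

Fields that do not start with a digit/digit call, such as the assumed ./. placeholder, come back as NA and would not count toward the number of individuals with reads.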

Value

A function is returned. The function takes as its only argument a character vector representing a set of lines from a VCF file, with each line representing one SNP. The function returns a logical vector of the same length as the character vector, with TRUE where the SNP meets both thresholds (number of individuals with reads and number of individuals with the minor allele) and FALSE where it does not.
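
The contract described above (a function that maps a character vector of VCF lines to a logical vector of the same length) can be sketched with a toy closure in base R. Everything here is hypothetical: make_toy_filter, make_line, the argument names, and the way the minor allele is counted (the smaller of the number of individuals carrying allele 0 and the number carrying allele 1, for a biallelic SNP) are assumptions for illustration, not the implementation used by MakeTasselVcfFilter.

```r
# Hypothetical helper: build a tab-delimited VCF data line from sample fields.
make_line <- function(pos, genotypes) {
  paste(c("1", pos, ".", "A", "G", ".", "PASS", ".", "GT:DP", genotypes),
        collapse = "\t")
}

# Illustrative filter factory (an assumption-laden sketch, not polyRAD code).
make_toy_filter <- function(min_with_reads = 2, min_with_minor = 1) {
  function(lines) {
    vapply(lines, function(line) {
      samples <- strsplit(line, "\t", fixed = TRUE)[[1]][-(1:9)]
      # Individuals with a called genotype at the start of their field
      called <- grepl("^[0-3]/[0-3]:", samples)
      gt <- substr(samples[called], 1, 3)
      # Individuals carrying at least one copy of each allele (biallelic case)
      n_ref <- sum(grepl("0", gt, fixed = TRUE))
      n_alt <- sum(grepl("1", gt, fixed = TRUE))
      sum(called) >= min_with_reads && min(n_ref, n_alt) >= min_with_minor
    }, logical(1), USE.NAMES = FALSE)
  }
}

toy_lines <- c(
  make_line(100, c("0/0:5", "0/1:5", "./.:0", "1/1:4")),  # passes both thresholds
  make_line(200, c("0/0:5", "0/0:6", "0/0:3", "0/0:4")),  # no minor allele
  make_line(300, c("0/1:5", "./.:0", "./.:0", "./.:0"))   # too few calls
)

toy_filter <- make_toy_filter(min_with_reads = 3, min_with_minor = 1)
keep <- toy_filter(toy_lines)
print(keep)
```

Because the returned function works on raw text lines and returns one logical value per line, it has the shape that filterVcf expects of a prefilter, which is applied before the records are parsed.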

Author(s)

Lindsay V. Clark

References

https://bitbucket.org/tasseladmin/tassel-5-source/wiki/Tassel5GBSv2Pipeline

Examples

# make the filtering function
filterfun <- MakeTasselVcfFilter(300, 15)


# Executable code excluded from CRAN testing for taking >10 s:

library(VariantAnnotation)
# get the example VCF installed with polyRAD
exampleVCF <- system.file("extdata", "Msi01genes.vcf", package = "polyRAD")
exampleBGZ <- paste(exampleVCF, "bgz", sep = ".")

# zip and index the file using Tabix (if not done already)
if(!file.exists(exampleBGZ)){
  exampleBGZ <- bgzip(exampleVCF)
  indexTabix(exampleBGZ, format = "vcf")
}

# make a temporary file
# (for package checks; you don't need to do this in your own code)
outfile <- tempfile(fileext = ".vcf")

# filter to a new file
filterVcf(exampleBGZ, destination = outfile, 
          prefilters = FilterRules(list(filterfun)))


polyRAD documentation built on Nov. 10, 2022, 5:14 p.m.