View source: R/erase_genotypes.R
erase_genotypes (R Documentation)
This function uses the information in the tidy VCF to (i) make a blacklist of individual genotypes to erase, based on coverage and genotype likelihood thresholds, and (ii) erase those blacklisted genotypes from a tidy VCF file and a STACKS haplotypes file.
erase_genotypes(
  tidy.vcf.file,
  haplotypes.file,
  read.depth.threshold,
  allele.depth.threshold,
  allele.imbalance.threshold,
  filename
)
| Argument | Description |
|---|---|
| tidy.vcf.file | A tidy VCF, either as a data frame object or as a file (".tsv"). |
| haplotypes.file | (optional) The STACKS 'batch_x.haplotypes.tsv' file. Supply it if you also want to erase, in this file, the genotypes that don't pass the thresholds. |
| read.depth.threshold | (integer) Threshold for the read depth. |
| allele.depth.threshold | (integer) Threshold for the minimum depth of the REF or ALT allele. |
| allele.imbalance.threshold | (numeric) Threshold for the ratio between ALT and REF depth of coverage. See details below. |
| filename | (optional) Name of the file written to the working directory. |
Genotypes of below-average quality, i.e. below the thresholds for REF and/or ALT allele coverage and for genotype likelihood, are zeroed (erased) from the file. The function erases SNPs in the VCF file and loci in the haplotypes file. It also creates a blacklist of the erased genotypes, based on the genotype likelihood threshold and the REF and ALT allele coverage thresholds. The allele imbalance ratio is calculated as (read depth of ALT allele - read depth of REF allele) / (read depth of ALT allele + read depth of REF allele). For example, with REF = 3 and ALT = 2, the ratio is (2 - 3) / (2 + 3) = -0.2. For the function to work properly, give allele.imbalance.threshold a positive value; the function calculates the +/- imbalance itself.
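The ratio and the two-sided use of a positive threshold can be written out as below, with d_REF and d_ALT the REF and ALT read depths and t the value of allele.imbalance.threshold. Reading "+/- imbalance" as flagging a genotype when |r| exceeds t is an assumption; whether the bound is strict is not stated in this page.

$$
r = \frac{d_{\mathrm{ALT}} - d_{\mathrm{REF}}}{d_{\mathrm{ALT}} + d_{\mathrm{REF}}}, \qquad -1 \le r \le 1 \quad (d_{\mathrm{ALT}} + d_{\mathrm{REF}} > 0)
$$

$$
d_{\mathrm{REF}} = 3,\; d_{\mathrm{ALT}} = 2 \;\Rightarrow\; r = \frac{2 - 3}{2 + 3} = -0.2
$$

$$
\text{flagged if } r < -t \text{ or } r > t \iff |r| > t, \qquad t > 0
$$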
The function returns the blacklisted individual genotypes, by locus, position (SNP, POS in STACKS), population and individual.
For the VCF, the tidy VCF is returned in the global environment and, when filename is supplied, also written to the working directory under that name.
For the haplotypes file, "_erased_geno" is appended to the original filename.
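To illustrate how a blacklist of this shape could be built, here is a minimal base-R sketch. It is not the package's implementation: the helper blacklist_genotypes() and the column names LOCUS, POS, POP_ID, INDIVIDUALS, GT, READ_DEPTH, ALLELE_REF_DEPTH and ALLELE_ALT_DEPTH are hypothetical. It also assumes that a genotype is heterozygous when both alleles have reads, that only heterozygotes are checked for allele depth and imbalance, that imbalance is flagged when |ratio| exceeds the threshold, and that erasing means setting GT to NA. The genotype likelihood filter is left out.

```r
# Hypothetical sketch, NOT the package's erase_genotypes() implementation.
# Column names are assumed for illustration only.
blacklist_genotypes <- function(tidy, read.depth.threshold,
                                allele.depth.threshold,
                                allele.imbalance.threshold) {
  ref <- tidy$ALLELE_REF_DEPTH
  alt <- tidy$ALLELE_ALT_DEPTH

  # Allele imbalance ratio: (ALT - REF) / (ALT + REF); NaN when both depths are 0
  ratio <- (alt - ref) / (alt + ref)

  # Assumption: heterozygous = reads on both alleles; only these are checked
  # for allele depth and imbalance
  het <- ref > 0 & alt > 0

  low.depth <- tidy$READ_DEPTH < read.depth.threshold
  low.allele <- het & pmin(ref, alt) < allele.depth.threshold
  # Positive threshold applied on both sides: ratio < -t or ratio > t
  imbalanced <- het & abs(ratio) > allele.imbalance.threshold

  bad <- low.depth | low.allele | imbalanced

  # Blacklist by locus, position, population and individual
  blacklist <- tidy[bad, c("LOCUS", "POS", "POP_ID", "INDIVIDUALS")]

  # Assumption: erasing a genotype means setting its call to missing (NA)
  erased <- tidy
  erased$GT[bad] <- NA

  list(blacklist = blacklist, erased = erased)
}

# Toy tidy VCF (hypothetical values)
tidy <- data.frame(
  LOCUS = c(1, 1, 2, 2, 3),
  POS = c(10, 10, 25, 25, 40),
  POP_ID = c("A", "B", "A", "B", "A"),
  INDIVIDUALS = c("ind1", "ind2", "ind1", "ind2", "ind1"),
  GT = c("0/1", "0/1", "0/0", "0/1", "0/1"),
  READ_DEPTH = c(5, 20, 15, 12, 20),
  ALLELE_REF_DEPTH = c(3, 10, 15, 11, 4),
  ALLELE_ALT_DEPTH = c(2, 10, 0, 1, 16),
  stringsAsFactors = FALSE
)

res <- blacklist_genotypes(tidy, read.depth.threshold = 8,
                           allele.depth.threshold = 2,
                           allele.imbalance.threshold = 0.5)
print(res$blacklist)
```

With these toy values, rows 1 (read depth 5), 4 (ALT depth 1, ratio about -0.83) and 5 (ratio 0.6) are blacklisted, so the single positive threshold catches imbalance in both directions.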