View source: R/DAPC_adegenet.R
vcf_getSNP | R Documentation |
Get Best SNP for each locus from VCF
From a VCF with multiple sites per locus, create a VCF with only the best site per locus. The best site is the one with the least missing data; ties are broken by taking the first of the tied sites. Note: I may update this to not require vcftools.
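The selection rule described above (least missing data, first site wins ties) can be sketched in base R. This is only an illustration on a made-up site table, not the function's actual implementation, which relies on vcftools; the column names `locus`, `position`, and `n_missing` and the helper `pick_best` are hypothetical.

```r
# Toy site table: two loci, each with several sites (all values are made up).
sites <- data.frame(
  locus     = c("locus1", "locus1", "locus1", "locus2", "locus2"),
  position  = c(12, 40, 77, 5, 31),
  n_missing = c(3, 1, 1, 0, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the row with the least missing data.
# which.min() returns the index of the FIRST minimum, so ties go to the
# earliest site within the locus.
pick_best <- function(d) d[which.min(d$n_missing), , drop = FALSE]

# split() groups rows by locus (loci come out in sorted order),
# then one row per locus is kept and the pieces are stacked again.
best <- do.call(rbind, lapply(split(sites, sites$locus), pick_best))
rownames(best) <- NULL
best
```

For locus1, positions 40 and 77 are tied with one missing genotype each, and the earlier one (40) is retained.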
vcf_getSNP(
vcf,
out,
vcftools.path = NULL,
indv.keep = NULL,
which.site = "best",
min.indv = 2,
max.fMD = 1,
min.n0 = 2,
min.n1 = 1
)
vcf |
Character string with path to input vcf. |
out |
Character string where to write output vcf. |
vcftools.path |
Character string with path to the vcftools executable. Default NULL, in which case vcftools.path is determined from table returned by config_miscwrappers(). See 'config_miscwrappers' function. |
indv.keep |
Character string with names of individuals to keep. Default is NULL (all individuals kept). |
which.site |
Character string indicating the method for choosing a site to keep for each locus (or chromosome). Default = "best", which is the site with the least missing data, or the first among sites tied for least missing data. Other options are "all.passing", which retains all sites (positions) that pass the variation filters (min.n, min.n0, min.n1), "first" (first site kept at each locus), or "random". |
min.indv |
Integer >= 1 specifying the minimum number of individuals with data required to keep a site. Default = 2. If set to "all", only sites with complete data are kept. |
max.fMD |
Number in the range [0,1] specifying the maximum fraction of missing alleles at a site. Default = 1 (no filtering on missing data). |
min.n0 |
Minimum number of individuals required to have at least one copy of the major allele to keep a site. Default = 2. |
min.n1 |
Minimum number of individuals required to have at least one copy of the minor allele to keep a site. Default = 1. |
min.n |
Integer >= 1 specifying the minimum number of non-missing alleles required to keep a site. Default = 3; often, 4 is preferred (for use with quartet-based analyses). If set to "all", only sites with complete data are kept. |
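To make the filter arguments concrete, here is a rough base R sketch of how thresholds like min.indv, max.fMD, min.n0, and min.n1 could be applied to a small genotype matrix. It is an assumption-laden illustration, not the function's implementation: the matrix `g` is invented, genotypes are assumed diploid and coded as the number of minor-allele copies (0, 1, 2, or NA for missing), and the thresholds are assumed to be inclusive.

```r
# Toy genotype matrix: rows are sites, columns are individuals.
# Values are minor-allele copy counts; NA marks a missing genotype.
g <- rbind(
  siteA = c(0, 1, 2, NA),
  siteB = c(0, 0, 0, 0),
  siteC = c(NA, NA, NA, 1),
  siteD = c(2, 2, 1, 0)
)

# Thresholds set to the documented defaults.
min.indv <- 2
max.fMD  <- 1
min.n0   <- 2
min.n1   <- 1

n_indv <- rowSums(!is.na(g))           # individuals with data
fMD    <- rowMeans(is.na(g))           # fraction of missing data per site
n0     <- rowSums(g < 2, na.rm = TRUE) # individuals with >= 1 major-allele copy
n1     <- rowSums(g > 0, na.rm = TRUE) # individuals with >= 1 minor-allele copy

# A site is kept only if it passes every filter.
keep <- n_indv >= min.indv & fMD <= max.fMD & n0 >= min.n0 & n1 >= min.n1
keep
```

In this toy data, siteB fails because no individual carries the minor allele, and siteC fails because only one individual has data.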
A list with two elements: [1] the path to vcftools, and [2] a data frame giving, for both the input and output VCF, the file path and the number of loci (chromosomes), sites (positions), and individuals (samples).