vcf_getSNP: Get Best SNP for each locus from VCF From a VCF with multiple...

View source: R/DAPC_adegenet.R

vcf_getSNP    R Documentation

Get Best SNP for each locus from VCF

Description

From a VCF with multiple sites per locus, create a VCF that keeps only the best site at each locus. The best site is the one with the least missing data; ties are broken by taking the first of the tied sites. Note: the function currently requires vcftools; I may update it to remove this requirement.
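The selection rule can be illustrated with a small base R sketch. This is not the package's implementation, which relies on vcftools. The data frame `sites` and its columns `locus`, `pos`, and `n_missing` are hypothetical names for a per-site summary of missing data.

```r
# Toy per-site summary, one row per site in VCF order (hypothetical data).
sites <- data.frame(
  locus     = c("loc1", "loc1", "loc1", "loc2", "loc2"),
  pos       = c(12, 40, 77, 5, 31),
  n_missing = c(3, 1, 1, 2, 0)
)

# Keep the loci in their order of first appearance rather than sorted order.
locus_f <- factor(sites$locus, levels = unique(sites$locus))

# For each locus, keep the row with the least missing data.
# which.min() returns the index of the first minimum, so ties go to the
# first of the tied sites.
pick_best <- function(d) d[which.min(d$n_missing), , drop = FALSE]
best <- do.call(rbind, lapply(split(sites, locus_f), pick_best))
rownames(best) <- NULL

print(best)
```

In this toy input, positions 40 and 77 at `loc1` are tied for least missing data, and the first of them (position 40) is kept.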

Usage

vcf_getSNP(
  vcf,
  out,
  vcftools.path = NULL,
  indv.keep = NULL,
  which.site = "best",
  min.indv = 2,
  max.fMD = 1,
  min.n0 = 2,
  min.n1 = 1
)

Arguments

vcf

Character string with the path to the input VCF.

out

Character string with the path where the output VCF is written.

vcftools.path

Character string with the path to the vcftools executable. Default is NULL, in which case the path is taken from the table returned by config_miscwrappers(). See the 'config_miscwrappers' function.

indv.keep

Character string with names of individuals to keep. Default is NULL (all individuals kept).

which.site

Character string indicating how to choose the site to keep at each locus (or chromosome). Default is "best": the site with the least missing data, or the first among sites tied for least missing data. Other options are "all.passing", which retains all sites (positions) that pass the variation filters (min.indv, min.n0, min.n1); "first", which keeps the first site at each locus; and "random".

min.indv

Integer >= 1 specifying the minimum number of individuals with data required to keep a site. Default = 2. If set to "all", only complete-data sites are kept.

max.fMD

Number in the range [0, 1] specifying the maximum fraction of missing alleles allowed at a site. Default = 1.

min.n0

Minimum number of individuals required to have at least one copy of the major allele to keep a site. Default = 2.

min.n1

Minimum number of individuals required to have at least one copy of the minor allele to keep a site. Default = 1.

min.n

Integer >= 1 specifying the minimum number of non-missing alleles required to keep a site. Default = 3; 4 is often preferred (for use with quartet-based analyses). If set to "all", only complete-data sites are kept. Note: min.n does not appear in the Usage signature above.
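As a rough illustration of how the site filters (min.indv, max.fMD, min.n0, min.n1) might combine, here is a base R sketch on a toy genotype matrix. It is not the package's code. The matrix `geno`, the helper `site_passes()`, the diploid 0/1/2 coding, and the reading of "major allele" as the more frequent allele among observed genotypes are all assumptions made for this example.

```r
# Toy genotypes: rows are sites, columns are individuals (hypothetical data).
# Values count copies of the alternate allele in a diploid; NA is missing.
geno <- rbind(
  site1 = c(0, 0, 1, 2, NA),
  site2 = c(0, 0, 0, 0, 0),
  site3 = c(NA, NA, NA, 1, 0),
  site4 = c(2, 2, 1, 2, 2)
)

# Hypothetical helper: TRUE if one site passes all four filters.
site_passes <- function(g, min.indv = 2, max.fMD = 1, min.n0 = 2, min.n1 = 1) {
  obs <- g[!is.na(g)]
  if (length(obs) == 0) return(FALSE)
  fMD   <- mean(is.na(g))   # fraction of individuals with missing data
  n_ref <- sum(obs < 2)     # individuals with >= 1 reference copy
  n_alt <- sum(obs > 0)     # individuals with >= 1 alternate copy
  # Alternate allele frequency > 0.5 means the alternate allele is major.
  alt_is_major <- sum(obs) > length(obs)
  n0 <- if (alt_is_major) n_alt else n_ref   # carriers of the major allele
  n1 <- if (alt_is_major) n_ref else n_alt   # carriers of the minor allele
  length(obs) >= min.indv && fMD <= max.fMD && n0 >= min.n0 && n1 >= min.n1
}

keep        <- apply(geno, 1, site_passes)                 # default thresholds
keep_strict <- apply(geno, 1, site_passes, max.fMD = 0.5)  # stricter missing-data cap

print(keep)
print(keep_strict)
```

With the defaults, only the invariant `site2` is dropped, because no individual carries the minor allele. Lowering max.fMD to 0.5 also drops `site3`, where three of five individuals are missing.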

Value

A list with two elements: [1] the path to vcftools, and [2] a data frame with input and output values for the VCF file paths and for the number of loci (chromosomes), sites (positions), and individuals (samples).
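Examples

A call might look like the sketch below. The file paths are hypothetical, and the example assumes that vcf_getSNP is exported from the misc.wrappers package and that vcftools is available, either through config_miscwrappers() or through vcftools.path. The call is guarded so that nothing runs unless the package is installed and the input file exists.

```r
# Hypothetical paths; replace with real files.
vcf_in  <- "input.vcf"
vcf_out <- "best_sites.vcf"

# Run only if the package is installed and the input file exists.
can_run <- requireNamespace("misc.wrappers", quietly = TRUE) && file.exists(vcf_in)

if (can_run) {
  res <- misc.wrappers::vcf_getSNP(
    vcf        = vcf_in,
    out        = vcf_out,
    which.site = "best",  # keep the site with the least missing data per locus
    min.indv   = 4,       # require data for at least 4 individuals
    max.fMD    = 0.5      # allow at most 50% missing alleles at a site
  )
  str(res)  # inspect the returned list
}
```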


JeffWeinell/misc.wrappers documentation built on Sept. 20, 2023, 12:42 p.m.