heterozygosity: Calculates heterozygosity stats for a single population
In stefanedwards/Siccuracy: Pipeline Package for AlphaImpute


Calculate gene frequencies and observed and expected heterozygosities for each locus. These routines only work on a genotype file (e.g. as written by write.snps) whose first column is an ID column and whose subsequent columns are genotypes coded as 0, 1, or 2. If population is given, all calculations are performed on each population separately.

heterozygosity(fn, population = NULL, ncol = NULL, nlines = NULL)

`fn`	Filename of genotype matrix (0,1,2) with first column denoting ID.
`population`	Vector of same length as rows in `fn`; defaults to `1`, coerced from factor to integer.
`ncol`	Integer, number of SNP columns in `fn`. When `NULL`, automagically detected with `get_ncols(fn)-1`.
`nlines`	Integer, number of lines in `fn`. When `NULL`, automagically detected with `get_nlines(fn)`.
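As an illustration of the expected input layout, the following base R sketch writes a small genotype file and reads it back. The toy genotypes, the integer IDs, and the choice of a whitespace-separated file without a header are assumptions made for this example; they are not taken from the package. The final call uses only the signature shown under Usage and is skipped when Siccuracy is not installed; its output is not shown here.

```r
# Hypothetical toy data: 4 individuals (rows), 3 SNPs (columns), coded 0/1/2.
geno <- matrix(c(0, 1, 2,
                 1, 1, 0,
                 2, 0, 1,
                 0, 1, 1),
               nrow = 4, byrow = TRUE)
ids <- 1:4

# Assumed layout: no header, whitespace separated, ID first, then genotypes.
fn <- tempfile(fileext = ".txt")
write.table(cbind(ids, geno), file = fn,
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Read the file back to check the layout: 1 ID column + 3 SNP columns.
check <- as.matrix(read.table(fn))
dim(check)

# Only call the package function when Siccuracy is available.
if (requireNamespace("Siccuracy", quietly = TRUE)) {
  res <- Siccuracy::heterozygosity(fn)
  print(res)
}
```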

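The gene frequencies p and q refer to the alleles for which genotypes 0 and 2, respectively, are homozygous. In the formula for p, the homozygotes are therefore those of genotype 0 and the heterozygotes are those of genotype 1. The frequencies are calculated for each column as follows: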

p = (2 * # homozygotes + # heterozygotes) / (2 * # rows)

q = 1 - p

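Observed (Hobs) and expected (Hexp) heterozygosity are calculated as follows, where #1 is the number of genotypes coded 1 and n is the number of observed genotypes at the locus: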

Hobs = #1/n

Hexp = 2*p*q
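The formulas above can be sketched in base R for a small matrix without missing values. This is an illustration only, not the package's implementation; the toy matrix and all variable names are made up, and the homozygote count is assumed to be that of genotype 0, in line with p being the frequency of the allele coded as '0'.

```r
# Hypothetical toy genotype matrix: rows are individuals, columns are loci.
geno <- matrix(c(0, 1, 2,
                 1, 1, 0,
                 2, 0, 1,
                 0, 1, 1),
               nrow = 4, byrow = TRUE)

n  <- nrow(geno)            # number of genotypes per locus (no missing values here)
n0 <- colSums(geno == 0)    # homozygotes of genotype 0 (assumed to be the ones counted for p)
n1 <- colSums(geno == 1)    # heterozygotes

p    <- (2 * n0 + n1) / (2 * n)
q    <- 1 - p
Hobs <- n1 / n
Hexp <- 2 * p * q

data.frame(locus = seq_len(ncol(geno)), p, q, Hobs, Hexp)
```

For the first locus, two genotypes are 0 and one is 1, so p = (2 * 2 + 1) / 8 = 0.625, Hobs = 1 / 4 = 0.25, and Hexp = 2 * 0.625 * 0.375 = 0.46875.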

Missing values: In the above, NA elements are ignored and thus count toward neither the genotype counts nor the number of rows. Genotypes with values smaller than 0 or greater than 2 are considered missing.
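A minimal base R sketch of this missing-value rule, assuming the genotypes are already held in a numeric matrix (the package itself works on the file). The toy data, the code 9 used as an out-of-range value, and the variable names are hypothetical.

```r
# Hypothetical toy data with one NA and one out-of-range code (9).
geno <- matrix(c(0,  1, 2,
                 1, NA, 0,
                 2,  0, 9,
                 0,  1, 1),
               nrow = 4, byrow = TRUE)

# Values below 0 or above 2 are treated as missing.
geno[!is.na(geno) & (geno < 0 | geno > 2)] <- NA

# Per-locus counts that skip missing entries.
n  <- colSums(!is.na(geno))              # observed genotypes per locus
n0 <- colSums(geno == 0, na.rm = TRUE)   # genotype 0
n1 <- colSums(geno == 1, na.rm = TRUE)   # genotype 1

p    <- (2 * n0 + n1) / (2 * n)
Hobs <- n1 / n
```

The second and third loci each lose one individual, so their denominators use n = 3 rather than the 4 rows in the matrix.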

Finally, an inbreeding coefficient would be calculated as:

F = (Hexp - Hobs)/Hexp
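As a sketch, F can be computed per locus from vectors of Hobs and Hexp in base R. The input values below are made up for illustration, and returning NA where Hexp is 0 (a monomorphic locus, where the ratio is undefined) is an assumption of this example rather than documented behaviour.

```r
# Hypothetical per-locus values; the last locus is monomorphic.
Hobs <- c(0.25, 0.75, 0.50, 0)
Hexp <- c(0.46875, 0.46875, 0.50, 0)

# F = (Hexp - Hobs) / Hexp, set to NA where Hexp is 0.
F_locus <- ifelse(Hexp > 0, (Hexp - Hobs) / Hexp, NA_real_)
F_locus
```

Positive values indicate fewer heterozygotes than expected, negative values indicate an excess.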

For which Fst estimator to use, and how to combine across multiple SNPs, see Bhatia et al. (2013).

Data frame with columns

population: Population, as specified by argument population.
p: Vector of allele frequencies of alleles coded as '0'.
Hobs: Vector of observed heterozygosity (Hobs).
Hexp: Vector of expected heterozygosity (Hexp).
n: Vector of the number of observed genotypes for each locus (ignoring NA values). The number of alleles is twice this number.

Equations based on http://www.uwyo.edu/dbmcd/molmark/practica/fst.html.

Bhatia, Patterson, Sankararaman, and Price. Estimating and interpreting FST: The impact of rare variants. Genome Research (2013) 23: 1514-1521. Preprint doi: 10.1101/gr.154831.113.

stefanedwards/Siccuracy documentation built on May 30, 2019, 10:44 a.m.