heterozygosity: Calculates heterozygosity statistics for a single population


Description

Calculates gene frequencies and observed and expected heterozygosities for each locus. These routines only work on a genotype file (e.g. as written by write.snps) whose first column is an ID column and whose subsequent columns are genotypes coded as 0, 1, or 2. If population is given, all calculations are performed on each population separately.

Usage

heterozygosity(fn, population = NULL, ncol = NULL, nlines = NULL)

Arguments

fn

Filename of genotype matrix (0,1,2) with first column denoting ID.

population

Vector with one element per row in fn. Defaults to 1 for all rows, i.e. a single population; factors are coerced to integer.

ncol

Integer, number of SNP columns in fn. When NULL, automatically detected with get_ncols(fn)-1.

nlines

Integer, number of lines in fn. When NULL, automatically detected with get_nlines(fn).
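
As a minimal sketch of preparing input in the layout described above, the following writes a small genotype file with base R and calls heterozygosity on it only if the Siccuracy package is installed. The whitespace-delimited, header-free layout, the toy genotypes, and the population labels are assumptions made for illustration; this page does not state the delimiter.

```r
# Toy data: 4 individuals (rows), ID in the first column, then 3 SNP columns
# coded 0, 1, or 2. All values are made up for illustration.
geno <- data.frame(
  id   = 1:4,
  snp1 = c(0, 1, 2, 1),
  snp2 = c(2, 2, 1, 0),
  snp3 = c(0, 0, 1, 1)
)

fn <- tempfile(fileext = ".txt")
# Assumption: whitespace-delimited, no header line, no row names.
write.table(geno, fn, row.names = FALSE, col.names = FALSE, quote = FALSE)

# One population label per row of the file (hypothetical grouping).
pop <- c(1, 1, 2, 2)

if (requireNamespace("Siccuracy", quietly = TRUE)) {
  # Argument names are taken from the Usage section; ncol and nlines are
  # left as NULL so that they are detected from the file.
  res <- Siccuracy::heterozygosity(fn, population = pop)
  print(res)
}
```

Leaving ncol and nlines as NULL keeps the call short; passing them explicitly would presumably only skip the detection step.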

Details

The gene frequencies p and q refer to the frequencies of the alleles carried by genotypes 0 and 2, respectively. They are calculated for each column as follows:

p = (2 * # homozygotes coded 0 + # heterozygotes) / (2 * # rows)

q = 1 - p

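Observed (Hobs) and expected (Hexp) heterozygosity are calculated as follows, where #1 is the number of heterozygotes (genotype 1) and n is the number of non-missing genotypes in the column: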

Hobs = #1/n

Hexp = 2*p*q

Missing values: In the above, NA elements are ignored and thus count toward neither the genotype counts nor the number of rows. Genotypes with values smaller than 0 or greater than 2 are considered missing.
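
The calculations for a single column can be sketched in plain R as below. This is an illustration of the formulas on this page, not the package's actual implementation; het_stats is a hypothetical helper name, and the reading that p counts the allele carried by genotype 0 is an assumption based on the Value section.

```r
# Allele frequency and heterozygosity for one SNP column.
# `het_stats` is a hypothetical helper, not part of Siccuracy.
het_stats <- function(g) {
  # Treat NA and anything outside 0..2 as missing and drop it.
  g <- g[!is.na(g) & g >= 0 & g <= 2]
  n <- length(g)  # number of non-missing genotypes
  # Assumption: p is the frequency of the allele carried twice by genotype 0.
  p <- (2 * sum(g == 0) + sum(g == 1)) / (2 * n)
  q <- 1 - p
  c(p = p, q = q, Hobs = sum(g == 1) / n, Hexp = 2 * p * q, n = n)
}

# Toy column: NA and 9 are both treated as missing.
g <- c(0, 0, 1, 1, 2, NA, 9, 0)
het_stats(g)
```

For this toy column, six genotypes remain after dropping the missing values, so p = 8/12, Hobs = 2/6, and Hexp = 4/9.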

Finally, an inbreeding coefficient could be calculated as:

F = (Hexp - Hobs)/Hexp
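
A small worked example of this formula in R follows. The input values are hypothetical, chosen only for illustration, and inbreeding_F is a made-up helper name rather than a Siccuracy function. Returning NA for a monomorphic locus (Hexp = 0) is a design choice of this sketch.

```r
# Inbreeding coefficient from observed and expected heterozygosity.
# `inbreeding_F` is a hypothetical helper, not part of Siccuracy.
inbreeding_F <- function(Hobs, Hexp) {
  # F is undefined when Hexp is 0 (monomorphic locus); return NA there.
  ifelse(Hexp > 0, (Hexp - Hobs) / Hexp, NA_real_)
}

inbreeding_F(Hobs = 0.30, Hexp = 0.48)  # (0.48 - 0.30) / 0.48 = 0.375
inbreeding_F(Hobs = 0.50, Hexp = 0.50)  # 0: no deficit of heterozygotes
inbreeding_F(Hobs = 0, Hexp = 0)        # NA: monomorphic locus
```

Because ifelse is vectorised, the same helper can be applied to whole Hobs and Hexp columns at once.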

For which Fst estimator to use, and how to combine estimates across multiple SNPs, see Bhatia et al. (2013).

Value

Data frame with columns

population

Population, as specified by argument population.

p

Vector of allele frequencies of alleles coded as '0'.

Hobs

Vector of observed heterozygosity (Hobs).

Hexp

Vector of expected heterozygosity (Hexp).

n

Vector with the number of observed genotypes for each locus (ignoring NA values). The number of alleles is twice this number.

References

Equations based on http://www.uwyo.edu/dbmcd/molmark/practica/fst.html.

Bhatia, Patterson, Sankararaman, and Price. Estimating and interpreting FST: The impact of rare variants. Genome Research (2013) 23: 1514-1521. Preprint doi: 10.1101/gr.154831.113.


stefanedwards/Siccuracy documentation built on May 30, 2019, 10:44 a.m.