Description
Calculate gene frequencies and observed and expected heterozygosities for each locus.
These routines only work on a genotype file (e.g. as written by write.snps) whose first column is an ID column
and whose subsequent columns are genotypes coded as 0, 1, or 2.
If population is given, all calculations are performed separately for each population.
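The expected file layout can be sketched with a small hypothetical example. The separator (a single space) and the absence of a header row are assumptions made for illustration only; check write.snps for the exact format the package writes.

```r
# Hypothetical three-individual, two-SNP example in the layout described above:
# column 1 = ID, remaining columns = genotypes coded 0/1/2.
# ASSUMPTION: space-separated values with no header row.
geno <- data.frame(
  id   = c("ind1", "ind2", "ind3"),
  snp1 = c(0L, 1L, 2L),
  snp2 = c(1L, 1L, 0L)
)
fn <- tempfile(fileext = ".txt")
write.table(geno, fn, sep = " ", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the file back and separate the ID column from the genotype matrix.
raw <- read.table(fn, header = FALSE, stringsAsFactors = FALSE)
ids <- raw[[1]]
G   <- as.matrix(raw[, -1])
print(readLines(fn))
```

Each line of the file then holds one individual: its ID followed by one genotype per SNP.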
Usage
heterozygosity(fn, population = NULL, ncol = NULL, nlines = NULL)
Arguments
fn: Filename of genotype matrix (0, 1, 2) with first column denoting ID.
population: Vector of same length as rows in
ncol: Integer, number of SNP columns in
nlines: Integer, number of lines in
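Details
The gene frequencies p and q refer to the frequencies of the alleles for which genotypes 0 and 2, respectively, are homozygous.
They are calculated for each column as:
p = (2 * #0 + #1) / (2 * n)
q = 1 - p
where #0 is the number of individuals with genotype 0, #1 is the number of heterozygotes (genotype 1), and n is the number of non-missing genotypes in the column.
Observed (Hobs) and expected (Hexp) heterozygosity are calculated as:
Hobs = #1 / n
Hexp = 2 * p * q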
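A minimal sketch of these per-locus formulas, applied to an in-memory genotype matrix (rows = individuals, columns = SNPs). The helper name locus_stats() is hypothetical and this is not the package's own implementation.

```r
# Per-locus p, q, Hobs and Hexp following the formulas above.
# locus_stats() is a hypothetical helper; NA entries are excluded from all counts.
locus_stats <- function(G) {
  n  <- colSums(!is.na(G))             # non-missing genotypes per locus
  n0 <- colSums(G == 0, na.rm = TRUE)  # homozygotes for genotype 0
  n1 <- colSums(G == 1, na.rm = TRUE)  # heterozygotes (genotype 1)
  p  <- (2 * n0 + n1) / (2 * n)
  q  <- 1 - p
  data.frame(p = p, q = q, Hobs = n1 / n, Hexp = 2 * p * q)
}

G <- cbind(snp1 = c(0, 0, 1, 2),
           snp2 = c(1, 1, 2, 2))
print(locus_stats(G))
```

For snp1 there are two genotype-0 individuals and one heterozygote out of four, so p = (2 * 2 + 1) / (2 * 4) = 0.625, Hobs = 0.25 and Hexp = 2 * 0.625 * 0.375 = 0.46875.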
Missing values:
In the above, NA elements are ignored and thus count toward neither the genotype counts nor the number of rows.
Genotypes with values smaller than 0 or greater than 2 are considered missing.
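The missing-value rule can be sketched as a recoding step applied before counting. The helper name clean_genotypes() and the use of -9 as an out-of-range code are illustrative assumptions.

```r
# Values below 0 or above 2 are recoded to NA; NA entries are then
# excluded from every count, including n.
# clean_genotypes() is a hypothetical helper name.
clean_genotypes <- function(G) {
  G[!is.na(G) & (G < 0 | G > 2)] <- NA
  G
}

G  <- cbind(snp1 = c(0, 1, NA, -9, 2, 3))
Gc <- clean_genotypes(G)
n  <- colSums(!is.na(Gc))              # counts only 0, 1 and 2
n1 <- colSums(Gc == 1, na.rm = TRUE)   # heterozygotes
print(Gc[, "snp1"])
print(n1 / n)                          # observed heterozygosity
```

Here -9 and 3 become missing alongside the original NA, so n is 3 rather than 6.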
Finally, an inbreeding coefficient can be calculated from Hobs and Hexp as:
F = (Hexp - Hobs) / Hexp
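A short sketch of computing F per locus from Hobs and Hexp columns. The values are invented for illustration. Returning NA where Hexp is 0 (a monomorphic locus) is a design choice made here, since the ratio is undefined there.

```r
# F = (Hexp - Hobs) / Hexp per locus; values below are made up.
# F is set to NA where Hexp == 0 because the ratio is undefined.
res <- data.frame(Hobs = c(0.25, 0.50, 0),
                  Hexp = c(0.46875, 0.375, 0))
res$F <- ifelse(res$Hexp > 0, (res$Hexp - res$Hobs) / res$Hexp, NA_real_)
print(res)
```

A positive F indicates fewer heterozygotes than expected; a negative F, as for the second locus, indicates an excess.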
For which FST estimator to use, and how to combine estimates across multiple SNPs, see Bhatia et al. (2013).
Value
Data frame with columns:
population: Population, as specified by argument population.
p: Vector of allele frequencies of the allele coded as '0'.
Hobs: Vector of observed heterozygosity (Hobs).
Hexp: Vector of expected heterozygosity (Hexp).
n: Vector of the number of observed genotypes at each locus (ignoring NA values). The number of alleles is twice this number.
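When population is given, the calculations run separately for each population. The sketch below produces columns like those listed above, one row per population and locus. The long layout, the extra locus column and the helper per_population() are assumptions for illustration, not the package's documented output format.

```r
# Per-population p, Hobs, Hexp and n, stacked into one data frame.
# per_population() and the 'locus' column are hypothetical.
per_population <- function(G, population) {
  rows <- lapply(unique(population), function(pop) {
    g  <- G[population == pop, , drop = FALSE]
    n  <- colSums(!is.na(g))             # non-missing genotypes
    n0 <- colSums(g == 0, na.rm = TRUE)  # genotype-0 homozygotes
    n1 <- colSums(g == 1, na.rm = TRUE)  # heterozygotes
    p  <- (2 * n0 + n1) / (2 * n)
    data.frame(population = pop, locus = colnames(g), p = unname(p),
               Hobs = unname(n1 / n), Hexp = unname(2 * p * (1 - p)),
               n = unname(n), stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

G   <- cbind(snp1 = c(0, 1, 2, 2), snp2 = c(1, 1, 0, NA))
pop <- c("A", "A", "B", "B")
print(per_population(G, pop))
```

Population B is monomorphic at snp1 (p = 0), and its single NA at snp2 reduces n to 1 for that locus.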
References
Equations based on http://www.uwyo.edu/dbmcd/molmark/practica/fst.html.
Bhatia, Patterson, Sankararaman, and Price. Estimating and interpreting FST: The impact of rare variants. Genome Research (2013) 23: 1514-1521. Preprint doi: 10.1101/gr.154831.113.