HindHe | R Documentation |
HindHe and HindHeMapping both generate a matrix of values, with taxa in rows and loci in columns. The mean value of the matrix is expected to be a certain value depending on the ploidy and, in the case of natural populations and diversity panels, the inbreeding coefficient. colMeans of the matrix can be used to filter non-Mendelian loci from the dataset. rowMeans of the matrix can be used to identify taxa that are not the expected ploidy, are interspecific hybrids, or are a mix of multiple samples.
HindHe(object, ...)

## S3 method for class 'RADdata'
HindHe(object, omitTaxa = GetBlankTaxa(object), ...)

HindHeMapping(object, ...)

## S3 method for class 'RADdata'
HindHeMapping(object, n.gen.backcrossing = 0, n.gen.intermating = 0,
              n.gen.selfing = 0, ploidy = object$possiblePloidies[[1]],
              minLikelihoodRatio = 10,
              omitTaxa = c(GetDonorParent(object), GetRecurrentParent(object),
                           GetBlankTaxa(object)), ...)
object |
A |
omitTaxa |
A character vector indicating names of taxa not to be included in the output.
For |
n.gen.backcrossing |
The number of generations of backcrossing performed in a mapping population. |
n.gen.intermating |
The number of generations of intermating performed in a mapping population.
Included for consistency with |
n.gen.selfing |
The number of generations of self-fertilization performed in a mapping population. |
ploidy |
A single value indicating the assumed ploidy to test. Currently, only autopolyploid and diploid inheritance modes are supported. |
minLikelihoodRatio |
Used internally by |
... |
Additional arguments (none implemented). |
These functions are especially useful for highly duplicated genomes, in which RAD tag alignments may have been incorrect, resulting in groups of alleles that do not represent true Mendelian loci. The statistic that is calculated is based on the principle that observed heterozygosity will be higher than expected heterozygosity if a "locus" actually represents two or more collapsed paralogs. However, the statistic uses read depth in place of genotypes, eliminating the need to perform genotype calling before filtering.
For a given taxon * locus, Hind is the probability that two sequencing reads, sampled without replacement, are different alleles (RAD tags).
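The definition of Hind above can be sketched in base R for a single taxon at a single locus. This is a minimal illustration, not the package's internal code: `hind_from_depth` is a hypothetical helper name, and its input is assumed to be a vector of read counts with one element per allele.

```r
# Hypothetical helper (not part of polyRAD): Hind for one taxon at one locus.
# `depth` is a vector of read counts, one element per allele (RAD tag).
hind_from_depth <- function(depth) {
  total <- sum(depth)
  if (total < 2) return(NA_real_)  # a pair of reads cannot be sampled
  # P(two reads sampled without replacement are the same allele)
  #   = sum_i d_i * (d_i - 1) / (D * (D - 1))
  1 - sum(depth * (depth - 1)) / (total * (total - 1))
}

hind_from_depth(c(10, 10))  # reads split evenly between two alleles
hind_from_depth(c(20, 0))   # all reads from one allele
```

With an even split the value approaches 0.5 as depth grows, and it is zero when only one allele is observed.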
In HindHe, He is the expected heterozygosity, estimated from allele frequencies by taking the column means of object$depthRatios. This is also the estimated probability that if two alleles were sampled at random from the population at a given locus, they would be different alleles.
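As a rough sketch of that calculation, the snippet below builds a toy depth-ratio matrix for one locus and derives He from its column means. The matrix, its dimnames, and the use of 1 minus the sum of squared frequencies are illustrative assumptions that follow from the description above, not code taken from the package.

```r
# Toy depth ratios for one locus: 4 taxa (rows) x 2 alleles (columns).
# In a real RADdata object these would come from object$depthRatios.
depth_ratios <- matrix(c(1.0, 0.0,
                         0.5, 0.5,
                         0.5, 0.5,
                         0.0, 1.0),
                       ncol = 2, byrow = TRUE,
                       dimnames = list(paste0("taxon", 1:4),
                                       c("loc1_A", "loc1_B")))

# Allele frequencies estimated as column means of the depth ratios
freq <- colMeans(depth_ratios, na.rm = TRUE)

# Probability that two alleles drawn at random from the population differ
he <- 1 - sum(freq^2)
he
```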
In HindHeMapping, He is the average probability that in a random progeny, two alleles sampled without replacement would be different. The number of generations of backcrossing and self-fertilization, along with the ploidy and estimated parental genotypes, are needed to make this calculation. The function essentially simulates the mapping population based on parental genotypes to determine He.
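To make the idea concrete, here is a simplified sketch for a single F1 cross only, with no backcrossing or selfing. It is not the algorithm used by HindHeMapping. The helper name `he_f1` is hypothetical, parental genotypes are assumed to be known and given as vectors of allele copies, and gametes are assumed to form by sampling half of a parent's copies without replacement (autopolyploid or diploid inheritance, no double reduction).

```r
# Hypothetical helper (not part of polyRAD): expected probability that two
# allele copies drawn without replacement from a random F1 progeny differ.
# parent1, parent2: vectors of allele copies, e.g. c("A", "a") for a diploid.
he_f1 <- function(parent1, parent2) {
  ploidy <- length(parent1)
  stopifnot(length(parent2) == ploidy, ploidy %% 2 == 0)
  g1 <- combn(parent1, ploidy / 2)  # each column is one possible gamete
  g2 <- combn(parent2, ploidy / 2)
  total <- 0
  for (i in seq_len(ncol(g1))) {
    for (j in seq_len(ncol(g2))) {
      progeny <- c(g1[, i], g2[, j])
      pairs <- combn(progeny, 2)    # all pairs of allele copies in the progeny
      total <- total + mean(pairs[1, ] != pairs[2, ])
    }
  }
  total / (ncol(g1) * ncol(g2))     # average over equally likely progeny
}

he_f1(c("A", "a"), c("A", "A"))                      # diploid cross
he_f1(c("A", "A", "a", "a"), c("A", "A", "A", "A"))  # tetraploid cross
```

Enumerating gametes exhaustively is only practical at low ploidy, and it weights every gamete equally, so it is meant for checking intuition on small cases rather than for analysis.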
The expectation is that
Hind/He = (ploidy - 1)/ploidy * (1 - F)
in a diversity panel, where F is the inbreeding coefficient, and
Hind/He = (ploidy - 1)/ploidy
in a mapping population. Loci that have much higher average values likely represent collapsed paralogs that should be removed from the dataset. Taxa with much higher average values may be higher ploidy than expected, interspecific hybrids, or multiple samples mixed together.
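The expectations above, and the filtering they suggest, can be sketched as follows. `expected_ratio` is a hypothetical helper written for this illustration, the matrix `hh` is made-up data standing in for the output of HindHe, and the cutoff is an arbitrary value chosen for the example rather than a recommended threshold.

```r
# Hypothetical helper: expected Hind/He for a given ploidy and inbreeding
# coefficient (use inbreeding = 0 for a mapping population).
expected_ratio <- function(ploidy, inbreeding = 0) {
  (ploidy - 1) / ploidy * (1 - inbreeding)
}

expected_ratio(2)                    # diploid, no inbreeding
expected_ratio(4, inbreeding = 0.2)  # tetraploid diversity panel

# Made-up matrix standing in for HindHe() output: taxa in rows, loci in columns.
hh <- matrix(c(0.48, 0.52, 0.95,
               0.50, 0.47, 0.90,
               0.51, 0.49, 1.00),
             nrow = 3, byrow = TRUE,
             dimnames = list(paste0("taxon", 1:3), paste0("loc", 1:3)))

locus_means <- colMeans(hh, na.rm = TRUE)
cutoff <- 0.65  # arbitrary illustrative threshold for a diploid dataset
keep_loci <- names(locus_means)[locus_means <= cutoff]
keep_loci       # loci whose mean is not far above the diploid expectation
```

The same pattern with rowMeans would flag taxa whose average is well above the expectation.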
A named matrix, with taxa in rows and loci in columns. For HindHeMapping, loci are omitted if consistent parental genotypes could not be determined across alleles.
Lindsay V. Clark
Clark, L. V., Mays, W., Lipka, A. E. and Sacks, E. J. (2022) A population-level statistic for assessing Mendelian behavior of genotyping-by-sequencing data from highly duplicated genomes. BMC Bioinformatics 23, 101, doi:10.1186/s12859-022-04635-9.
A seminar describing Hind/He is available at https://youtu.be/Z2xwLQYc8OA?t=1678.
InbreedingFromHindHe, ExpectedHindHe
data(exampleRAD)

hhmat <- HindHe(exampleRAD)
colMeans(hhmat, na.rm = TRUE) # near 0.5 for diploid loci, 0.75 for tetraploid loci

data(exampleRAD_mapping)
exampleRAD_mapping <- SetDonorParent(exampleRAD_mapping, "parent1")
exampleRAD_mapping <- SetRecurrentParent(exampleRAD_mapping, "parent2")

hhmat2 <- HindHeMapping(exampleRAD_mapping, n.gen.backcrossing = 1)
colMeans(hhmat2, na.rm = TRUE) # near 0.5; all loci diploid