Description Usage Arguments Details Value Author(s) References See Also Examples
For hybrids with up to three parental lineages, the function estimates genomic proportions and ancestry indices by numerically searching likelihood space.
1 2 3 |
G |
A matrix or data frame of genetic marker data. Each column is a locus. For |
P |
A matrix or data frame with the following columns (order is important!): Locus name, Allele name, P1 allele frequency, P2 allele frequency, P3 allele frequency. For |
type |
A string representing the data type. The options are |
surf |
Logical: should the function find starting values by evaluating likelihoods on a grid? |
method |
Optimization method to search for maximum likelihood estimates of ancestry and heterozygosity. Alternatives are |
iterations |
The desired number of MCMC steps to perform when |
start |
A vector including the starting values of the six genomic proportions. See details. |
Cscale |
An integer, controlling the the proposal distribution for |
props |
Optional: a matrix of genomic proportions to evaluate if |
control |
A list of options to be passed to |
Given three ancestral species or parental populations (P1, P2, and P3), the genome of a hybrid can be described by six genomic proportions: p_{11} = proportion of one's genome that is homozygous for alleles inherited from P1, p_{22} = proportion of one's genome that is homozygous for alleles inherited from P2, p_{33} = proportion of one's genome that is homozygous for alleles inherited from P3, p_{12} = proportion of one's genome that is heterozygous for alleles inhertied from P1 and P2, p_{13} = proportion of one's genome that is heterozygous for alleles inhertied from P1 and P3, p_{23} = proportion of one's genome that is heterozygous for alleles inhertied from P2 and P3.
Currently, the function provides four methods for searching the likelihood space (a 5-simplex with vertices wherever each genomic proportion is equal to 1.0). method = "SANN"
is probably the best; it uses the general purpose optimization function optim
with its simulated annealing algorithm. For this estimation problem, a custom proposal function is passed to the option gr
. This proposal function draws new genomic proportions from a 6-dimensional Dirichlet distribution centered on the old genomic proportions. The concentration of the proposal distribution is controlled by Cscale
; the larger this value, the more the proposal distribution is concentrated near the current state.
method = "mcmc"
uses a Markov-Chain Monte Carlo with Metropolis-Hastings sampling to explore the likelihood space. It also uses the Dirichlet proposal distribution, and could be useful (with some modification of the code) for generating posterior distributions. method = "SANN"
is probaby superior for simply finding the MLE.
method = "L-BFGS-B"
also uses optim
, but with a quasi-Newton likelihood search algorithm to look for the maximum likelihood. This method is relatively fast, but it can miss the MLE if it is near an edge of the sample space.
"surf"
, finds all likelihoods on a 6-dimensional grid defined by props
and chooses the maximum. By default, props
is all possible combinations of 10 equally spaced proportions (from 0 to 1), subject to the constraint that they add to 1. By itself, this method is not going to find the MLE unless the MLE happens to be one of the grid points. However, using the option surf = TRUE
with the SANN
or mcmc
methods can improve efficiency by initiating the search at the grid point nearest the MLE.
A matrix with 10 named columns for each individual, containing estimated genomic proportions, ancestry indices, and the log-likelihood:
p11 |
Proportion of markers homozygous for lineage 1 alleles |
p22 |
Proportion of markers homozygous for lineage 2 alleles |
p33 |
Proportion of markers homozygous for lineage 3 alleles |
p12 |
Proportion of markers heterozygous for lineage 1 and 2 alleles |
p13 |
Proportion of markers heterozygous for lineage 1 and 3 alleles |
p23 |
Proportion of markers heterozygous for lineage 2 and 3 alleles |
S1 |
Lineage 1 ancestry index: proportion of alleles derived from parental lineage 1 |
S2 |
Lineage 2 ancestry index: proportion of alleles derived from parental lineage 2 |
S3 |
Lineage 3 ancestry index: proportion of alleles derived from parental lineage 3 |
logLik |
log-likelihood of the genomic proportions given the individual marker data |
Ben Fitzpatrick
Fitzpatrick, B. M. 2012. Estimating ancestry and heterozygosity of hybrids using molecular markers. BMC Evolutionary Biology 12:131. http://www.biomedcentral.com/1471-2148/12/131
HIC3
calculates closed-form maximum likelihood estimates for diagnostic marker data. thirdclass
and HItest3
evaluate simple classification of three-way hybrids into parental, F1, F2, and backcross categories. For conventional two-way hybrid zone analyses, see HIC, HIest, HIclass, HItest
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | ## Not run:
## all possible 2-way crosses after 2 generations
G <- rbind(
rep(1,12),rep(1,12), # parental 1
rep(2,12),rep(2,12), # parental 2
rep(3,12),rep(3,12), # parental 3
rep(1,12),rep(2,12), # 1 x 2 F1
rep(1:2,each=6),rep(1:2,6), # 1 x 2 F2
rep(1,12),rep(1:2,6), # 1 x 1 x 2 BC
rep(2,12),rep(1:2,6), # 1 x 2 x 2 BC
rep(1,12),rep(3,12), # 1 x 3 F1
rep(c(1,3),each=6),rep(c(1,3),6), # 1 x 3 F2
rep(1,12),rep(c(1,3),6), # 1 x 1 x 3 BC
rep(3,12),rep(c(1,3),6), # 1 x 3 x 3 BC
rep(2,12),rep(3,12), # 2 x 3 F1
rep(2:3,each=6),rep(2:3,6), # 2 x 3 F2
rep(3,12),rep(2:3,6), # 2 x 3 x 3 BC
rep(2,12),rep(2:3,6) # 2 x 2 x 3 BC
)
P <- data.frame(Locus=rep(1:12,each=3),allele=rep(1:3,12),
P1=rep(c(1,0,0),12),P2=rep(c(0,1,0),12),P3=rep(c(0,0,1),12))
mle.o <- threeway(G,P,surf=FALSE,iterations=99)
mle.c <- HIC3(G,P)
# compare the optimization (mle.o) to the closed-form (mle.c):
# 99 iterations is not enough to converge on the known true values.
# Try setting surf=TRUE and/or increasing iterations.
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.