HItest: Compare the likelihood of hybrid classification to MLE...
In HIest: Hybrid index estimation


HItest compares the best-fitting of six early-generation diploid hybrid genotype classes (two parentals, F1, F2, and two backcrosses) to the maximum likelihood genotype described by ancestry (S) and interclass heterozygosity (H).

HItest(class, MLE, thresholds = c(2, 1))

`class`	Output from `HIclass`: a data frame summarizing the fit of each individual to the six genotype classes.
`MLE`	Output from `HIest`: a data frame giving the MLE S and H and associated log-likelihood.
`thresholds`	Criteria for classification. The first criterion (`thresholds[1]`) is a cutoff for the difference in log-likelihood for the best vs. second best genotype class. The second criterion (`thresholds[2]`) is a cutoff for the difference in log-likelihood for the best genotype class vs. the MLE.

As a quick-and-dirty rule of thumb, one might accept a putative classification as credible if the log-likelihood of the best-fit class is more than 2 units greater than the log-likelihood of the second best-fit class AND within 2 units of the maximum log-likelihood. The first criterion is based on the approximate equivalence of a 2 x log-likelihood interval to a 95 percent confidence interval for some distributions (Hudson 1971; Hilborn and Mangel 1997). The second is based on the conventional penalty of two log-likelihood units for an additional estimated parameter in model selection (Edwards 1972; Burnham and Anderson 2004). The classification model can be viewed as having one free parameter (once the best-fit class is set to "chosen", the other five are constrained to "not chosen"), while the continuous model has two (S and H). This approach has the disadvantage of effectively treating the classification as a null model, which is not biologically justified.
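The rule of thumb above can be sketched in base R. This is only an illustration, not the package's implementation: the helper name `rule_of_thumb`, the strict inequalities, and the toy log-likelihood values are all assumptions made for the example.

```r
# Hypothetical helper (not part of HIest): apply the two-part rule of thumb
# to per-individual log-likelihoods.
rule_of_thumb <- function(LL.best, LL.second, LL.max, thresholds = c(2, 2)) {
  # Criterion 1: best class beats the second-best class by more than thresholds[1]
  c1 <- (LL.best - LL.second) > thresholds[1]
  # Criterion 2: best class lies within thresholds[2] units of the maximum
  c2 <- (LL.max - LL.best) < thresholds[2]
  data.frame(c1 = c1, c2 = c2, credible = c1 & c2)
}

# Toy values, invented for illustration only
toy <- rule_of_thumb(
  LL.best   = c(-50.0, -50.0, -50.0),
  LL.second = c(-55.0, -51.0, -55.0),
  LL.max    = c(-49.5, -49.5, -45.0)
)
print(toy)
```

Only the first toy individual passes both criteria: the second fails because its best class is barely better than the runner-up, and the third because its best class falls too far below the maximum log-likelihood.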

A better approach might be to accept the classification only if its AIC is lower than the AIC of the MLE, i.e., if dAIC is negative (Fitzpatrick 2012). Note that dAIC cannot be less than -2 (the case where the MLE is identical to the expectation for a class).
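As a worked illustration of the AIC comparison, the sketch below recomputes dAIC from two log-likelihoods using the standard definition AIC = 2k - 2 log L, with k = 1 for the class and k = 2 for the continuous model. The helper name `daic` is hypothetical, and the sign convention (class minus MLE) is inferred from the statement that a negative dAIC favors the classification. The input values are those reported for individual 27 in the example output on this page.

```r
# Hypothetical helper (not part of HIest): AIC of the best-fit class minus
# AIC of the continuous-model MLE.
daic <- function(LL.class, LL.max) {
  aic.class <- 2 * 1 - 2 * LL.class  # one estimated parameter
  aic.mle   <- 2 * 2 - 2 * LL.max    # two estimated parameters (S and H)
  aic.class - aic.mle
}

# Log-likelihoods for individual 27 in the example output
d <- daic(LL.class = -67.23528, LL.max = -66.58898)
print(d)  # approximately -0.7074

# When the class and the MLE have the same log-likelihood, dAIC hits its floor
lower.bound <- daic(LL.class = -60, LL.max = -60)
print(lower.bound)  # -2
```

Under this definition dAIC = 2 (LL.max - LL.class) - 2, so it is negative exactly when the best-fit class is within one log-likelihood unit of the maximum.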

A data frame with one row per individual. Columns are:

`S`	The maximum likelihood estimate of the ancestry index from `HIest`.
`H`	The maximum likelihood estimate of the interclass heterozygosity from `HIest`.
`Best.class`	The class with the highest likelihood of the six from `HIclass`.
`LL.class`	The log-likelihood of the data for the best-fit class from `HIclass`.
`LLD.class`	The difference in log-likelihood between the best and second-best fit class from `HIclass`.
`LL.max`	The maximum log-likelihood from `HIest`.
`dAIC`	The AIC of the best-fit class (1 estimated parameter) minus the AIC of the continuous model MLE (2 estimated parameters). Negative values favor the classification.
`c1`	Logical: `TRUE` if the best-fit class is supported by more than `thresholds[1]` log-likelihood units over the second best.
`c2`	Logical: `TRUE` if the best-fit class is WITHIN `thresholds[2]` log-likelihood units of the MLE.
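A minimal sketch of how the returned columns might be used follows. The data frame `res` is a toy stand-in with invented values, not real `HItest` output; only the column names are taken from the list above.

```r
# Toy stand-in for HItest output (values invented for illustration)
res <- data.frame(
  Best.class = c("class121", "class121", "class011"),
  LLD.class  = c(12.4, 3.1, 8.7),
  dAIC       = c(-0.7, 4.2, 9.5),
  c1         = c(TRUE, TRUE, TRUE),
  c2         = c(TRUE, FALSE, FALSE)
)

# Individuals whose classification passes both threshold criteria
credible <- res[res$c1 & res$c2, ]

# Individuals for which the best-fit class has a lower AIC than the MLE
aic.preferred <- res[res$dAIC < 0, ]

# Cross-tabulate the best-fit class against the second criterion
tab <- table(res$Best.class, res$c2)
print(tab)
```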

Ben Fitzpatrick

Burnham, K. P., and D. R. Anderson. 2004. Multimodel inference: understanding AIC and BIC in model selection. Sociological Methods and Research 33:261-304.

Edwards, A. W. F. 1972. Likelihood. Cambridge University Press, Cambridge.

Fitzpatrick, B. M. 2008. Hybrid dysfunction: Population genetic and quantitative genetic perspectives. American Naturalist 171:491-498.

Fitzpatrick, B. M. 2012. Estimating ancestry and heterozygosity of hybrids using molecular markers. BMC Evolutionary Biology 12:131. http://www.biomedcentral.com/1471-2148/12/131

Hilborn, R., and M. Mangel. 1997. The ecological detective: Confronting models with data. Princeton University Press, New Jersey.

Hudson, D. J. 1971. Interval estimation from the likelihood function. Journal of the Royal Statistical Society, Series B 33:256-262.

Lynch, M. 1991. The genetic interpretation of inbreeding depression and outbreeding depression. Evolution 45:622-629.

HIest for maximum likelihood estimation of S and H, HIsurf for a likelihood surface, HIclass for likelihoods of early generation hybrid classes, HILL for the basic likelihood function.

## Not run:
data(Bluestone)
Bluestone <- replace(Bluestone,is.na(Bluestone),-9)
# parental allele frequencies (assumed diagnostic)
BS.P <- data.frame(Locus=names(Bluestone),Allele="BTS",P1=1,P2=0)

# estimate ancestry and heterozygosity
BS.est <- HIC(Bluestone)

# calculate likelihoods for early generation hybrid classes
BS.class <- HIclass(Bluestone,BS.P,type="allele.count")

# compare classification with maximum likelihood estimates
BS.test <- HItest(BS.class,BS.est)

table(BS.test$c1)
# all 41 are TRUE, meaning the best classification is at least 2 log-likelihood units
# better than the next best

table(BS.test$c2)
# 1 is TRUE, meaning the best classification is within thresholds[2] (default 1)
# log-likelihood unit of the MLE of S and H, i.e., the simple classification is
# rejected in all but 1 case.

table(BS.test$Best.class,BS.test$c2)
# individuals were classified as F2-like (class121) or backcross to CTS (class011), but
# only one of the F2's was credible

BS.test[BS.test$c2,]
# in only one case was the F2 classification a better fit (based on AIC) than the
# continuous model.

## End(Not run)

Loading required package: nnet
 * * * * * * * * *
 10 * * * * * * * * *
 20 * * * * * * * * *
 30 * * * * * * * * *
 40 *
TRUE 
  41 

FALSE  TRUE 
   40     1 
          
           FALSE TRUE
  class011     3    0
  class121    37    1
           S         H Best.class  LL.class LLD.class    LL.max       dAIC   c1
27 0.4758065 0.4354839   class121 -67.23528  3900.188 -66.58898 -0.7073996 TRUE
     c2
27 TRUE