quartetTreeTest: Hypothesis test for quartet counts fitting a tree under the...

quartetTreeTest    R Documentation

Hypothesis test for quartet counts fitting a tree under the MSC

Description

Test the hypothesis H_0 = T1 or T3 model of Mitchell et al. (2019) vs. H_1 = everything else. T1 is the model for a specific species quartet topology, and T3 the model for any species quartet topology.

Usage

quartetTreeTest(
  obs,
  model = "T3",
  lambda = 0,
  method = "MLest",
  smallsample = "precomputed",
  smallcounts = "precomputed",
  bootstraps = 10^4
)

Arguments

obs

vector of 3 counts of resolved quartet frequencies

model

"T1" or "T3", for the models of Mitchell et al. (2019)

lambda

parameter for power-divergence statistic (e.g., 0 for likelihood ratio statistic, 1 for Chi-squared statistic)

method

"MLest", "conservative", or "bootstrap"

smallsample

"precomputed" or "bootstrap", method of obtaining p-value when sample is small (<30)

smallcounts

"precomputed" or "bootstrap", method of obtaining p-value when some (but not all) counts are small

bootstraps

number of samples for bootstrapping
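As a rough illustration of what the lambda argument controls, the sketch below computes a Cressie-Read power-divergence statistic in base R. The helper name power_divergence and the star-tree expected counts are assumptions made for illustration; the package's internal computation, and the expected counts it fits under T1 or T3, may differ.

```r
# Hypothetical helper (not part of MSCquartets): Cressie-Read power-divergence
# statistic for observed counts `obs` against expected counts `expd`.
power_divergence <- function(obs, expd, lambda = 0) {
  if (abs(lambda) < 1e-12) {
    # Limit lambda -> 0 is the likelihood ratio statistic; 0 * log(0) is taken as 0
    return(2 * sum(ifelse(obs > 0, obs * log(obs / expd), 0)))
  }
  2 / (lambda * (lambda + 1)) * sum(obs * ((obs / expd)^lambda - 1))
}

obs  <- c(17, 72, 11)
expd <- rep(sum(obs) / 3, 3)  # illustrative expected counts only (equal thirds)

lr_stat      <- power_divergence(obs, expd, lambda = 0)
pearson_stat <- power_divergence(obs, expd, lambda = 1)

# With lambda = 1 the statistic coincides with Pearson's chi-squared
all.equal(pearson_stat, sum((obs - expd)^2 / expd))
```

Values of lambda between 0 and 1 interpolate between the two familiar statistics.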

Details

This function implements two of the versions of the test given by Mitchell et al. (2019), as well as parametric bootstrapping, with other procedures for when some expected counts are small. When the topology and/or the internal quartet branch length is not specified by the null hypothesis, these are more accurate tests than, say, a Chi-square test with one degree of freedom, which is not theoretically justified near the singularities and boundaries of the models.

If method="MLest", this uses the test by that name described in Section 7 of Mitchell et al. (2019). For both the T1 and T3 models the test is slightly anticonservative over a small range of true internal edge lengths of the quartet species tree. Although the test generally performs well in practice, it lacks a uniform asymptotic guarantee over the full parameter space for either T1 or T3.

If method="conservative", a conservative test described by Mitchell et al. (2019) is used. For model T3 this uses the Chi-square distribution with 1 degree of freedom (the "least favorable" approach), while for model T1 it uses the Minimum Adjusted Bonferroni, based on precomputed values from simulations with n=1e+6. These conservative tests are asymptotically guaranteed to reject a true null hypothesis with probability at most the specified level, but at the expense of increased type II errors.
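The T3 case of the conservative test can be sketched in base R as follows. This is a minimal sketch under stated assumptions: the expected counts are taken from the maximum likelihood fit in which the largest count is the major topology and the other two topologies are equally likely, and the likelihood ratio statistic (lambda = 0) is used. The package's own code may handle details differently.

```r
obs <- c(17, 72, 11)
n   <- sum(obs)

# Assumed T3 fit: largest count kept, the two smaller counts averaged
sorted <- sort(obs, decreasing = TRUE)
expd   <- c(sorted[1], (n - sorted[1]) / 2, (n - sorted[1]) / 2)

# Likelihood ratio statistic against the fitted counts (0 * log(0) taken as 0)
stat <- 2 * sum(ifelse(sorted > 0, sorted * log(sorted / expd), 0))

# Conservative p-value from the Chi-square distribution with 1 degree of freedom
p_conservative <- pchisq(stat, df = 1, lower.tail = FALSE)
p_conservative
```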

If method="bootstrap", then parametric bootstrapping is performed, based on parameter estimates of the quartet topology and internal edge length. The bootstrap sample size is given by the bootstraps argument.
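The general idea of a parametric bootstrap for model T3 can be sketched in base R. The helpers fit_T3, lr_stat_T3 and bootstrap_pvalue_T3 are hypothetical names, and the sketch assumes the maximum likelihood fit as the parameter estimate; the package bases its bootstrap on its own estimates, so its p-values need not match this sketch.

```r
# Hypothetical helper: fitted topology probabilities under T3, with the largest
# count as the major topology and the remaining mass split evenly.
fit_T3 <- function(obs) {
  n <- sum(obs)
  i <- which.max(obs)
  p <- rep((n - obs[i]) / (2 * n), 3)
  p[i] <- obs[i] / n
  p
}

# Likelihood ratio statistic against the T3 fit (0 * log(0) taken as 0)
lr_stat_T3 <- function(obs) {
  expd <- sum(obs) * fit_T3(obs)
  2 * sum(ifelse(obs > 0, obs * log(obs / expd), 0))
}

# Simulate count vectors from the fitted model, refit each one, and report the
# fraction of simulated statistics at least as large as the observed one.
bootstrap_pvalue_T3 <- function(obs, bootstraps = 1e4) {
  observed <- lr_stat_T3(obs)
  sims <- rmultinom(bootstraps, size = sum(obs), prob = fit_T3(obs))
  mean(apply(sims, 2, lr_stat_T3) >= observed)
}

set.seed(1)
p_boot <- bootstrap_pvalue_T3(c(17, 72, 11), bootstraps = 2000)
p_boot
```

A larger value of bootstraps reduces the Monte Carlo error of the p-value at the cost of run time.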

When some or all expected topology counts are small, the methods "MLest" and "conservative" are not appropriate. The argument smallsample determines whether a precomputed bootstrap of 1e+8 samples, or actual bootstrapping with the specified size, is used when the total sample is small (<30). The argument smallcounts determines whether bootstrapping or a faster approximate method is used when only some counts are small. The approximate approach returns a precomputed p-value, found by replacing the largest observed count with 1e+6 and performing 1e+8 bootstraps for the model T3. When n > 30 and some expected counts are small, the quartet tree error probability is small and the bootstrap p-value is approximately independent of the choice of T3 or T1 and of the largest observed count.

For model T1, the first entry of obs is treated as the count of gene quartets concordant with the species tree.

The returned p-value should be taken with caution when the sample size is small, e.g. fewer than 30 gene trees. The returned value of $edgelength is a consistent estimator, but not the MLE, of the internal edge length in coalescent units. Although consistent, the MLE for this length is biased. Our consistent estimator is still biased, but with less bias than the MLE. See Mitchell et al. (2019) for more discussion on dealing with the bias of parameter estimates in the presence of boundaries and/or singularities of parameter spaces.
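To make the link between counts and the internal edge length concrete, the sketch below uses the standard multispecies coalescent relation for a quartet: with internal edge length x in coalescent units, the concordant topology has probability 1 - (2/3)exp(-x) and each discordant topology has probability (1/3)exp(-x). The helper names are hypothetical, and the plug-in value computed here is not the estimator returned in $edgelength; it only illustrates why an infinite estimate can arise.

```r
# Expected topology probabilities for internal edge length x (coalescent units);
# the first entry is the topology matching the species tree.
quartet_cf <- function(x) {
  minor <- exp(-x) / 3
  c(1 - 2 * minor, minor, minor)
}

# Hypothetical plug-in estimate, treating the largest count as the major topology:
# p_major = 1 - (2/3) * exp(-x)  implies  x = -log(1.5 * (1 - p_major))
plugin_edge_length <- function(obs) {
  minor_frac <- (sum(obs) - max(obs)) / sum(obs)
  max(0, -log(1.5 * minor_frac))
}

quartet_cf(0)                      # zero-length edge: all three topologies equally likely
plugin_edge_length(c(17, 72, 11))  # finite estimate
plugin_edge_length(c(0, 100, 0))   # no discordant gene quartets: estimate is Inf
```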

Value

output where output$p.value is the p-value and output$edgelength is a consistent estimator of the internal edge length in coalescent units, possibly Inf.

References

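Mitchell JD, Allman ES, Rhodes JA (2019). "Hypothesis testing near singularities and boundaries." Electronic Journal of Statistics, 13(1), 2150-2193.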

See Also

quartetTreeTestInd

Examples

 quartetTreeTest(c(17,72,11),"T3")
 quartetTreeTest(c(17,72,11),"T1")
 quartetTreeTest(c(72,11,17),"T1")
 quartetTreeTest(c(11,17,72),"T1")
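The following additional sketch shows how the returned components might be inspected and how the method can be switched. It uses only the argument names from the Usage section and the components named under Value, and it runs only when the MSCquartets package is installed.

```r
# Runs only if MSCquartets is available; otherwise nothing happens
if (requireNamespace("MSCquartets", quietly = TRUE)) {
  counts <- c(17, 72, 11)

  # Default method with the T3 model
  res <- MSCquartets::quartetTreeTest(counts, model = "T3")
  print(res$p.value)     # p-value of the test
  print(res$edgelength)  # estimated internal edge length, possibly Inf

  # Conservative test, and a parametric bootstrap with fewer replicates
  res_cons <- MSCquartets::quartetTreeTest(counts, model = "T3",
                                           method = "conservative")
  res_boot <- MSCquartets::quartetTreeTest(counts, model = "T3",
                                           method = "bootstrap",
                                           bootstraps = 1000)
  print(c(conservative = res_cons$p.value, bootstrap = res_boot$p.value))
}
```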


MSCquartets documentation built on Oct. 31, 2024, 1:08 a.m.