TestOverdispersion: Test the Fit of Read Depth to Beta-Binomial Distribution

View source: R/overdispersion.R

TestOverdispersionR Documentation

Test the Fit of Read Depth to Beta-Binomial Distribution

Description

This function is intended to help the user select a value to pass to the overdispersion argument of AddGenotypeLikelihood, generally via pipeline functions such as IterateHWE or PipelineMapping2Parents.

Usage

TestOverdispersion(object, ...)

## S3 method for class 'RADdata'
TestOverdispersion(object, to_test = seq(6, 20, by = 2), ...)

Arguments

object

A RADdata object. Genotype calling does not need to have been performed, although for mapping populations it might be helpful to have done a preliminary run of PipelineMapping2Parents without linkage.

to_test

A vector containing values to test. These are values that will potentially be used for the overdispersion argument of a pipeline function. They should all be positive numbers.

...

Additional arguments (none implemented).

Details

If no genotype calling has been performed, a single iteration under HWE using default parameters will be done. object$ploidyChiSq is then examined to determine the most common/most likely inheritance mode for the whole dataset. The alleles that are examined are only those where this inheritance mode has the lowest chi-squared value.

Within this inheritance mode and allele set, genotypes are selected where the posterior probability of having a single copy of the allele is at least 0.95. Read depth for these genotypes is then analyzed. For each genotype, a two-tailed probability is calculated for the read depth ratio to deviate from the expected ratio by at least that much under the beta-binomial distribution. This test is performed for each overdispersion value provided in to_test.

Value

A list of the same length as to_test plus one. The names of the list are to_test converted to a character vector. Each item in the list is a vector of p-values, one per examined genotype, of the read depth ratio for that genotype to deviate that much from the expected ratio. The last item, named "optimal", is a single number indicating the optimal value for the overdispersion parameter based on the p-value distributions. If the optimal value was the minimum or maximum tested, NA is returned in the "optimal" slot to encourage the user to test other values.

Author(s)

Lindsay V. Clark

Examples

# dataset with overdispersion
data(Msi01genes)

# test several values for the overdispersion parameter
myP <- TestOverdispersion(Msi01genes, to_test = 8:10)

# view results as quantiles
sapply(myP[names(myP) != "optimal"],
       quantile, probs = c(0.01, 0.25, 0.5, 0.75, 0.99))

polyRAD documentation built on Nov. 10, 2022, 5:14 p.m.