diseq: Estimate or Compute Confidence Interval for the Single-Marker...
In genetics: Population Genetics


Estimate or compute confidence interval for single-marker disequilibrium.

diseq(x, ...)
## S3 method for class 'diseq'
print(x, show=c("D","D'","r","R^2","table"), ...)
diseq.ci(x, R=1000, conf=0.95, correct=TRUE, na.rm=TRUE, ...)

`x`	genotype or haplotype object.
`show`	a character value or vector indicating which disequilibrium measures should be displayed. The default is to show all of the available measures. `show="table"` will display a table of observed, expected, and observed-expected frequencies.
`conf`	Confidence level to use when computing the confidence interval for D-hat. Defaults to 0.95; should be in (0,1).
`R`	Number of bootstrap iterations to use when computing the confidence interval. Defaults to 1000.
`correct`	See details.
`na.rm`	logical. Should missing values be removed?
`...`	optional parameters passed on to `boot.ci` by `diseq.ci`; ignored otherwise.

For a single-gene marker, diseq computes the Hardy-Weinberg (dis)equilibrium statistic D, D', r (the correlation coefficient), and r^2 for each pair of allele values, as well as an overall summary value for each measure across all alleles. print.diseq displays the contents of a diseq object. diseq.ci computes a bootstrap confidence interval for this estimate.

For consistency, I have applied the standard definitions for D, D', and r from the Linkage Disequilibrium case, replacing all marker probabilities with the appropriate allele probabilities.

Thus, for each allele pair,

D is defined as half the raw difference between the expected and the observed frequency of i/j heterozygotes, so that D > 0 indicates a heterozygote deficit:

D = p(i)*p(j) - 1/2 * ( p(ij) + p(ji) )

D' rescales D to span the range [-1,1]

D' = D / Dmax

where, if D > 0:

Dmax = min(p(i)p(j), p(j)p(i)) = p(i)p(j)

or if D < 0:

Dmax = min( p(i) * (1 - p(j)), p(j) * (1 - p(i)) )

r is the correlation coefficient between two alleles, and can be computed by

r = -D / sqrt( p(i)*(1-p(i)) * p(j)*(1-p(j)) )

where

- p(i) defined as the observed probability of allele 'i',
- p(j) defined as the observed probability of allele 'j', and
- p(ij) defined as the observed probability of the allele pair 'ij'.
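
The quantities above can be estimated directly from genotype calls. The following is a minimal base-R sketch, not the genetics package's own code: it assumes genotypes are coded as "allele/allele" strings with empty strings for missing calls, and pair_stats() is a hypothetical helper name. It returns the allele probabilities p(i), the symmetrized observed pair probabilities 1/2 * (p(ij) + p(ji)), the expected probabilities p(i)*p(j), and r^2 for each allele pair.

```r
# Minimal sketch (base R only; not the genetics package implementation).
# pair_stats() is a hypothetical helper that estimates p(i), p(ij) and r^2
# from genotypes coded as "allele/allele" strings.
pair_stats <- function(genos) {
  genos <- genos[!is.na(genos) & genos != ""]    # drop missing calls
  parts <- strsplit(genos, "/", fixed = TRUE)
  a1 <- vapply(parts, `[`, character(1), 1)      # first allele of each call
  a2 <- vapply(parts, `[`, character(1), 2)      # second allele of each call
  lev <- sort(unique(c(a1, a2)))

  # p(ij): observed probability of the ordered allele pair (i, j)
  p.ij <- unclass(prop.table(table(factor(a1, levels = lev),
                                   factor(a2, levels = lev))))
  obs <- 0.5 * (p.ij + t(p.ij))                  # 1/2 * (p(ij) + p(ji))
  p <- rowSums(obs)                              # allele probabilities p(i)
  expected <- outer(p, p)                        # p(i) * p(j)

  # r^2 = D^2 / (p(i)(1-p(i)) * p(j)(1-p(j)));
  # undefined (NaN) if an allele has probability 1
  v <- p * (1 - p)
  r2 <- (obs - expected)^2 / outer(v, v)

  # Only pairs with i != j are allele pairs; blank out the diagonal
  diag(obs) <- NA
  diag(expected) <- NA
  diag(r2) <- NA
  list(p = p, observed = obs, expected = expected, r2 = r2)
}

example.data <- c("D/D","D/I","D/D","I/I","D/D",
                  "D/D","D/D","D/D","I/I","")
s <- pair_stats(example.data)
s$p    # allele probabilities: D = 13/18, I = 5/18
s$r2
```

For the first example dataset (nine non-missing calls carrying 13 D and 5 I alleles), p(D) = 13/18 and p(I) = 5/18. Working with r^2 rather than r keeps the sketch independent of the sign attached to D.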

When there are more than two alleles, the summary value for each statistic is a weighted average of its absolute value over all allele pairs, where each pair is weighted by its expected frequency p(i)*p(j). For example:

D.overall = sum |D(ij)| * p(i)*p(j)

Bootstrapping is used to generate the confidence interval in order to avoid relying on parametric assumptions, which will not hold for alleles with low frequencies (e.g., the assumption that D' follows a Chi-square distribution).
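
The resampling idea can be sketched in base R with a percentile bootstrap. This is an illustrative stand-in rather than diseq.ci, which returns a boot.ci object from the boot package; the statistic (|D| for the single allele pair of a two-allele marker) and the helper names abs_D() and boot_ci_percentile() are assumptions made for the example. Individuals, not alleles, are resampled with replacement, so each replicate keeps genotypes intact.

```r
# Illustrative percentile bootstrap (base R only; not diseq.ci itself).
# abs_D() and boot_ci_percentile() are hypothetical helper names.

# |D| for the allele pair (a, b) at a two-allele marker. Using the absolute
# value means the result does not depend on the sign convention for D.
abs_D <- function(genos, a = "D", b = "I") {
  parts <- strsplit(genos, "/", fixed = TRUE)
  a1 <- factor(vapply(parts, `[`, character(1), 1), levels = c(a, b))
  a2 <- factor(vapply(parts, `[`, character(1), 2), levels = c(a, b))
  p.ij <- unclass(prop.table(table(a1, a2)))  # ordered pair probabilities
  obs <- 0.5 * (p.ij[a, b] + p.ij[b, a])      # 1/2 * (p(ab) + p(ba))
  p <- rowSums(0.5 * (p.ij + t(p.ij)))        # allele probabilities
  abs(obs - p[[a]] * p[[b]])
}

boot_ci_percentile <- function(genos, stat, R = 1000, conf = 0.95) {
  n <- length(genos)
  # Recompute the statistic on R resamples of individuals (with replacement)
  reps <- replicate(R, stat(genos[sample.int(n, n, replace = TRUE)]))
  alpha <- (1 - conf) / 2
  unname(quantile(reps, c(alpha, 1 - alpha)))  # empirical percentiles
}

example.data <- c("D/D","D/I","D/D","I/I","D/D",
                  "D/D","D/D","D/D","I/I")     # missing call removed
set.seed(1)
ci <- boot_ci_percentile(example.data, abs_D, R = 1000, conf = 0.95)
ci
```

R and conf mirror the arguments of diseq.ci, but the percentile interval is only one of the interval types that boot.ci can compute. With only one heterozygote in this sample, replicates vary a lot, so the interval is wide.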

See the function HWE.test for testing Hardy-Weinberg equilibrium, i.e. the null hypothesis D = 0.

diseq returns an object of class diseq with components:

`call`	function call used to create this object
`data`	2-way table of allele pair counts
`D.hat`	matrix giving the observed count, expected count, observed - expected difference, and estimate of disequilibrium for each pair of alleles, as well as an overall disequilibrium value.
TODO	more slots to be documented

diseq.ci returns an object of class boot.ci

Gregory R. Warnes greg@warnes.net

genotype, HWE.test, boot, boot.ci

example.data   <- c("D/D","D/I","D/D","I/I","D/D",
                    "D/D","D/D","D/D","I/I","")
g1  <- genotype(example.data)
g1

diseq(g1)
diseq.ci(g1)
HWE.test(g1)  # does the same, plus tests D-hat=0

three.data   <- c(rep("A/A",8),
                  rep("C/A",20),
                  rep("C/T",20),
                  rep("C/C",10),
                  rep("T/T",3))

g3  <- genotype(three.data)
g3

diseq(g3)
diseq.ci(g3, R=10000, type="bca")  # R = bootstrap replicates; type is passed on to boot.ci

# only show observed vs expected table
print(diseq(g3),show='table')