chi.sq.test: Vectorized Pearson's Chi-squared Test for Count Data

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/idsel.R

Description

Performs vectorized tests of independence and goodness-of-fit tests for allele count data.

Usage

1
chi.sq.test(A0, a0, At = NULL, at = NULL, p0 = 0.5, min.cov = 1, max.cov = 1, min.cnt = 1, log = FALSE)

Arguments

A0, a0, At, at

numeric vector with biallelic count data at the first (A0, a0) and second (At, at) time point. The i-th element of each vector is treated as count data of an individual locus.

p0

a numeric vector with expected relative frequencies of A0. Ignored if both At and at are given, see 'Details'.

min.cov

numeric specifying the minimal sequence coverage. If the sum of biallelic counts (A, a) are below min.cov in any of the replicates at any of the two time points (0, t), then chi.sq.test will return NA for the respective loci.

max.cov

numeric specifying the maximal sequence coverage. Values between 0 and 1 are interpreted as quantile thresholds, see 'Details'. If the sum of biallelic counts (A, a) exceeds the threshold in any of the replicates at any of the two time points (0, t), then chi.sq.test will return NA for the respective loci. By default (max.cov = 1) the chi-squared test will be applied to all loci.

min.cnt

numeric indicating the minimal minor allele count. If the sum of minor allele counts over all replicates and time points is below min.cnt, then chi.sq.test will return NA for the respective loci.

log

logical determining whether p-values should be returned directly (default), or after -log10 transformation (if log = TRUE).

Details

If either At or at is missing, then a goodness-of-fit test is performed with the nullhypothesis that relative frequencies of A-alleles are equal to those specified in p0. If both At and at are given, then Pearson's chi-squared test of independence is performed with the nullhypothesis that relative allele frequencies are not associated with time point.

max.cov can be specified in two distinct ways. Values larger than 1 are interpreted as absolute sequence coverage thresholds, while values between 0 and 1 are interpreted as quantile thresholds. If for example max.cov = 0.95 then chi.sq.test will return NA for all loci, where the sequence coverage exceeds the 95% quantile in any of the replicates or time points. Please note that the quantile thresholds are computed separately for each replicate and time point.

Value

chi.sq.test returns a numeric vector of p-values.

Author(s)

Thomas Taus

References

Sokal R. S. and Rohlf F. J.: Biometry (third edition). New York: W. H. Freeman and Company 1995

See Also

cmh.test

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
# simulate allele counts - 2 time points, 10 individual loci
coverage <- 20
loci <- 10
p0 <- 0.1
pt <- 0.6
A0 <- rbinom(loci, size=coverage, prob=p0)
a0 <- coverage - A0
At <- rbinom(loci, size=coverage, prob=pt)
at <- coverage - At
# perform Pearson's chi-squared test of independence
chi.sq.test(A0=A0, a0=a0, At=At, at=at)

# simulate allele counts - 1 time point, 10 individual loci
coverage <- 20
loci <- 10
p0 <- 0.2
A0 <- rbinom(loci, size=coverage, prob=p0)
a0 <- coverage - A0
# perform goodness-of-fit test
chi.sq.test(A0=A0, a0=a0, p0=0.5)

parOld <- par(mfrow=c(3,1))
for(r in 1:3) {
  # get allele counts from empirical data
  coverage0 <- t(coverage(dmelER, repl=r, gen=0))
  A0 <- t(af(dmelER, repl=r, gen=0)) * coverage0
  a0 <- coverage0 - A0
  coverage59 <- t(coverage(dmelER, repl=r, gen=59))
  A59 <- t(af(dmelER, repl=r, gen=59)) * coverage59
  a59 <- coverage59 - A59
  # apply Pearson's chi-squared test of independence to each replicate separately
  p.values <- chi.sq.test(A0=A0, a0=a0, At=A59, at=a59, min.cov=20, max.cov=0.99, min.cnt=1, log=TRUE)
  # plot -log10 transformed p-values
  plot(p.values, main=paste0("Chi-squared test - Replicate ", r), ylim=c(0, max(p.values, na.rm=TRUE)), xlab="SNP", ylab="-log10(p)", pch=".")
}
par(parOld)

ThomasTaus/poolSeq documentation built on Feb. 17, 2020, 1:52 p.m.