cmh.test: Vectorized Cochran-Mantel-Haenszel Test for Count Data
In ThomasTaus/poolSeq: Simulate and Analyze Pool-seq Data

Description Usage Arguments Details Value Author(s) References See Also Examples

Performs a vectorized version of the Cochran-Mantel-Haenszel (CMH) test with the nullhypothesis that relative allele frequencies are not associated with time point.

1	cmh.test(A0, a0, At, at, min.cov = 1, max.cov = 1, min.cnt = 1, log = FALSE)

`A0, a0, At, at`	numeric matrix with biallelic count data at the first (`A0`, `a0`) and second (`At`, `at`) time point. Rows and columns correspond to replicates and individual loci, respectively.
`min.cov`	numeric specifying the minimal sequence coverage. If the sum of biallelic counts (`A`, `a`) are below `min.cov` in any of the replicates at any of the two time points (`0`, `t`), then `cmh.test` will return `NA` for the respective loci.
`max.cov`	numeric specifying the maximal sequence coverage. Values between `0` and `1` are interpreted as quantile thresholds, see 'Details'. If the sum of biallelic counts (`A`, `a`) exceeds the threshold in any of the replicates at any of the two time points (`0`, `t`), then `cmh.test` will return `NA` for the respective loci. By default (`max.cov = 1`) the CMH-test will be applied to all loci.
`min.cnt`	numeric indicating the minimal minor allele count. If the sum of minor allele counts over all replicates and time points is below `min.cnt`, then `cmh.test` will return `NA` for the respective loci.
`log`	logical determining whether p-values should be returned directly (default), or after `-log10` transformation (if `log = TRUE`).

max.cov can be specified in two distinct ways. Values larger than 1 are interpreted as absolute sequence coverage thresholds, while values between 0 and 1 are interpreted as quantile thresholds. If for example max.cov = 0.95 then cmh.test will return NA for all loci, where the sequence coverage exceeds the 95% quantile in any of the replicates or time points. Please note that the quantile thresholds are computed separately for each replicate and time point.

cmh.test returns a numeric vector of p-values.

Thomas Taus

Agresti A.: Categorical data analysis (second edition). New York: Wiley 2002

chi.sq.test

# simulate data - 1 locus, 3 replicates
coverage <- 20
repl <- 3
p0 <- 0.1
pt <- 0.6
A0 <- rbinom(repl, size=coverage, prob=p0)
a0 <- coverage - A0
At <- rbinom(repl, size=coverage, prob=pt)
at <- coverage - At
# perform CMH-test
cmh.test(A0=A0, a0=a0, At=At, at=at)

# simulate data - 10 loci, 3 replicates
coverage <- 20
repl <- 3
loci <- 10
p0 <- 0.1
pt <- 0.6
A0 <- foreach(l=1:loci, .combine=cbind, .final=unname) %do% { rbinom(repl, size=coverage, prob=p0) }
a0 <- coverage - A0
At <- foreach(l=1:loci, .combine=cbind, .final=unname) %do% { rbinom(repl, size=coverage, prob=pt) }
at <- coverage - At
# show the structure of the simulated cound data
A0
a0
At
at
# perform CMH-test for each locus individually
cmh.test(A0=A0, a0=a0, At=At, at=at)

# get allele counts from empirical data
coverage0 <- t(coverage(dmelER, repl=1:3, gen=0))
A0 <- t(af(dmelER, repl=1:3, gen=0)) * coverage0
a0 <- coverage0 - A0
coverage59 <- t(coverage(dmelER, repl=1:3, gen=59))
A59 <- t(af(dmelER, repl=1:3, gen=59)) * coverage59
a59 <- coverage59 - A59
# perform CMH-test for all empirical loci
p.values <- cmh.test(A0=A0, a0=a0, At=A59, at=a59, min.cov=20, max.cov=0.99, min.cnt=1, log=TRUE)
# plot -log10 transformed p-values
plot(p.values, main="CMH Manhattan plot", xlab="SNP", ylab="-log10(p)", pch=".")