cmh.test: Vectorized Cochran-Mantel-Haenszel Test for Count Data

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

Performs a vectorized version of the Cochran-Mantel-Haenszel (CMH) test with the nullhypothesis that relative allele frequencies are not associated with time point.

Usage

1
cmh.test(A0, a0, At, at, min.cov = 1, max.cov = 1, min.cnt = 1, log = FALSE)

Arguments

A0, a0, At, at

numeric matrix with biallelic count data at the first (A0, a0) and second (At, at) time point. Rows and columns correspond to replicates and individual loci, respectively.

min.cov

numeric specifying the minimal sequence coverage. If the sum of biallelic counts (A, a) are below min.cov in any of the replicates at any of the two time points (0, t), then cmh.test will return NA for the respective loci.

max.cov

numeric specifying the maximal sequence coverage. Values between 0 and 1 are interpreted as quantile thresholds, see 'Details'. If the sum of biallelic counts (A, a) exceeds the threshold in any of the replicates at any of the two time points (0, t), then cmh.test will return NA for the respective loci. By default (max.cov = 1) the CMH-test will be applied to all loci.

min.cnt

numeric indicating the minimal minor allele count. If the sum of minor allele counts over all replicates and time points is below min.cnt, then cmh.test will return NA for the respective loci.

log

logical determining whether p-values should be returned directly (default), or after -log10 transformation (if log = TRUE).

Details

max.cov can be specified in two distinct ways. Values larger than 1 are interpreted as absolute sequence coverage thresholds, while values between 0 and 1 are interpreted as quantile thresholds. If for example max.cov = 0.95 then cmh.test will return NA for all loci, where the sequence coverage exceeds the 95% quantile in any of the replicates or time points. Please note that the quantile thresholds are computed separately for each replicate and time point.

Value

cmh.test returns a numeric vector of p-values.

Author(s)

Thomas Taus

References

Agresti A.: Categorical data analysis (second edition). New York: Wiley 2002

See Also

chi.sq.test

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
# simulate data - 1 locus, 3 replicates
coverage <- 20
repl <- 3
p0 <- 0.1
pt <- 0.6
A0 <- rbinom(repl, size=coverage, prob=p0)
a0 <- coverage - A0
At <- rbinom(repl, size=coverage, prob=pt)
at <- coverage - At
# perform CMH-test
cmh.test(A0=A0, a0=a0, At=At, at=at)

# simulate data - 10 loci, 3 replicates
coverage <- 20
repl <- 3
loci <- 10
p0 <- 0.1
pt <- 0.6
A0 <- foreach(l=1:loci, .combine=cbind, .final=unname) %do% { rbinom(repl, size=coverage, prob=p0) }
a0 <- coverage - A0
At <- foreach(l=1:loci, .combine=cbind, .final=unname) %do% { rbinom(repl, size=coverage, prob=pt) }
at <- coverage - At
# show the structure of the simulated cound data
A0
a0
At
at
# perform CMH-test for each locus individually
cmh.test(A0=A0, a0=a0, At=At, at=at)

# get allele counts from empirical data
coverage0 <- t(coverage(dmelER, repl=1:3, gen=0))
A0 <- t(af(dmelER, repl=1:3, gen=0)) * coverage0
a0 <- coverage0 - A0
coverage59 <- t(coverage(dmelER, repl=1:3, gen=59))
A59 <- t(af(dmelER, repl=1:3, gen=59)) * coverage59
a59 <- coverage59 - A59
# perform CMH-test for all empirical loci
p.values <- cmh.test(A0=A0, a0=a0, At=A59, at=a59, min.cov=20, max.cov=0.99, min.cnt=1, log=TRUE)
# plot -log10 transformed p-values
plot(p.values, main="CMH Manhattan plot", xlab="SNP", ylab="-log10(p)", pch=".")

ThomasTaus/poolSeq documentation built on May 23, 2019, 8:36 a.m.