tableBoots | R Documentation |
Parametric Bootstrap of n x m
Contingency independence
test. It is assumed that the provided contingency table summarizes the
results where two or more groups are compared and the outcome is a
categorical variable, i.e., disease vs. no disease, pass vs. fail, treated
patient vs. control group, etc.
The goodness of fit statistic is the root-mean-square statistic (RMST) or Hellinger divergence, as proposed by Perkins et al. [1, 2]. Hellinger divergence (HD) is computed as proposed in [3].
tableBoots(
x,
stat = c("rmst", "hd", "chisq", "all"),
out.stat = FALSE,
num.permut = 100
)
x |
A numerical matrix corresponding to cross tabulation |
stat |
Statistic to be used in the testing: 'rmst','hdiv', or 'all'. |
out.stat |
logical(1). Whether to return the values of the statistics used: the bootstrap mean and the original value estimated. |
num.permut |
Number of permutations. |
For goodness-of-fit the following null hypothesis is tested
H_\theta: p = p(\theta)
To conduct a single simulation, we perform the following three-step
procedure [1,2]:
To generate m i.i.d. draws according to the model distribution
p(\theta)
, where \theta'
is the estimate calculated
from the experimental data,
To estimate the parameter \theta
from the data generated in
Step 1, obtaining a new estimate \theta
est.
To calculate the statistic under consideration (HD,
RMST), using the data generated in Step 1 and taking the model
distribution to be \theta
est, where \theta
est is the
estimate calculated in Step 2 from the data generated in Step 1.
After conducting many such simulations, the confidence level for
rejecting the null hypothesis is the fraction of the statistics
calculated in step 3 that are less than the statistic calculated from
the empirical data. The significance level \alpha
is the same as a
confidence level of 1-\alpha
.
A p-value probability
Robersy Sanchez (https://genomaths.com).
Perkins W, Tygert M, Ward R. Chi^2 and Classical Exact Tests Often Wildly Misreport Significance; the Remedy Lies in Computers [Internet]. Uploaded to ArXiv. 2011. Report No.: arXiv:1108.4126v2.
Perkins, W., Tygert, M. & Ward, R. Computing the confidence levels or a root-mean square test of goodness-of-fit. 217, 9072-9084 (2011).
Basu, A., Mandal, A. & Pardo, L. Hypothesis testing for two discrete populations based on the Hellinger distance. Stat. Probab. Lett. 80, 206-214 (2010).
set.seed(123)
## The numbers of methylated and unmethylted "read-counts" targeting DNA
## cytosine sites are the typical raw data resulting from bisulfate
## sequencing experiments (BiSeq). Methylation analysis requires for the
## application of some statistical tests to identify differentially
## methylated sites/ positions (DMSs or DMPs). BiSeq experiments on humans
## are relatively expensive (still in 2020) and some researcher is willing
## to accept a site coverage (total) of 4 read-counts as threshold for a
## site to be included in the analysis. Let's suppose that at a given site
## we have the following contingency table:
site_res <- matrix(c(3, 1, 1, 3),
nrow = 2,
dimnames = list(
status = c("ctrl", "treat"),
treat = c("meth", "unmeth")
)
)
site_res
## The application of classical test variants yield the following results
fisher.test(site_res)$p.value
fisher.test(site_res, alternative = "greater")$p.value
fisher.test(site_res, simulate.p.value = TRUE)$p.value
chisq.test(site_res)$p.value
chisq.test(site_res, simulate.p.value = TRUE)$p.value
## That is, these test did not detect differences between the read-counts
## for control and the treated patients. 'tableBoots' function suggests
## that the site is borderline.
tableBoots(site_res, stat = "all", num.permut = 1e3)
## A slight change in the read counts identify a meaningful difference
## not detected by the classicla tests
site_res <- matrix(c(3, 0, 1, 4),
nrow = 2,
dimnames = list(
status = c("ctrl", "treat"),
treat = c("meth", "unmeth")
)
)
site_res
set.seed(23)
tableBoots(site_res, stat = "all", num.permut = 1e3)
fisher.test(site_res)$p.value
fisher.test(site_res, alternative = "greater")$p.value
fisher.test(site_res, simulate.p.value = TRUE)$p.value
chisq.test(site_res)$p.value
chisq.test(site_res, simulate.p.value = TRUE)$p.value
## Now, let's suppose that we want to test whether methylation status from
## two DNA sites the same differentially methylated region (DMR)
## are independent, i.e. whether the treatment affected the methylation
## status of the two sites. In this case, the counts grouped into four
## categories.
set.seed(1)
site_res <- matrix(c(3, 1, 1, 3, 3, 0, 1, 4),
nrow = 2, byrow = FALSE,
dimnames = list(
status = c("ctrl", "treat"),
treat = c(
"meth.1", "unmeth.1",
"meth.2", "unmeth.2"
)
)
)
site_res
## Chi-squared from the R package 'stats'
chisq.test(site_res)$p.bvalue
chisq.test(site_res, simulate.p.value = TRUE, B = 2e3)$p.value
tableBoots(site_res, stat = "all", num.permut = 999)
## Results above are in border. If we include, third site,
## then sinces would different.
site_res <- matrix(c(3, 1, 1, 3, 3, 0, 1, 4, 4, 0, 1, 4),
nrow = 2, byrow = FALSE,
dimnames = list(
status = c("ctrl", "treat"),
treat = c(
"meth.1", "unmeth.1",
"meth.2", "unmeth.2",
"meth.3", "unmeth.3"
)
)
)
site_res
## That is, we have not reason to believe that the observed methylation
## levels in the treatment are not independent
chisq.test(site_res)$p.value
chisq.test(site_res, simulate.p.value = TRUE, B = 2e3)$p.value
tableBoots(site_res, stat = "all", num.permut = 999)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.