tableBoots | R Documentation |
Parametric Bootstrap of n x m Contingency independence test. It is assumed that the provided contingency table summarizes the results where two or more groups are compared and the outcome is a categorical variable, i.e., disease vs. no disease, pass vs. fail, treated patient vs. control group, etc.
The goodness of fit statistic is the root-mean-square statistic (RMST) or Hellinger divergence, as proposed by Perkins et al. [1, 2]. Hellinger divergence (HD) is computed as proposed in [3].
tableBoots( x, stat = c("rmst", "hd", "chisq", "all"), out.stat = FALSE, num.permut = 100 )
x |
A numerical matrix corresponding to cross tabulation n x m table (contingency table). |
stat |
Statistic to be used in the testing: 'rmst','hdiv', or 'all'. |
out.stat |
logical(1). Whether to return the values of the statistics used: the bootstrap mean and the original value estimated. |
num.permut |
Number of permutations. |
For goodness-of-fit the following null hypothesis is tested H_θ: p = p(θ) To conduct a single simulation, we perform the following three-step procedure [1,2]:
To generate m i.i.d. draws according to the model distribution p(θ), where θ' is the estimate calculated from the experimental data,
To estimate the parameter θ from the data generated in Step 1, obtaining a new estimate θest.
To calculate the statistic under consideration (HD, RMST), using the data generated in Step 1 and taking the model distribution to be θest, where θest is the estimate calculated in Step 2 from the data generated in Step 1.
After conducting many such simulations, the confidence level for rejecting the null hypothesis is the fraction of the statistics calculated in step 3 that are less than the statistic calculated from the empirical data. The significance level α is the same as a confidence level of 1-α.
A p-value probability
Robersy Sanchez (https://genomaths.com).
Perkins W, Tygert M, Ward R. Chi^2 and Classical Exact Tests Often Wildly Misreport Significance; the Remedy Lies in Computers [Internet]. Uploaded to ArXiv. 2011. Report No.: arXiv:1108.4126v2.
Perkins, W., Tygert, M. & Ward, R. Computing the confidence levels or a root-mean square test of goodness-of-fit. 217, 9072-9084 (2011).
Basu, A., Mandal, A. & Pardo, L. Hypothesis testing for two discrete populations based on the Hellinger distance. Stat. Probab. Lett. 80, 206-214 (2010).
set.seed(123) ## The numbers of methylated and unmethylted "read-counts" targeting DNA ## cytosine sites are the typical raw data resulting from bisulfate ## sequencing experiments (BiSeq). Methylation analysis requires for the ## application of some statistical tests to identify differentially ## methylated sites/ positions (DMSs or DMPs). BiSeq experiments on humans ## are relatively expensive (still in 2020) and some researcher is willing ## to accept a site coverage (total) of 4 read-counts as threshold for a ## site to be included in the analysis. Let's suppose that at a given site ## we have the following contingency table: site_res <- matrix(c(3, 1, 1, 3), nrow = 2, dimnames = list( status = c("ctrl", "treat"), treat = c("meth", "unmeth") ) ) site_res ## The application of classical test variants yield the following results fisher.test(site_res)$p.value fisher.test(site_res, alternative = "greater")$p.value fisher.test(site_res, simulate.p.value = TRUE)$p.value chisq.test(site_res)$p.value chisq.test(site_res, simulate.p.value = TRUE)$p.value ## That is, these test did not detect differences between the read-counts ## for control and the treated patients. 'tableBoots' function suggests ## that the site is borderline. tableBoots(site_res, stat = "all", num.permut = 1e3) ## A slight change in the read counts identify a meaningful difference ## not detected by the classicla tests site_res <- matrix(c(3, 0, 1, 4), nrow = 2, dimnames = list( status = c("ctrl", "treat"), treat = c("meth", "unmeth") ) ) site_res set.seed(23) tableBoots(site_res, stat = "all", num.permut = 1e3) fisher.test(site_res)$p.value fisher.test(site_res, alternative = "greater")$p.value fisher.test(site_res, simulate.p.value = TRUE)$p.value chisq.test(site_res)$p.value chisq.test(site_res, simulate.p.value = TRUE)$p.value ## Now, let's suppose that we want to test whether methylation status from ## two DNA sites the same differentially methylated region (DMR) ## are independent, i.e. whether the treatment affected the methylation ## status of the two sites. In this case, the counts grouped into four ## categories. set.seed(1) site_res <- matrix(c(3, 1, 1, 3, 3, 0, 1, 4), nrow = 2, byrow = FALSE, dimnames = list( status = c("ctrl", "treat"), treat = c( "meth.1", "unmeth.1", "meth.2", "unmeth.2" ) ) ) site_res ## Chi-squared from the R package 'stats' chisq.test(site_res)$p.bvalue chisq.test(site_res, simulate.p.value = TRUE, B = 2e3)$p.value tableBoots(site_res, stat = "all", num.permut = 999) ## Results above are in border. If we include, third site, ## then sinces would different. site_res <- matrix(c(3, 1, 1, 3, 3, 0, 1, 4, 4, 0, 1, 4), nrow = 2, byrow = FALSE, dimnames = list( status = c("ctrl", "treat"), treat = c( "meth.1", "unmeth.1", "meth.2", "unmeth.2", "meth.3", "unmeth.3" ) ) ) site_res ## That is, we have not reason to believe that the observed methylation ## levels in the treatment are not independent chisq.test(site_res)$p.value chisq.test(site_res, simulate.p.value = TRUE, B = 2e3)$p.value tableBoots(site_res, stat = "all", num.permut = 999)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.