tableBoots: tableBoots
In genomaths/usefr: Utility Functions for Statistical Analyses

tableBoots

R Documentation

tableBoots

Description

Parametric Bootstrap of n x m Contingency independence test. It is assumed that the provided contingency table summarizes the results where two or more groups are compared and the outcome is a categorical variable, i.e., disease vs. no disease, pass vs. fail, treated patient vs. control group, etc.

The goodness of fit statistic is the root-mean-square statistic (RMST) or Hellinger divergence, as proposed by Perkins et al. [1, 2]. Hellinger divergence (HD) is computed as proposed in [3].

Usage

tableBoots(
  x,
  stat = c("rmst", "hd", "chisq", "all"),
  out.stat = FALSE,
  num.permut = 100
)

Arguments

`x`	A numerical matrix corresponding to cross tabulation `n x m` table (contingency table).
`stat`	Statistic to be used in the testing: 'rmst','hdiv', or 'all'.
`out.stat`	logical(1). Whether to return the values of the statistics used: the bootstrap mean and the original value estimated.
`num.permut`	Number of permutations.

Details

For goodness-of-fit the following null hypothesis is tested H_\theta: p = p(\theta) To conduct a single simulation, we perform the following three-step procedure [1,2]:

To generate m i.i.d. draws according to the model distribution p(\theta), where \theta' is the estimate calculated from the experimental data,
To estimate the parameter \theta from the data generated in Step 1, obtaining a new estimate \thetaest.
To calculate the statistic under consideration (HD, RMST), using the data generated in Step 1 and taking the model distribution to be \thetaest, where \thetaest is the estimate calculated in Step 2 from the data generated in Step 1.

After conducting many such simulations, the confidence level for rejecting the null hypothesis is the fraction of the statistics calculated in step 3 that are less than the statistic calculated from the empirical data. The significance level \alpha is the same as a confidence level of 1-\alpha.

Value

A p-value probability

Author(s)

Robersy Sanchez (https://genomaths.com).

References

Perkins W, Tygert M, Ward R. Chi^2 and Classical Exact Tests Often Wildly Misreport Significance; the Remedy Lies in Computers [Internet]. Uploaded to ArXiv. 2011. Report No.: arXiv:1108.4126v2.
Perkins, W., Tygert, M. & Ward, R. Computing the confidence levels or a root-mean square test of goodness-of-fit. 217, 9072-9084 (2011).
Basu, A., Mandal, A. & Pardo, L. Hypothesis testing for two discrete populations based on the Hellinger distance. Stat. Probab. Lett. 80, 206-214 (2010).

Examples

set.seed(123)
## The numbers of methylated and unmethylted "read-counts" targeting DNA
## cytosine sites are the typical raw data resulting from bisulfate
## sequencing experiments (BiSeq). Methylation analysis requires for the
## application of some statistical tests to identify differentially
## methylated sites/ positions (DMSs or DMPs). BiSeq experiments on humans
## are relatively expensive (still in 2020) and some researcher is willing
## to accept a site coverage (total) of 4 read-counts as threshold for a
## site to be included in the analysis. Let's suppose that at a given site
## we have the following contingency table:

site_res <- matrix(c(3, 1, 1, 3),
    nrow = 2,
    dimnames = list(
        status = c("ctrl", "treat"),
        treat = c("meth", "unmeth")
    )
)
site_res

## The application of classical test variants yield the following results
fisher.test(site_res)$p.value
fisher.test(site_res, alternative = "greater")$p.value
fisher.test(site_res, simulate.p.value = TRUE)$p.value
chisq.test(site_res)$p.value
chisq.test(site_res, simulate.p.value = TRUE)$p.value

## That is, these test did not detect differences between the read-counts
## for control and the treated patients. 'tableBoots' function suggests
## that the site is borderline.

tableBoots(site_res, stat = "all", num.permut = 1e3)

## A slight change in the read counts identify a meaningful difference
## not detected by the classicla tests
site_res <- matrix(c(3, 0, 1, 4),
    nrow = 2,
    dimnames = list(
        status = c("ctrl", "treat"),
        treat = c("meth", "unmeth")
    )
)
site_res

set.seed(23)
tableBoots(site_res, stat = "all", num.permut = 1e3)
fisher.test(site_res)$p.value
fisher.test(site_res, alternative = "greater")$p.value
fisher.test(site_res, simulate.p.value = TRUE)$p.value
chisq.test(site_res)$p.value
chisq.test(site_res, simulate.p.value = TRUE)$p.value

## Now, let's suppose that we want to test whether methylation status from
## two DNA sites the same differentially methylated region (DMR)
## are independent, i.e. whether the treatment affected the methylation
## status of the two sites. In this case, the counts grouped into four
## categories.
set.seed(1)
site_res <- matrix(c(3, 1, 1, 3, 3, 0, 1, 4),
    nrow = 2, byrow = FALSE,
    dimnames = list(
        status = c("ctrl", "treat"),
        treat = c(
            "meth.1", "unmeth.1",
            "meth.2", "unmeth.2"
        )
    )
)
site_res

## Chi-squared from the R package 'stats'
chisq.test(site_res)$p.bvalue
chisq.test(site_res, simulate.p.value = TRUE, B = 2e3)$p.value

tableBoots(site_res, stat = "all", num.permut = 999)

## Results above are in border. If we include, third site,
## then sinces would different.
site_res <- matrix(c(3, 1, 1, 3, 3, 0, 1, 4, 4, 0, 1, 4),
    nrow = 2, byrow = FALSE,
    dimnames = list(
        status = c("ctrl", "treat"),
        treat = c(
            "meth.1", "unmeth.1",
            "meth.2", "unmeth.2",
            "meth.3", "unmeth.3"
        )
    )
)
site_res

## That is, we have not reason to believe that the observed methylation
## levels in the treatment are not independent
chisq.test(site_res)$p.value
chisq.test(site_res, simulate.p.value = TRUE, B = 2e3)$p.value
tableBoots(site_res, stat = "all", num.permut = 999)

genomaths/usefr documentation built on June 10, 2025, 9:18 p.m.