tableBoots: tableBoots

View source: R/tableBoots.R

tableBootsR Documentation

tableBoots

Description

Parametric Bootstrap of n x m Contingency independence test. It is assumed that the provided contingency table summarizes the results where two or more groups are compared and the outcome is a categorical variable, i.e., disease vs. no disease, pass vs. fail, treated patient vs. control group, etc.

The goodness of fit statistic is the root-mean-square statistic (RMST) or Hellinger divergence, as proposed by Perkins et al. [1, 2]. Hellinger divergence (HD) is computed as proposed in [3].

Usage

tableBoots(
  x,
  stat = c("rmst", "hd", "chisq", "all"),
  out.stat = FALSE,
  num.permut = 100
)

Arguments

x

A numerical matrix corresponding to cross tabulation n x m table (contingency table).

stat

Statistic to be used in the testing: 'rmst','hdiv', or 'all'.

out.stat

logical(1). Whether to return the values of the statistics used: the bootstrap mean and the original value estimated.

num.permut

Number of permutations.

Details

For goodness-of-fit the following null hypothesis is tested H_\theta: p = p(\theta) To conduct a single simulation, we perform the following three-step procedure [1,2]:

  1. To generate m i.i.d. draws according to the model distribution p(\theta), where \theta' is the estimate calculated from the experimental data,

  2. To estimate the parameter \theta from the data generated in Step 1, obtaining a new estimate \thetaest.

  3. To calculate the statistic under consideration (HD, RMST), using the data generated in Step 1 and taking the model distribution to be \thetaest, where \thetaest is the estimate calculated in Step 2 from the data generated in Step 1.

After conducting many such simulations, the confidence level for rejecting the null hypothesis is the fraction of the statistics calculated in step 3 that are less than the statistic calculated from the empirical data. The significance level \alpha is the same as a confidence level of 1-\alpha.

Value

A p-value probability

Author(s)

Robersy Sanchez (https://genomaths.com).

References

  1. Perkins W, Tygert M, Ward R. Chi^2 and Classical Exact Tests Often Wildly Misreport Significance; the Remedy Lies in Computers [Internet]. Uploaded to ArXiv. 2011. Report No.: arXiv:1108.4126v2.

  2. Perkins, W., Tygert, M. & Ward, R. Computing the confidence levels or a root-mean square test of goodness-of-fit. 217, 9072-9084 (2011).

  3. Basu, A., Mandal, A. & Pardo, L. Hypothesis testing for two discrete populations based on the Hellinger distance. Stat. Probab. Lett. 80, 206-214 (2010).

Examples

set.seed(123)
## The numbers of methylated and unmethylted "read-counts" targeting DNA
## cytosine sites are the typical raw data resulting from bisulfate
## sequencing experiments (BiSeq). Methylation analysis requires for the
## application of some statistical tests to identify differentially
## methylated sites/ positions (DMSs or DMPs). BiSeq experiments on humans
## are relatively expensive (still in 2020) and some researcher is willing
## to accept a site coverage (total) of 4 read-counts as threshold for a
## site to be included in the analysis. Let's suppose that at a given site
## we have the following contingency table:

site_res <- matrix(c(3, 1, 1, 3),
    nrow = 2,
    dimnames = list(
        status = c("ctrl", "treat"),
        treat = c("meth", "unmeth")
    )
)
site_res

## The application of classical test variants yield the following results
fisher.test(site_res)$p.value
fisher.test(site_res, alternative = "greater")$p.value
fisher.test(site_res, simulate.p.value = TRUE)$p.value
chisq.test(site_res)$p.value
chisq.test(site_res, simulate.p.value = TRUE)$p.value

## That is, these test did not detect differences between the read-counts
## for control and the treated patients. 'tableBoots' function suggests
## that the site is borderline.

tableBoots(site_res, stat = "all", num.permut = 1e3)

## A slight change in the read counts identify a meaningful difference
## not detected by the classicla tests
site_res <- matrix(c(3, 0, 1, 4),
    nrow = 2,
    dimnames = list(
        status = c("ctrl", "treat"),
        treat = c("meth", "unmeth")
    )
)
site_res

set.seed(23)
tableBoots(site_res, stat = "all", num.permut = 1e3)
fisher.test(site_res)$p.value
fisher.test(site_res, alternative = "greater")$p.value
fisher.test(site_res, simulate.p.value = TRUE)$p.value
chisq.test(site_res)$p.value
chisq.test(site_res, simulate.p.value = TRUE)$p.value

## Now, let's suppose that we want to test whether methylation status from
## two DNA sites the same differentially methylated region (DMR)
## are independent, i.e. whether the treatment affected the methylation
## status of the two sites. In this case, the counts grouped into four
## categories.
set.seed(1)
site_res <- matrix(c(3, 1, 1, 3, 3, 0, 1, 4),
    nrow = 2, byrow = FALSE,
    dimnames = list(
        status = c("ctrl", "treat"),
        treat = c(
            "meth.1", "unmeth.1",
            "meth.2", "unmeth.2"
        )
    )
)
site_res

## Chi-squared from the R package 'stats'
chisq.test(site_res)$p.bvalue
chisq.test(site_res, simulate.p.value = TRUE, B = 2e3)$p.value

tableBoots(site_res, stat = "all", num.permut = 999)

## Results above are in border. If we include, third site,
## then sinces would different.
site_res <- matrix(c(3, 1, 1, 3, 3, 0, 1, 4, 4, 0, 1, 4),
    nrow = 2, byrow = FALSE,
    dimnames = list(
        status = c("ctrl", "treat"),
        treat = c(
            "meth.1", "unmeth.1",
            "meth.2", "unmeth.2",
            "meth.3", "unmeth.3"
        )
    )
)
site_res

## That is, we have not reason to believe that the observed methylation
## levels in the treatment are not independent
chisq.test(site_res)$p.value
chisq.test(site_res, simulate.p.value = TRUE, B = 2e3)$p.value
tableBoots(site_res, stat = "all", num.permut = 999)


genomaths/usefr documentation built on April 18, 2023, 3:35 a.m.