fun.chisq.test: Model-Free Functional Chi-Squared and Exact Tests


Description

Asymptotic chi-squared, normalized chi-squared or exact tests on contingency tables to determine model-free functional dependency of the column variable on the row variable.

Usage

fun.chisq.test(
  x,
  method = c("fchisq", "nfchisq",
             "exact", "exact.qp", "exact.dp", "exact.dqp",
             "default", "normalized", "simulate.p.value"),
  alternative = c("non-constant", "all"),
  log.p = FALSE,
  index.kind = c("conditional", "unconditional"),
  simulate.nruns = 2000,
  exact.mode.bound = TRUE
)

Arguments

x

a matrix representing a contingency table. The row variable represents the independent variable or all unique combinations of multiple independent variables. The column variable is the dependent variable.

method

a character string to specify the method to compute the functional chi-squared test statistic and its p-value. The options are "fchisq" (equivalent to "default", the default), "nfchisq" (equivalent to "normalized"), "exact", "exact.qp", "exact.dp", "exact.dqp" or "simulate.p.value". See Details.

Note: "default" and "normalized" are deprecated.

alternative

a character string to specify the alternative hypothesis. The options are "non-constant" (default, non-constant functions) and "all" (all types of functions including constant ones).

log.p

logical; if TRUE, the p-value is given as log(p). Taking the log improves accuracy when the p-value is close to zero. The default is FALSE.

index.kind

a character string to specify the kind of function index xi.f to be estimated. The options are "conditional" (default) and "unconditional". See Details.

simulate.nruns

a number specifying how many tables are generated to simulate the null distribution. The default is 2000. Only used when method="simulate.p.value".

exact.mode.bound

logical; if TRUE, a fast branch-and-bound algorithm is used for the exact functional test (method="exact"). If FALSE, a slow brute-force enumeration method is used to provide a reference for runtime analysis. Both options provide the same exact p-value. The default is TRUE.
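As a minimal sketch of preparing the x argument, a contingency table can be built from two discrete vectors with base R's table(). The vectors dose and response are hypothetical illustration data and are not part of the package.

```r
# Hypothetical paired observations: 'dose' is the independent (row) variable,
# 'response' is the dependent (column) variable.
dose     <- c("low", "low", "mid", "mid", "high", "high", "high", "low")
response <- c("off", "off", "on",  "on",  "off",  "off",  "off",  "off")

# table() cross-tabulates the two vectors; rows come from the first argument.
tab <- table(dose, response)

# x is documented as a matrix, so drop the "table" class but keep the dimnames.
x <- unclass(tab)
stopifnot(is.matrix(x))

print(x)
```

With several independent variables, one possible way to form the row variable is interaction(a, b, drop = TRUE), so that each observed combination of values becomes one row.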

Details

The functional chi-squared test determines whether the column variable is a function of the row variable in contingency table x (Zhang and Song, 2013; Zhang, 2014). The method argument selects the hypothesis testing procedure; each option is described below.

index.kind specifies the kind of function index to be computed. If the experimental design controls neither the row nor the column marginal sums, index.kind = "unconditional" is recommended; if the column marginal sums are controlled, index.kind = "conditional" (default) is recommended. The choice of index.kind affects only the value of the function index xi.f, not the test statistic or the p-value.

When method="fchisq" (equivalent to "default", the default), the test statistic is computed as described in (Zhang and Song, 2013; Zhang, 2014) and the p-value is computed using the chi-squared distribution.
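When the asymptotic p-value is extremely small, the log.p option becomes useful. The effect can be illustrated with base R's pchisq() alone, independent of FunChisq; the statistic and degrees of freedom below are arbitrary illustration values.

```r
# Arbitrary, very large chi-squared statistic with 4 degrees of freedom.
stat <- 2000
df   <- 4

# On the natural scale the upper-tail probability underflows to zero.
p <- pchisq(stat, df = df, lower.tail = FALSE)

# On the log scale the same tail probability stays finite, so two very
# strong dependencies can still be ranked against each other.
log_p <- pchisq(stat, df = df, lower.tail = FALSE, log.p = TRUE)

print(p)
print(log_p)
```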

When method="nfchisq" (equivalent to "normalized"), the test statistic is obtained by shifting and scaling the original test statistic (Zhang and Song, 2013; Zhang, 2014); and the p-value is computed using the standard normal distribution (Box et al., 2005). The normalized chi-squared, more conservative on the degrees of freedom, was used by the Best Performer NMSUSongLab in HPN-DREAM (DREAM8) Breast Cancer Network Inference Challenges.
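The two asymptotic options can be sketched in base R as follows. This is an illustration based on one reading of Zhang and Song (2013), not the package's source code. The helper name fun_chisq_sketch, the layout of the formula, and the normalization (statistic minus degrees of freedom, divided by the square root of twice the degrees of freedom) are assumptions, and only the "non-constant" alternative is covered.

```r
# Hypothetical helper, not part of FunChisq.
fun_chisq_sketch <- function(x) {
  stopifnot(is.matrix(x), all(x >= 0))
  n <- sum(x)
  s <- ncol(x)

  # Deviation of each non-empty row from a uniform spread over the columns.
  row_sums <- rowSums(x)
  keep <- row_sums > 0
  row_expected <- row_sums[keep] / s
  row_part <- sum((x[keep, , drop = FALSE] - row_expected)^2 / row_expected)

  # Deviation of the column totals from a uniform distribution.
  col_expected <- n / s
  col_part <- sum((colSums(x) - col_expected)^2 / col_expected)

  stat <- row_part - col_part
  df <- (nrow(x) - 1) * (s - 1)
  z <- (stat - df) / sqrt(2 * df)   # assumed shift-and-scale normalization

  list(
    statistic  = stat,
    df         = df,
    p_chisq    = pchisq(stat, df = df, lower.tail = FALSE),
    normalized = z,
    p_normal   = pnorm(z, lower.tail = FALSE)
  )
}

x <- matrix(c(20, 0, 20, 0, 20, 0, 5, 0, 5), 3)
res <- fun_chisq_sketch(x)
print(res$statistic)
print(res$df)
print(fun_chisq_sketch(t(x))$statistic)  # smaller in the reverse direction
```

For real analyses, fun.chisq.test itself should be used; the sketch is only meant to show where the asymmetry between x and t(x) comes from.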

When method="exact", "exact.qp" (quadratic programming), "exact.dp" (dynamic programming), or "exact.dqp" (dynamic and quadratic programming), an exact functional test is performed. The option of "exact" uses "exact.dqp", the fastest method. The methods compute an exact p-value, as described in (Zhong and Song, 2019; Nguyen, 2018).

For the "exact.qp" and "exact.dp" options, the exact test is performed only if the sample size is no more than 200 or the average cell count is less than five, and the table has no more than 10 rows and no more than 10 columns. Otherwise, the exact test is not called and the asymptotic functional chi-squared test (method="fchisq") is used instead.

For "exact.dqp", the exact functional test will always be performed.

For 2-by-2 contingency tables, the asymptotic test options (method="fchisq" or "nfchisq") are recommended to test functional dependency, instead of the exact functional test.

When method="simulate.p.value", a simulated null distribution is used to calculate the p-value. The null distribution is a multinomial distribution whose cell probabilities are the product of the two marginal distributions. Like other Monte Carlo methods, this option is slower but may be more accurate than methods based on asymptotic distributions.
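The Monte Carlo idea can be sketched in base R as below. This is not the package's implementation: the helpers fun_stat and simulate_p are hypothetical, the statistic follows one reading of Zhang and Song (2013), and both the use of the observed marginal proportions and the add-one correction in the p-value are assumptions.

```r
# Hypothetical helpers, not part of FunChisq.
fun_stat <- function(x) {
  s <- ncol(x)
  rs <- rowSums(x)
  keep <- rs > 0                          # skip rows with no observations
  e <- rs[keep] / s
  row_part <- sum((x[keep, , drop = FALSE] - e)^2 / e)
  ce <- sum(x) / s
  row_part - sum((colSums(x) - ce)^2 / ce)
}

simulate_p <- function(x, nruns = 2000) {
  n <- sum(x)
  # Null cell probabilities: product of the observed marginal proportions.
  prob <- outer(rowSums(x) / n, colSums(x) / n)
  observed <- fun_stat(x)
  null_stats <- replicate(nruns, {
    counts <- rmultinom(1, size = n, prob = as.vector(prob))
    fun_stat(matrix(counts, nrow = nrow(x)))
  })
  # Add-one correction keeps the estimate away from exactly zero.
  (1 + sum(null_stats >= observed)) / (1 + nruns)
}

set.seed(1)
x <- matrix(c(20, 0, 20, 0, 20, 0, 5, 0, 5), 3)
p <- simulate_p(x, nruns = 500)
print(p)
```

The smallest p-value this sketch can report is 1 / (nruns + 1), which is one reason to raise the number of runs when strong dependencies need to be distinguished.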

Value

A list with class "htest" containing the following components:

statistic

the functional chi-squared statistic if method = "fchisq", "default", or "exact"; or the normalized functional chi-squared statistic if method = "nfchisq" or "normalized".

parameter

degrees of freedom for the functional chi-squared statistic.

p.value

p-value of the functional test. If method = "fchisq" (or "default"), it is computed by an asymptotic chi-squared distribution; if method = "nfchisq" (or "normalized"), it is computed by the standard normal distribution; if method = "exact", it is computed by an exact hypergeometric distribution.

estimate

an estimate of the function index, between 0 and 1. A value of 1 indicates a strictly mathematical function. The index is asymmetric with respect to the transpose of the input contingency table, unlike the symmetric Cramer's V, which is based on Pearson's chi-squared test statistic.
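The symmetry of Cramer's V mentioned above can be checked in base R. The helper cramers_v is hypothetical and uses the usual definition based on Pearson's chi-squared statistic. The comparison with the function index runs only if FunChisq is installed.

```r
# Hypothetical helper: Cramer's V from Pearson's chi-squared statistic.
cramers_v <- function(x) {
  chisq <- suppressWarnings(chisq.test(x, correct = FALSE))$statistic
  unname(sqrt(chisq / (sum(x) * (min(dim(x)) - 1))))
}

x <- matrix(c(20, 0, 20, 0, 20, 0, 5, 0, 5), 3)

# Cramer's V is unchanged by transposing the table.
print(cramers_v(x))
print(cramers_v(t(x)))

# If FunChisq is available, the function index can be compared for the
# two directions; it is expected to differ between x and t(x).
if (requireNamespace("FunChisq", quietly = TRUE)) {
  print(FunChisq::fun.chisq.test(x)$estimate)
  print(FunChisq::fun.chisq.test(t(x))$estimate)
}
```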

Author(s)

Yang Zhang, Hua Zhong and Joe Song

References

Box, G. E., Hunter, J. S. and Hunter, W. G. (2005) Statistics for Experimenters: Design, Innovation and Discovery, 2nd ed., New York: Wiley-Interscience.

Nguyen, H. H. (2018) Inference of Functional Dependency via Asymmetric, Optimal, and Model-free Statistics. Unpublished doctoral dissertation, Department of Computer Science, New Mexico State University, Las Cruces, USA.

Zhang, Y. and Song, M. (2013) Deciphering interactions in causal networks without parametric assumptions. arXiv Molecular Networks, arXiv:1311.2707, https://arxiv.org/abs/1311.2707

Zhang, Y. (2014) Nonparametric Statistical Methods for Biological Network Inference. Unpublished doctoral dissertation, Department of Computer Science, New Mexico State University, Las Cruces, USA.

Zhong, H. and Song, M. (2019) A fast exact functional test for directional association and cancer biology applications. IEEE/ACM Transactions on Computational Biology and Bioinformatics 16(3), 818–826. Retrieved from https://doi.org/10.1109/TCBB.2018.2809743

See Also

For data discretization by optimal univariate k-means clustering, see Ckmeans.1d.dp.

For symmetrical dependency tests on discrete data, see Pearson's chi-squared test chisq.test, Fisher's exact test fisher.test, and mutual information entropy.

Examples

## Not run: 
# Example 1. Asymptotic functional chi-squared test
x <- matrix(c(20,0,20,0,20,0,5,0,5), 3)
fun.chisq.test(x) # strong functional dependency
fun.chisq.test(t(x)) # weak functional dependency

# Example 2. Normalized functional chi-squared test
x <- matrix(c(8,0,8,0,8,0,2,0,2), 3)
fun.chisq.test(x, method="nfchisq") # strong functional dependency
fun.chisq.test(t(x), method="nfchisq") # weak functional dependency

# Example 3. Exact functional chi-squared test
x <- matrix(c(4,0,4,0,4,0,1,0,1), 3)
fun.chisq.test(x, method="exact") # strong functional dependency
fun.chisq.test(t(x), method="exact") # weak functional dependency

# Example 4. Exact functional chi-squared test on a real data set
#            (Shen et al., 2002)
# x is a contingency table with row variable for p53 mutation and
#   column variable for CIMP
x <- matrix(c(12,26,18,0,8,12), nrow=2, ncol=3, byrow=TRUE)

# Test the functional dependency: p53 mutation -> CIMP
fun.chisq.test(x, method="exact")

# Test the functional dependency CIMP -> p53 mutation
fun.chisq.test(t(x), method="exact")

# Example 5. Asymptotic functional chi-squared test with simulated distribution
x <- matrix(c(20,0,20,0,20,0,5,0,5), 3)
fun.chisq.test(x, method="simulate.p.value")
fun.chisq.test(x, method="simulate.p.value", simulate.nruns = 1000)

## End(Not run)

FunChisq documentation built on Sept. 24, 2019, 5:04 p.m.