mvI.test: Independence Coefficient and Test

mvI.test    R Documentation

Independence Coefficient and Test

Description

Computes a multivariate nonparametric E-statistic and the corresponding test of independence, based on the independence coefficient \mathcal I_n. This coefficient pre-dates distance covariance and distance correlation and is different from both.

Usage

    mvI.test(x, y, R)
    mvI(x, y)

Arguments

x

matrix: first sample, observations in rows

y

matrix: second sample, observations in rows

R

number of replicates

Details

mvI computes the coefficient \mathcal I_n and mvI.test performs a nonparametric test of independence. The test decision is obtained via permutation bootstrap, with R replicates. The sample sizes (number of rows) of the two samples must agree, and samples must not contain missing values.
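The permutation procedure described above can be sketched in base R. This is a minimal illustration, not the package's implementation: perm_test and abs_cor are hypothetical helpers, the statistic is passed in as a function, and the p-value convention (1 + number of replicates at least as large as the observed value) / (R + 1) is an assumption about a common choice rather than a statement of what mvI.test does internally.

```r
# Hypothetical helper, not part of the energy package.
# stat is any function(x, y) returning a scalar where large values
# indicate dependence (for example, a wrapper around mvI).
perm_test <- function(x, y, stat, R = 199) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  n <- nrow(x)
  # Same requirements as stated above: equal sample sizes, no missing values.
  stopifnot(nrow(y) == n, !anyNA(x), !anyNA(y))

  observed <- stat(x, y)
  # Permuting the rows of y breaks the pairing between x and y,
  # which mimics the null hypothesis of independence.
  reps <- replicate(R, stat(x, y[sample.int(n), , drop = FALSE]))

  # Assumed convention: count the observed value among the replicates.
  p <- (1 + sum(reps >= observed)) / (R + 1)
  list(statistic = observed, replicates = reps, p.value = p)
}

# Stand-in statistic for illustration only (absolute Pearson correlation).
abs_cor <- function(x, y) abs(cor(x[, 1], y[, 1]))

set.seed(42)
x <- rnorm(30)
y <- x + rnorm(30, sd = 0.5)
out <- perm_test(x, y, abs_cor, R = 199)
out$p.value
```

Larger R gives a finer p-value grid at proportionally higher cost, since the statistic is recomputed once per replicate.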

Historically this is the first energy test of independence. The distance covariance test dcov.test, distance correlation dcor, and related methods are more recent (2007, 2009).

The distance covariance test dcov.test and distance correlation test dcor.test are much faster than mvI.test and have different properties. All are based on a population independence coefficient that characterizes independence, and all of these tests are statistically consistent. However, dCor is scale invariant while \mathcal I_n is not. In applications, dcor.test or dcov.test are the recommended tests.

Computing formula from Bakirov, Rizzo, and Szekely (2006), equation (2):

Suppose the two samples are X_1,\dots,X_n \in R^p and Y_1,\dots,Y_n \in R^q. Define Z_{kl} = (X_k, Y_l) \in R^{p+q}.

The independence coefficient \mathcal I_n is defined as

\mathcal I_n = \sqrt{\frac{2\bar z - z_d - z}{x + y - z}},

where

z_d= \frac{1}{n^2} \sum_{k,l=1}^n |Z_{kk}-Z_{ll}|_{p+q},

z= \frac{1}{n^4} \sum_{k,l=1}^n \sum_{i,j=1}^n |Z_{kl}-Z_{ij}|_{p+q},

\bar z= \frac{1}{n^3} \sum_{k=1}^n \sum_{i,j=1}^n |Z_{kk}-Z_{ij}|_{p+q},

x= \frac{1}{n^2} \sum_{k,l=1}^n |X_{k}-X_{l}|_p,

y= \frac{1}{n^2} \sum_{k,l=1}^n |Y_{k}-Y_{l}|_q.
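The computing formula above can be transcribed directly into base R. The function below is a naive sketch for small n only (it materializes n^4 pairwise distances) and is not the package's implementation; the name mvI_naive is hypothetical. It relies on the identity |Z_{kl}-Z_{ij}|^2 = |X_k-X_i|^2 + |Y_l-Y_j|^2, which follows from Z_{kl} = (X_k, Y_l).

```r
# Hypothetical base-R transcription of the formula; small samples only.
mvI_naive <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  n <- nrow(x)
  stopifnot(nrow(y) == n, !anyNA(x), !anyNA(y))

  Dx <- as.matrix(dist(x))  # Dx[k, l] = |X_k - X_l|_p
  Dy <- as.matrix(dist(y))  # Dy[k, l] = |Y_k - Y_l|_q

  xbar <- mean(Dx)                 # x in the formula
  ybar <- mean(Dy)                 # y in the formula
  zd   <- mean(sqrt(Dx^2 + Dy^2))  # z_d: distances |Z_kk - Z_ll|

  # z: average of sqrt(Dx[k, i]^2 + Dy[l, j]^2) over all k, i, l, j.
  z <- mean(sqrt(outer(as.vector(Dx)^2, as.vector(Dy)^2, "+")))

  # zbar: for each k, average sqrt(Dx[k, i]^2 + Dy[k, j]^2) over i, j.
  zbar <- mean(vapply(seq_len(n), function(k) {
    mean(sqrt(outer(Dx[k, ]^2, Dy[k, ]^2, "+")))
  }, numeric(1)))

  # Guard against a tiny negative numerator caused by rounding.
  sqrt(max(0, 2 * zbar - zd - z) / (xbar + ybar - z))
}

x <- as.matrix(iris[1:25, 1])
y <- as.matrix(iris[1:25, 2])
v <- mvI_naive(x, y)
v
```

If the energy package is installed, comparing this value with mvI(x, y) is a reasonable sanity check of the transcription.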

Some properties:

  • 0 \leq \mathcal I_n \leq 1 (Theorem 1).

  • Large values of n \mathcal I_n^2 (or \mathcal I_n) support the alternative hypothesis that the sampled random variables are dependent.

  • \mathcal I_n is invariant to shifts and orthogonal transformations of X and Y.

  • \sqrt{n} \, \mathcal I_n determines a statistically consistent test of independence against all fixed dependent alternatives (Corollary 1).

  • The population independence coefficient \mathcal I is a normalized distance between the joint characteristic function and the product of the marginal characteristic functions. \mathcal I_n converges almost surely to \mathcal I as n \to \infty. X and Y are independent if and only if \mathcal I(X, Y) = 0. See the reference below for more details.
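The invariance property listed above can be checked numerically. The sketch below assumes the energy package is installed; the shift vector and rotation angle are arbitrary illustrative choices, and agreement is expected only up to floating-point error.

```r
library(energy)

x <- as.matrix(iris[1:25, 1:2])  # first sample, p = 2
y <- as.matrix(iris[1:25, 3:4])  # second sample, q = 2

# An orthogonal 2 x 2 rotation matrix (arbitrary angle).
theta <- pi / 6
Q <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)

base    <- mvI(x, y)
shifted <- mvI(sweep(x, 2, c(10, -3), "+"), y)  # shift each column of x
rotated <- mvI(x %*% Q, y %*% t(Q))             # rotate x and y separately

all.equal(base, shifted)
all.equal(base, rotated)
```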

Value

mvI returns the statistic. mvI.test returns a list with class htest containing

method

description of test

statistic

observed value of the test statistic n\mathcal I_n^2

estimate

\mathcal I_n

replicates

permutation replicates

p.value

p-value of the test

data.name

description of data
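As a usage sketch, the components listed above can be read from the returned list by name. This assumes the energy package is installed; the seed and the choice R = 99 are arbitrary.

```r
library(energy)

set.seed(1)  # permutation replicates are random
x <- as.matrix(iris[1:25, 1])
y <- as.matrix(iris[1:25, 2])

res <- mvI.test(x, y, R = 99)

res$statistic           # observed n * I_n^2, per the list above
res$estimate            # I_n
res$p.value             # permutation p-value
length(res$replicates)  # one value per permutation replicate
```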

Note

On scale invariance: distance correlation (dcor) has the property that if we change the scale of X (e.g., from meters to kilometers) and the scale of Y (e.g., from grams to ounces), the statistic and the test are unchanged. \mathcal I_n does not have this property; it is invariant only under a common rescaling of X and Y by the same constant. Thus, if the units of measurement change for either or both variables, dCor is invariant, but \mathcal I_n changes, and the mvI.test decision may change with it.
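The contrast described in this note can be illustrated with a small sketch, assuming the energy package is installed. The rescaling factors are hypothetical unit conversions chosen only for illustration.

```r
library(energy)

x <- as.matrix(iris[1:25, 1])
y <- as.matrix(iris[1:25, 2])

# Hypothetical unit changes: different factors for x and y.
x_resc <- 1000 * x
y_resc <- y / 28.35

# dcor is unchanged when x and y are rescaled separately.
all.equal(dcor(x, y), dcor(x_resc, y_resc))

# I_n is unchanged only under a common rescaling by the same constant.
all.equal(mvI(x, y), mvI(3 * x, 3 * y))

# With different factors, I_n is expected to differ.
c(original = mvI(x, y), rescaled = mvI(x_resc, y_resc))
```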

Author(s)

Maria L. Rizzo mrizzo@bgsu.edu and Gabor J. Szekely

References

Bakirov, N.K., Rizzo, M.L., and Szekely, G.J. (2006), A Multivariate Nonparametric Test of Independence, Journal of Multivariate Analysis, Vol. 97, 1742-1756.

Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007), Measuring and Testing Dependence by Correlation of Distances, Annals of Statistics, Vol. 35 No. 6, pp. 2769-2794.

Szekely, G.J. and Rizzo, M.L. (2009), Brownian Distance Covariance, Annals of Applied Statistics, Vol. 3, No. 4, 1236-1265.

See Also

dcov.test dcov dcor.test dcor dcov2d dcor2d indep.test

Examples

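    # independence coefficient for two univariate samples of size 25
    mvI(iris[1:25, 1], iris[1:25, 2])

    # permutation test of independence with 99 replicates
    mvI.test(iris[1:25, 1], iris[1:25, 2], R=99)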


energy documentation built on Sept. 11, 2024, 7:57 p.m.