# hoeffding.refined.test: The refined Hoeffding independence test (package `independence`: Fast Rank-Based Independence Testing)

## Description

The function `hoeffding.refined.test` performs an independence test for two continuous numeric variables that is consistent against all alternatives. The test statistic is a variant of Hoeffding's classical D statistic. In terms of CDFs, it estimates the integral $\iint (F_{XY} - F_X F_Y)^2 \, dF_X \, dF_Y$, based on the ordering types of quintuples of data points. The statistic is computed by a new O(n log n)-time algorithm, following the work of Even-Zohar and Leng.
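To make the estimated quantity concrete, here is a naive plug-in sketch in base R. It replaces each CDF by its empirical counterpart and averages over all pairs of observed x and y values, which gives a V-statistic costing roughly O(n^3) time and O(n^2) memory. This is not the package's estimator: `hoeffding.refined.test` uses quintuple ordering types and an O(n log n) algorithm, and its statistic may differ from this plug-in value by a small-sample bias and by scaling. The function name `plug_in_refined_hoeffding` is hypothetical.

```r
# Naive plug-in (V-statistic) estimate of
#   integral of (Fxy - Fx * Fy)^2 dFx dFy
# with every CDF replaced by its empirical version.
# Illustration only: NOT the package's O(n log n) quintuple-based estimator.
plug_in_refined_hoeffding <- function(xs, ys) {
  stopifnot(length(xs) == length(ys))
  n <- length(xs)
  # ax[i, k] = 1 if xs[k] <= xs[i];  ay[j, k] = 1 if ys[k] <= ys[j]
  ax <- outer(xs, xs, ">=") * 1
  ay <- outer(ys, ys, ">=") * 1
  # fxy[i, j] = empirical joint CDF evaluated at (xs[i], ys[j])
  fxy <- ax %*% t(ay) / n
  fx <- rowSums(ax) / n  # empirical marginal CDF at each xs[i]
  fy <- rowSums(ay) / n  # empirical marginal CDF at each ys[j]
  # Averaging over all (i, j) integrates against the empirical dFx dFy
  mean((fxy - outer(fx, fy))^2)
}

set.seed(1)
x <- rnorm(200)
plug_in_refined_hoeffding(x, x)           # perfectly dependent
plug_in_refined_hoeffding(x, rnorm(200))  # independent
```

For `ys = xs` the population value of the integral is 1/90, so the first call should land close to it, while the second should be near zero.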

## Usage

```
hoeffding.refined.test(
  xs,
  ys,
  na.rm = TRUE,
  collisions = TRUE,
  precision = 1e-05
)
```

## Arguments

- `xs, ys`: Same-length numeric vectors containing paired samples.
- `na.rm`: Logical. Should missing values, `NaN`, and `Inf` be removed?
- `collisions`: Logical. Warn about repeated values in `xs` or `ys`.
- `precision`: Precision of the p-value, between 0 and 1. Otherwise the p-value is `NA`.
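One plausible reading of `na.rm = TRUE` for paired samples is pairwise deletion: a pair is dropped whenever either coordinate is non-finite. The sketch below assumes that behavior (the package's internal handling is not spelled out here), and the helper name `drop_nonfinite_pairs` is hypothetical.

```r
# Hypothetical helper: drop every pair in which either coordinate is
# NA, NaN, Inf or -Inf. is.finite() is FALSE for all four cases.
drop_nonfinite_pairs <- function(xs, ys) {
  stopifnot(length(xs) == length(ys))
  keep <- is.finite(xs) & is.finite(ys)
  list(xs = xs[keep], ys = ys[keep])
}

drop_nonfinite_pairs(c(1, NA, 3, Inf, 5), c(1, 2, NaN, 4, 5))  # keeps pairs 1 and 5 only
```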

## Value

A list, of class `"indtest"`:

- `method`: the test's name
- `n`: the number of data points used
- `Tn`/`Dn`/`Rn`/...: the test statistic, a measure of dependence
- `scaled`: the test statistic rescaled for a standard null distribution
- `p.value`: the asymptotic p-value, computed by `TauStar::pHoeffInd`

## P-value

The null distribution of the test statistic was described by Hoeffding. The p-value is approximated by calling the function `pHoeffInd` from the package `TauStar` by Luca Weihs.

By default, the p-value's `precision` parameter is set to `1e-5`. Better precision appears to cost a considerable amount of time, especially for large values of the test statistic. It is therefore recommended to change this parameter only when needed.

If `TauStar` is unavailable, or to save time in repeated use, set `precision = 1` to skip computing p-values altogether. The `scaled` test statistic may be used instead, since its asymptotic distribution does not depend on any parameter. The raw test statistic may also be used descriptively as a measure of dependence; only its accuracy depends on the sample size.
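For illustration only, the sketch below approximates a Hoeffding-type limiting null distribution by Monte Carlo instead of calling `TauStar::pHoeffInd`. It assumes the limit law is the double series $\sum_{i,j \ge 1} (Z_{ij}^2 - 1)/(\pi^4 i^2 j^2)$ with independent standard normal $Z_{ij}$, truncated at $i, j \le K$. This is the classical form for n times an unbiased estimate of the integral in the Description, but the constant used by the package's `scaled` statistic may differ, so do not feed package output into it without checking the scaling. Both function names are hypothetical.

```r
# Monte-Carlo sketch of a Hoeffding-type limiting null distribution.
# ASSUMPTION: limit law sum_{i,j>=1} (Z_ij^2 - 1) / (pi^4 i^2 j^2),
# truncated to i, j <= K. Not how the package computes p-values.
simulate_hoeffding_null <- function(draws = 10000, K = 20) {
  k <- seq_len(K)
  # Eigenvalue weights 1 / (pi^4 i^2 j^2) for all i, j <= K
  lambda <- as.vector(outer(k, k, function(i, j) 1 / (pi^4 * i^2 * j^2)))
  # One row per draw: independent chi-square(1) variables Z_ij^2
  z2 <- matrix(rchisq(draws * length(lambda), df = 1), nrow = draws)
  # Weighted sum of the centered chi-squares for each draw
  as.vector((z2 - 1) %*% lambda)
}

# Upper-tail Monte-Carlo p-value for a statistic on the same (assumed) scale
mc_p_value <- function(stat, null_draws) mean(null_draws >= stat)

set.seed(1)
null_draws <- simulate_hoeffding_null()
mc_p_value(0.01, null_draws)
```

The simulated law involves no unknown parameters, which mirrors the remark above that the `scaled` statistic can be compared against a fixed reference distribution.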

## Ties

This package currently assumes that the variables under consideration are non-atomic, so ties are not expected except occasionally through limited numerical precision. Addressing ties rigorously is left for future versions.

The flag `collisions = TRUE` invokes checking for ties in `xs` and in `ys`, and produces an appropriate warning if they exist. The current implementation breaks such ties arbitrarily, not randomly.
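A check of this kind needs only base R's `anyDuplicated()`. The sketch below is a hypothetical stand-in for the `collisions = TRUE` check, not the package's code; it reports which of the two vectors contains repeated values.

```r
# Hypothetical helper mimicking a collisions check: warn if xs or ys repeat.
check_collisions <- function(xs, ys) {
  # anyDuplicated() returns the index of the first repeat, or 0 if none
  has_ties <- c(xs = anyDuplicated(xs) > 0, ys = anyDuplicated(ys) > 0)
  if (any(has_ties)) {
    warning("repeated values found in: ",
            paste(names(has_ties)[has_ties], collapse = ", "))
  }
  invisible(has_ties)
}

check_collisions(c(1, 2, 2), c(0.5, 0.7, 0.9))  # warns about xs
```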

Because of the averaging nature of the test statistic, a handful of ties should probably not be of much concern. With more than a handful of ties, our current advice is to break them uniformly at random beforehand.
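Assuming the test depends on the data only through ranks (as the package title suggests), one way to break ties uniformly at random is to replace each vector by its ranks with `rank(..., ties.method = "random")`. Tied values then receive their ranks in random order, while the order of distinct values is preserved. This is a sketch under that assumption, and `break_ties` is a hypothetical name.

```r
# Hypothetical helper: random tie-breaking via ranks.
# Tied values get consecutive ranks in a uniformly random order;
# distinct values keep their relative order.
break_ties <- function(v) rank(v, ties.method = "random")

set.seed(42)
xs <- c(3, 1, 1, 2, 2, 2)
break_ties(xs)  # a permutation of 1:6; the value 3 always gets rank 6
```

Apply it to each variable separately, for example `hoeffding.refined.test(break_ties(xs), break_ties(ys))`.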

## Big Data

The test statistic is computed in almost linear time, O(n log n), for a sample of size n. Its computation involves integer arithmetic with values of order n^4 or n^5, which must fit into an integer data type supported by the compiler.
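A back-of-the-envelope sketch of why this matters: if intermediate values grow like n^d, they fit into a signed b-bit integer roughly while n^d < 2^(b-1). The sketch ignores the constant factors of the actual algorithm, and pairing degree 4 with tau star and degree 5 with the refined Hoeffding statistic (which counts quintuples) is our assumption. The helper `max_n_for` is hypothetical; `max_taustar()` and `max_hoeffding()` report the real limits.

```r
# Hypothetical helper: largest n with n^degree < 2^(bits - 1),
# i.e. below the maximum of a signed 'bits'-wide integer.
# Order-of-magnitude guide only: constant factors are ignored.
max_n_for <- function(bits, degree) floor(2^((bits - 1) / degree))

sapply(c(bits64 = 64, bits128 = 128), function(b)
  c(degree4 = max_n_for(b, 4), degree5 = max_n_for(b, 5)))
```

Ignoring constants, n^5 < 2^63 holds only up to about n = 6,200, which illustrates why 128-bit arithmetic matters for large samples.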

Most 64-bit compilers emulate 128-bit arithmetic; otherwise, standard 64-bit arithmetic is used. Find the upper limits of your environment using

1. `max_taustar()`

2. `max_hoeffding()`

Another limitation is 2^31-1, the maximum size and value of an integer vector in a 32-bit build of R. This is only relevant for the tau star statistic in 128-bit mode, which could otherwise afford about three times that size. If your sample size falls in this range, try recompiling the function `.calc.taustar` according to the instructions in the cpp source file.
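The value 2^31 - 1 is exposed in base R as `.Machine$integer.max`. A hypothetical pre-flight check (not part of the package) can compare a planned sample size against it before running the test.

```r
# Hypothetical pre-flight check against R's integer limit.
# .Machine$integer.max is the largest value an R integer can hold.
fits_int_limit <- function(n) n <= .Machine$integer.max

.Machine$integer.max  # 2147483647, i.e. 2^31 - 1
fits_int_limit(1e6)   # TRUE
fits_int_limit(3e9)   # FALSE
```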

## References

Hoeffding, Wassily. "A non-parametric test of independence." The Annals of Mathematical Statistics 19.4 (1948): 546-557.

Yanagimoto, Takemi. "On measures of association and a related problem." Annals of the Institute of Statistical Mathematics 22.1 (1970): 57-63.

Weihs, Luca. "TauStar: Efficient Computation and Testing of the Bergsma-Dassios Sign Covariance." R package version 1.1.4 (2019). https://CRAN.R-project.org/package=TauStar

Even-Zohar, Chaim. "Patterns in Random Permutations." arXiv preprint arXiv:1811.07883 (2018).

Even-Zohar, Chaim, and Calvin Leng. "Counting Small Permutation Patterns." arXiv preprint arXiv:1911.01414 (2019).

Even-Zohar, Chaim. "independence: Fast Rank Tests." arXiv preprint arXiv:2010.09712 (2020).

## See Also

`independence`, `tau.star.test`, `hoeffding.D.test`

## Examples

```
## independent, $p.value is 0.258636
set.seed(123)
hoeffding.refined.test(rnorm(10000), rnorm(10000))

## dependent, even though uncorrelated, $p.value is 0.0003017679
set.seed(123)
hoeffding.refined.test(rnorm(10000, 0, 3001:13000), rnorm(10000, 0, 3001:13000))
```