# dot-generic.rank.test: Generic rank test for paired samples

In independence: Fast Rank-Based Independence Testing

## Description

An internal function unifying several nonparametric tests for paired samples.

## Usage

```
.generic.rank.test(
  xs,
  ys,
  test,
  letter,
  description,
  na.rm = TRUE,
  collisions = TRUE,
  precision = 1e-05,
  limit_law_coef = 1,
  min_samples = 1,
  max_samples = Inf
)
```

## Arguments

| Argument | Description |
|---|---|
| `xs, ys` | Same-length numeric vectors, containing paired samples. |
| `test` | Function computing the test statistic given a relative order. |
| `letter` | Notation for the test statistic, e.g., "D" for Hoeffding's D. |
| `description` | Full name of test. |
| `na.rm` | Logical: Should missing values, `NaN`, and `Inf` be removed? |
| `collisions` | Logical: Warn of repeating values in `xs` or `ys`. |
| `precision` | Precision of the p-value, between 0 and 1. Otherwise the p-value is `NA`. |
| `limit_law_coef` | Scaling of test statistic for standard null distribution. |
| `min_samples, max_samples` | Data size limits. |

## Details

The function `.generic.rank.test` first calls `relative.ordering` with `xs` and `ys`. Then it uses the given function to compute the test statistic from the resulting permutation. The statistic is rescaled by multiplication with `(n-1)*limit_law_coef`, where `n` is the sample size. Finally, it computes the p-value by calling `pHoeffInd` from the package `TauStar`.
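The first and third steps described above can be illustrated with a minimal sketch. The helper names `relative_order_sketch` and `rescale_sketch` are hypothetical and are not part of the package; the sketch assumes that the relative ordering is the rank permutation of `ys` after sorting the pairs by `xs`, with no ties and no missing values.

```r
# Hypothetical stand-in for the relative ordering step (not the package's code):
# sort the pairs by xs, then take the ranks of the reordered ys.
# Assumes no ties and no missing values.
relative_order_sketch <- function(xs, ys) {
  stopifnot(length(xs) == length(ys))
  rank(ys[order(xs)], ties.method = "first")
}

# Rescaling as described above: multiply the raw statistic
# by (n - 1) * limit_law_coef, where n is the sample size.
rescale_sketch <- function(stat, n, limit_law_coef = 1) {
  stat * (n - 1) * limit_law_coef
}

xs <- c(3, 1, 2)
ys <- c(10, 30, 20)
perm <- relative_order_sketch(xs, ys)   # ys decreases as xs increases
scaled <- rescale_sketch(0.5, n = 11, limit_law_coef = 2)
```

The statistic value `0.5` and the coefficient `2` are arbitrary illustration values, not outputs of any of the package's tests.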

## Value

A list, of class `"indtest"`:

| Component | Description |
|---|---|
| `method` | the test's name |
| `n` | number of data points used |
| `Tn`/`Dn`/`Rn`/... | the test statistic, measure of dependence |
| `scaled` | the test statistic rescaled for a standard null distribution |
| `p.value` | the asymptotic p-value, by TauStar::`pHoeffInd` |

## P-value

The null distribution of the test statistic was described by Hoeffding. The p-value is approximated by calling the function `pHoeffInd` from the package `TauStar` by Luca Weihs.

By default, the p-value's `precision` parameter is set to `1e-5`. Better precision appears to cost considerably more time, especially for large values of the test statistic. It is therefore recommended to change this parameter only when needed.

If `TauStar` is unavailable, or to save time in repeated use, set `precision = 1` to skip computing p-values altogether. The `scaled` test statistic may be used instead; its asymptotic distribution does not depend on any parameter. The raw test statistic may also be used descriptively, as a measure of dependence; only its accuracy depends on the sample size.

## Ties

This package currently assumes that the variables under consideration are non-atomic, so that ties are not expected, other than by occasional effects of numerical precision. Addressing ties rigorously is left for future versions.

The flag `collisions = TRUE` invokes checking for ties in `xs` and in `ys`, and produces an appropriate warning if they exist. The current implementation breaks such ties arbitrarily, not randomly.

By the averaging nature of the test statistic, it seems that a handful of ties should not be of much concern. In case of more than a handful of ties, our current advice to the user is to break them uniformly at random beforehand.
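One way to break ties uniformly at random beforehand is sketched here with base R's `rank` and `ties.method = "random"`. The helper name `break_ties_sketch` is hypothetical. The sketch assumes that replacing each vector by its ranks is acceptable because the tests are rank-based, and that missing values have already been removed.

```r
# Replace a vector by its ranks, ordering tied values uniformly at random.
# Apply it separately to xs and to ys before calling a test.
break_ties_sketch <- function(v) {
  rank(v, ties.method = "random")
}

set.seed(1)                       # only to make the illustration reproducible
xs <- c(2.5, 1.0, 2.5, 4.0, 1.0)  # two tied pairs
xs_untied <- break_ties_sketch(xs)
no_ties_left <- anyDuplicated(xs_untied) == 0
```

The order between distinct values is preserved; only the order within each group of tied values is randomized.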

## Big Data

The test statistic is computed in almost linear time, O(n log n), given a sample of size n. Its computation involves integer arithmetic on values of order n^4 or n^5, which must fit into an integer data type supported by the compiler.
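As a rough back-of-the-envelope illustration of why the integer width matters, the following sketch computes the largest n for which n^4 or n^5 stays below 2^63 or 2^127. The function `crude_max_n` is hypothetical, the signed widths of 63 and 127 bits are assumptions, and constant factors in the actual computation are ignored, so the package's real limits will differ from these figures.

```r
# Crude bound: roughly the largest n with n^power < 2^bits,
# ignoring all constant factors in the real computation.
crude_max_n <- function(power, bits) {
  floor(2^(bits / power))
}

limits <- data.frame(
  power = c(4, 5, 4, 5),
  bits  = c(63, 63, 127, 127)
)
limits$max_n <- mapply(crude_max_n, limits$power, limits$bits)
limits
```

These numbers only indicate orders of magnitude; the limits that actually apply in a given environment are those reported by the package itself.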

Most 64-bit compilers emulate 128-bit arithmetic. Where this is unavailable, standard 64-bit arithmetic is used instead. To find the upper limits in your environment, call:

1. `max_taustar()`

2. `max_hoeffding()`

Another limitation is 2^31-1, the maximum size and value of an integer vector in a 32-bit build of R. This is only relevant for the tau star statistic in 128-bit mode, which could otherwise afford about three times that size. If your sample size falls in this range, try recompiling the function `.calc.taustar` according to the instructions in the cpp source file.

## See Also

`independence`, `relative.order`, `tau.star.test`, `hoeffding.D.test`, `hoeffding.refined.test`