dot-generic.rank.test: Generic rank test for paired samples

Description

An internal function unifying several nonparametric tests for paired samples.

Usage

.generic.rank.test(
  xs,
  ys,
  test,
  letter,
  description,
  na.rm = TRUE,
  collisions = TRUE,
  precision = 1e-05,
  limit_law_coef = 1,
  min_samples = 1,
  max_samples = Inf
)

Arguments

xs, ys

Same-length numeric vectors, containing paired samples.

test

Function that computes the test statistic from the relative order of xs and ys, given as a permutation.

letter

Notation for the test statistic, e.g., "D" for Hoeffding's D.

description

Full name of the test.

na.rm

Logical: Should missing values, NaN, and Inf be removed?

collisions

Logical: Should a warning be issued for repeated values (ties) in xs or ys?

precision

Precision of the p-value, a number between 0 and 1. For other values (e.g. precision = 1), no p-value is computed and p.value is NA.

limit_law_coef

Scaling coefficient that brings the test statistic to a standard null distribution (see Details).

min_samples, max_samples

Minimum and maximum number of data points allowed.

Details

The function .generic.rank.test first calls relative.order with xs and ys. It then applies the function test to the resulting permutation to compute the test statistic, which is rescaled by multiplying it by (n-1)*limit_law_coef, where n is the sample size. Finally, it computes the p-value by calling pHoeffInd from the package TauStar.
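The steps described in Details can be sketched in base R as below. This is a hypothetical re-implementation for illustration only, not the package's code. Assumptions: the names sketch_rank_test and toy_stat are made up; the relative order is taken to be the ranks of ys listed in increasing order of xs; ties are broken by order of appearance; the raw statistic is stored under letter followed by "n" (e.g. Dn), following the Value section; an out-of-range sample size raises an error; and the TauStar call assumes the signature pHoeffInd(x, lower.tail, error).

```r
# Hypothetical sketch of the .generic.rank.test pipeline (not the package's code).
# Assumption: the relative order is the vector of ranks of ys, listed in
# increasing order of xs; ties are broken by order of appearance.
sketch_rank_test <- function(xs, ys, test, letter, description,
                             na.rm = TRUE, collisions = TRUE,
                             precision = 1e-5, limit_law_coef = 1,
                             min_samples = 1, max_samples = Inf) {
  stopifnot(is.numeric(xs), is.numeric(ys), length(xs) == length(ys))
  if (na.rm) {
    # drop every pair in which either value is NA, NaN, Inf or -Inf
    keep <- is.finite(xs) & is.finite(ys)
    xs <- xs[keep]
    ys <- ys[keep]
  }
  n <- length(xs)
  if (n < min_samples || n > max_samples)
    stop("sample size ", n, " is outside [", min_samples, ", ", max_samples, "]")
  if (collisions && (anyDuplicated(xs) > 0 || anyDuplicated(ys) > 0))
    warning("ties found in xs or ys; breaking them by order of appearance")

  # relative order: a permutation of 1..n
  perm <- rank(ys, ties.method = "first")[order(xs)]
  stat <- test(perm)
  scaled <- (n - 1) * limit_law_coef * stat

  p_value <- NA_real_
  if (precision > 0 && precision < 1 &&
      requireNamespace("TauStar", quietly = TRUE)) {
    # assumed call: upper tail of Hoeffding's null law, error bound `precision`
    p_value <- TauStar::pHoeffInd(scaled, lower.tail = FALSE, error = precision)
  }

  result <- list(method = description, n = n)
  result[[paste0(letter, "n")]] <- stat   # e.g. "Dn" when letter = "D"
  result$scaled <- scaled
  result$p.value <- p_value
  class(result) <- "indtest"
  result
}

# Toy statistic (hypothetical): squared correlation between positions and ranks.
toy_stat <- function(perm) cor(seq_along(perm), perm)^2

res <- sketch_rank_test(c(1, 2, 3, 4, 5), c(2, 4, 6, 8, 10),
                        test = toy_stat, letter = "D",
                        description = "toy rank test", precision = 1)
res$scaled   # (5 - 1) * 1 * 1 = 4; p.value is NA because precision = 1
```

Passing precision = 1 skips the p-value step entirely, so the sketch runs even where TauStar is not installed.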

Value

A list of class "indtest" with the following components:

method         the name of the test
n              the number of data points used
Tn/Dn/Rn/...   the test statistic, a measure of dependence
scaled         the test statistic, rescaled to a standard null distribution
p.value        the asymptotic p-value, computed by TauStar::pHoeffInd

P-value

The asymptotic null distribution of the test statistic was described by Hoeffding. The p-value is approximated by the function pHoeffInd from the package TauStar by Luca Weihs.

By default, the precision of the p-value is 1e-5. Higher precision appears to cost considerably more time, especially for large values of the test statistic, so we recommend changing this parameter only when needed.

If TauStar is unavailable, or to save time in repeated use, set precision = 1 to skip computing p-values altogether. The scaled test statistic may then be used instead, since its asymptotic null distribution does not depend on any parameter. The raw test statistic may also be used descriptively, as a measure of dependence; only its accuracy depends on the sample size.
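The advice above can be wrapped in a small helper. This is a sketch: choose_precision is a hypothetical name, and its result is meant to be passed as the precision argument of .generic.rank.test. It returns 1 (no p-value) when p-values are not wanted or TauStar is not installed.

```r
# Hypothetical helper (not part of the package): choose the `precision`
# argument. precision = 1 lies outside (0, 1), so no p-value is computed,
# which also avoids any dependence on TauStar.
choose_precision <- function(want_p_value = TRUE, default = 1e-5) {
  if (want_p_value && requireNamespace("TauStar", quietly = TRUE)) default else 1
}

# In a loop over many resamples, skip p-values and keep the scaled statistic.
choose_precision(want_p_value = FALSE)  # 1
```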

Ties

This package currently assumes that the variables under consideration are non-atomic (their distributions have no point masses), so ties are not expected except as occasional effects of limited numerical precision. Addressing ties rigorously is left for future versions.

Setting collisions = TRUE checks xs and ys for ties and issues a warning if any are found. The current implementation breaks ties arbitrarily, not randomly.

Since the test statistic is an average, a handful of ties should probably have little effect. With more than a handful of ties, we currently advise breaking them uniformly at random before calling the test.
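Random tie-breaking can be done beforehand with base R's rank(..., ties.method = "random"). Because the test uses the data only through the relative order (see Details), such ranks can stand in for the raw values. The helper names has_ties and break_ties_randomly are hypothetical.

```r
# Hypothetical helpers: detect ties and break them uniformly at random.
has_ties <- function(v) anyDuplicated(v) > 0
break_ties_randomly <- function(v) rank(v, ties.method = "random")

set.seed(1)                              # reproducible tie-breaking
xs <- c(3, 1, 2, 2, 5, 2)
ys <- c(0.4, 0.1, 0.9, 0.9, 0.3, 0.7)
has_ties(xs)                             # TRUE: the value 2 appears three times
xs_r <- break_ties_randomly(xs)          # the tied 2s get ranks 2, 3, 4 in random order
ys_r <- break_ties_randomly(ys)
has_ties(xs_r) || has_ties(ys_r)         # FALSE
# xs_r and ys_r can now be passed as xs and ys to the test.
```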

Big Data

For a sample of size n, the test statistic is computed in almost linear time, O(n log n). The computation involves integers of order n^4 or n^5, which must fit into an integer type supported by the compiler.
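To illustrate how a rank statistic can reach O(n log n), the sketch below counts, for each position of a permutation, how many earlier entries are smaller, using a Fenwick (binary indexed) tree. Applied to the relative order, this gives for each point the number of points below it in both coordinates, a standard building block of fast algorithms for Hoeffding's D. Whether the package uses exactly this routine is an assumption, and the function names are hypothetical; R loops are slow, so this is for illustration only.

```r
# Hypothetical sketch: for each position i of a permutation `perm` of 1..n,
# count the j < i with perm[j] < perm[i], in O(n log n) via a Fenwick tree.
dominated_counts <- function(perm) {
  perm <- as.integer(perm)
  n <- length(perm)
  tree <- integer(n)      # Fenwick tree over the ranks 1..n
  counts <- integer(n)
  for (i in seq_len(n)) {
    # prefix sum: how many ranks in 1..(perm[i] - 1) were already inserted
    k <- perm[i] - 1L
    s <- 0L
    while (k > 0L) {
      s <- s + tree[k]
      k <- k - bitwAnd(k, -k)   # drop the lowest set bit
    }
    counts[i] <- s
    # insert perm[i] into the tree
    k <- perm[i]
    while (k <= n) {
      tree[k] <- tree[k] + 1L
      k <- k + bitwAnd(k, -k)   # move to the next node covering this rank
    }
  }
  counts
}

# Quadratic reference implementation, for checking.
dominated_counts_naive <- function(perm) {
  vapply(seq_along(perm),
         function(i) sum(perm[seq_len(i - 1)] < perm[i]),
         integer(1))
}

dominated_counts(c(2, 1, 4, 3))   # 0 0 2 2
```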

Most 64-bit compilers emulate 128-bit arithmetic; where they do not, standard 64-bit arithmetic is used. To find the upper limits of your environment, use

  1. max_taustar()

  2. max_hoeffding()

Another limit is 2^31-1, the maximum length of an integer vector, and the maximum value of its elements, in a 32-bit build of R. This matters only for the tau star statistic in 128-bit mode, which could otherwise handle sample sizes about three times as large. If your sample size falls in this range, try recompiling the function .calc.taustar following the instructions in its C++ source file.
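A back-of-envelope check of these limits, assuming the largest intermediate value is simply n^k (k = 4 or 5) held in a signed integer of 64 or 128 bits. The actual computation has constant factors that this ignores, so the figures are rough orders of magnitude only; max_taustar() and max_hoeffding() report the real limits. The function name max_n_rough is hypothetical.

```r
# Hypothetical back-of-envelope estimate: the largest n with n^k < 2^(bits - 1),
# i.e. n^k still fits a signed integer of the given width.
# Constant factors of the real computation are ignored.
max_n_rough <- function(k, bits) floor(2^((bits - 1) / k))

max_n_rough(5, 64)     # 6208
max_n_rough(4, 64)     # 55108
max_n_rough(5, 128)    # about 4.4e7
max_n_rough(4, 128)    # about 3.6e9, above the R integer limit on the next line
.Machine$integer.max   # 2147483647, i.e. 2^31 - 1
```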

See Also

independence, relative.order, tau.star.test, hoeffding.D.test, hoeffding.refined.test


independence documentation built on Jan. 14, 2021, 5:20 a.m.