Performs distribution-free tests for independence of two univariate random variables.

```
hhg.univariate.ind.combined.test(X,Y=NULL,NullTable=NULL,mmin=2,
mmax=max(floor(sqrt(length(X))/2),2),variant='ADP',aggregation.type='sum',
score.type='LikelihoodRatio', w.sum = 0, w.max = 2 ,combining.type='MinP',
nr.perm=100,nr.atoms = nr_bins_equipartition(length(X)),
compress=F,compress.p0=0.001,compress.p=0.99,compress.p1=0.000001,
keep.simulation.data=T)
```

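| Argument | Description |
|---|---|
| `X` | A numeric vector with observed `X` values, or a statistic object (`UnivariateStatistic`) returned by `hhg.univariate.ind.stat` (see Details). |
| `Y` | A numeric vector with observed `Y` values. May be left `NULL` when `X` is a statistic object. |
| `NullTable` | The null table of the statistic, which can be downloaded from the software website or computed by the function `hhg.univariate.ind.nulltable`. |
| `mmin` | The minimum partition size of the ranked observations; default value is 2. Ignored if `NullTable` is supplied. |
| `mmax` | The maximum partition size of the ranked observations; default value is half the square root of the number of observations (rounded down, and at least 2). For `aggregation.type = "max"`, this parameter cannot be more than 2 for the ADP variant and 4 for the DDP variant. Ignored if `NullTable` is supplied. |
| `variant` | A character string specifying the partition type, one of `"ADP"` (default), `"DDP"`, `"ADP-ML"`, `"ADP-EQP"` or `"ADP-EQP-ML"`. |
| `aggregation.type` | A character string specifying the aggregation type, one of `"sum"` (default) or `"max"`. |
| `score.type` | A character string specifying the score type, one of `"LikelihoodRatio"` (default) or `"Pearson"`. |
| `w.sum` | The minimum number of observations in a partition, only relevant for …; default value is 0. |
| `w.max` | The minimum number of observations in a partition, only relevant for …; default value is 2. |
| `combining.type` | A character string specifying the combining type, must be one of … (the default is `"MinP"`; see Details for the `"Fisher"` combination). |
| `nr.perm` | The number of permutations for the null distribution. Ignored if `NullTable` is supplied. |
| `nr.atoms` | For the `"ADP-EQP"` and `"ADP-EQP-ML"` variants, the number of atoms: a split between cells may be performed only every *n/nr.atoms* observations (see Details). Default is `nr_bins_equipartition(length(X))`. |
| `compress` | `TRUE` or `FALSE`. If enabled, null tables are compressed: the lower `compress.p` part of the null distribution is held at a `compress.p0` resolution, and the upper part at a finer `compress.p1` resolution (see Details). |
| `compress.p0` | Parameter for compression. This is the resolution for the lower `compress.p` part of the null distribution. |
| `compress.p` | Parameter for compression. The part of the null distribution to compress. |
| `compress.p1` | Parameter for compression. This is the resolution for the upper part of the null distribution. |
| `keep.simulation.data` | `TRUE`/`FALSE`. If `TRUE`, then in addition to the sorted statistics per column, the original matrix of size `nr.perm` (the number of null replicates) by `mmax-mmin+1` is also stored. |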
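The default for `mmax` in the usage line is the expression `max(floor(sqrt(length(X))/2),2)`. The small helper below (`default_mmax` is a hypothetical name, not part of the package) simply reproduces that expression for a few sample sizes, which makes the "at least 2" floor visible:

```r
# Hypothetical helper: reproduces the default `mmax` expression from the usage line.
default_mmax <- function(n) max(floor(sqrt(n) / 2), 2)

# Sample sizes 10, 35, 100 and 1000
sapply(c(10, 35, 100, 1000), default_mmax)
# [1]  2  2  5 15
```

For the 35-observation examples on this page the default would therefore be 2, so partition sizes up to 5 have to be requested explicitly with `mmax = 5`.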

The function returns the test statistic and p-value of the recommended independence test between two univariate random variables in Heller et al. (2016). The default combining type is the minimum p-value, so the test statistic is the minimum p-value over the range of partition sizes m from `mmin` to `mmax`, where the p-value for a fixed partition size m is defined by the aggregation type and score type. The combination is done over the statistics computed by `hhg.univariate.ind.stat`. The second combination method is a Fisher-type statistic, *-Σ log(p_m)*, with the sum running over m from `mmin` to `mmax`. The returned result may include the test statistic for the `MinP` combination, the `Fisher` combination, or both (see `combining.type`).
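To make the two combination rules concrete, here is a base-R sketch. It assumes per-m statistics where larger values indicate dependence, and a matrix of null statistics with one column per m; single-m p-values are empirical upper-tail frequencies with a +1 correction, and the combined statistics are calibrated against the same null replicates. This is an illustration of MinP and Fisher combining under those assumptions, not the package's internal code, and `combine_pvalues_sketch` and its output names are hypothetical.

```r
# Hypothetical sketch of MinP and Fisher combining (not HHG's internal code).
# obs:  observed statistics, one per partition size m (larger = more dependence)
# null: null statistics, nr.perm rows by length(obs) columns
combine_pvalues_sketch <- function(obs, null) {
  null <- as.matrix(null)
  # Empirical upper-tail p-value of v within a null column, with a +1 correction
  tail_p <- function(v, col) (1 + sum(col >= v)) / (1 + length(col))
  M <- length(obs)

  # p_m for the observed data, one per partition size
  p_obs <- vapply(seq_len(M), function(j) tail_p(obs[j], null[, j]), numeric(1))

  # p_m for every null replicate, giving the combined statistics a null distribution
  p_null <- matrix(
    vapply(seq_len(M), function(j) {
      vapply(null[, j], function(v) tail_p(v, null[, j]), numeric(1))
    }, numeric(nrow(null))),
    nrow = nrow(null)
  )

  minp_obs    <- min(p_obs)           # MinP: smallest p_m
  fisher_obs  <- -sum(log(p_obs))     # Fisher: -sum of log(p_m)
  minp_null   <- apply(p_null, 1, min)
  fisher_null <- apply(p_null, 1, function(p) -sum(log(p)))

  list(
    MinP          = minp_obs,
    MinP.pvalue   = (1 + sum(minp_null <= minp_obs)) / (1 + length(minp_null)),
    MinP.index    = which.min(p_obs), # position of the chosen m within mmin:mmax
    Fisher        = fisher_obs,
    Fisher.pvalue = (1 + sum(fisher_null >= fisher_obs)) / (1 + length(fisher_null))
  )
}

# Toy input: 10 null replicates for 3 partition sizes, each column equal to 1:10
toy_null <- matrix(rep(1:10, 3), nrow = 10)
res <- combine_pvalues_sketch(c(11, 5, 11), toy_null)
res$MinP.pvalue   # 1/11
```

Small p-values are evidence for dependence in both rules: MinP looks only at the single best m, while the Fisher sum accumulates evidence across all m in the range.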

If the argument `NullTable` is supplied with a proper null table (constructed using `hhg.univariate.ind.nulltable` for the data sample size), the test parameters are taken from `NullTable` (`mmax`, `mmin`, `variant`, `aggregation.type`, `score.type`, `nr.atoms`, ...).

If `NullTable` is left `NULL`, a null table is generated by a call to `hhg.univariate.ind.nulltable` using the arguments supplied to this function. The null table is generated with `nr.perm` repetitions and is stored in the returned object under `generated_null_table`. When testing multiple hypotheses on samples of the same size, one may generate a single null table (using this function or `hhg.univariate.ind.nulltable`) and reuse it many times, substantially reducing computation time. Generated null tables hold the distribution of statistics for both combination types (`combining.type=='MinP'` and `combining.type=='Fisher'`).
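A sketch of the reuse pattern described above, assuming the HHG package is installed and using only functions and arguments that appear on this page. The three data sets are illustrative; what matters is that they all have the sample size the null table was built for.

```r
library(HHG)
set.seed(1)
N <- 35

# Build one null table for samples of size N (default ADP variant, sum aggregation)
shared.null <- hhg.univariate.ind.nulltable(N, mmax = 5, nr.replicates = 200)

# Illustrative data sets of the same size: two dependent patterns and one independent pair
datasets <- list(
  parabola    = hhg.example.datagen(N, 'Parabola'),
  w.shape     = hhg.example.datagen(N, 'W'),
  independent = rbind(rnorm(N), rnorm(N))
)

# Reuse the same null table for every test instead of regenerating it
pvals <- sapply(datasets, function(d) {
  hhg.univariate.ind.combined.test(d[1, ], d[2, ], NullTable = shared.null)$MinP.pvalue
})
pvals
```

The expensive step (building the null table) runs once; each additional test only computes the observed statistics.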

If `X` is supplied with a statistic (a `UnivariateStatistic` object returned by `hhg.univariate.ind.stat`), `X` must contain the statistics (by `m`) required by either `NullTable` or the user-supplied arguments `mmin` and `mmax`. If `X` has a larger `mmax` argument than the supplied null table object, the `m` statistics that exceed the null table's `mmax` are not taken into consideration when computing the combined statistic.
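The following sketch, assuming the HHG package is installed, passes a statistic computed up to `mmax = 7` together with a null table built only up to `mmax = 5`. Per the description above, only the m = 2..5 statistics should enter the combined statistic; the data set and seed are illustrative.

```r
library(HHG)
set.seed(2)
N <- 35
data <- hhg.example.datagen(N, 'Parabola')

# Statistic computed for partition sizes up to 7
stat.m7 <- hhg.univariate.ind.stat(data[1, ], data[2, ], mmax = 7)

# Null table covering partition sizes only up to 5
null.m5 <- hhg.univariate.ind.nulltable(N, mmax = 5, nr.replicates = 200)

# Statistics for m = 6 and m = 7 exceed the null table's mmax and are ignored
res <- hhg.univariate.ind.combined.test(stat.m7, NullTable = null.m5)
res$MinP.pvalue
```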

Variant types `"ADP-EQP"` and `"ADP-EQP-ML"` are the computationally efficient versions of `"ADP"` and `"ADP-ML"`. EQP-type variants reduce calculation time by summing over a subset of partitions, where a split between cells may be performed only every *n/nr.atoms* observations. This allows for a complexity of O(nr.atoms^4). These variants are only available for `aggregation.type=='sum'`.
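The sketch below only illustrates how restricting splits to every *n/nr.atoms* observations shrinks the set of candidate cut positions. The helper name `eqp_split_points` and its rounding rule are assumptions for illustration; the package's actual atom boundaries may be computed differently.

```r
# Hypothetical illustration of the equipartition (EQP) idea, not HHG internals.
# Candidate split positions, in rank units, lie roughly every n / nr.atoms observations.
eqp_split_points <- function(n, nr.atoms) {
  unique(round(seq_len(nr.atoms - 1) * n / nr.atoms))
}

eqp_split_points(100, 4)
# [1] 25 50 75

# Candidate split positions per axis for n = 1000:
# every gap between ranks (n - 1) versus the EQP grid with 30 atoms
c(all.gaps = 1000 - 1, eqp = length(eqp_split_points(1000, 30)))

# Ways to place 3 cuts on one axis (m = 4 cells) in each case
choose(c(999, 29), 3)
```

Because each axis only has `nr.atoms - 1` candidate cuts instead of `n - 1`, the cost depends on `nr.atoms` rather than on the sample size.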

Null tables may be compressed using the `compress` argument. For each of the partition sizes (i.e. `m` or `mXm`), the null distribution is held at a `compress.p0` resolution up to the `compress.p` quantile. Beyond that value, the distribution is held at a finer resolution defined by `compress.p1`, since higher values are attained when a relation exists in the data, and a finer resolution there is required for computing the p-value accurately.
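A base-R sketch of the compression rule described above: quantiles of a null sample are kept on a coarse probability grid up to `compress.p` and on a much denser grid above it. The function name is hypothetical, the chi-square draws are stand-in null statistics rather than HHG output, and the package's own storage format may differ.

```r
# Hypothetical sketch of the compression rule (not HHG internals):
# quantiles on a coarse grid up to compress.p, then on a finer grid above it.
compress_null_sketch <- function(null.stats, compress.p0 = 0.001,
                                 compress.p = 0.99, compress.p1 = 0.000001) {
  probs <- sort(unique(c(seq(0, compress.p, by = compress.p0),
                         seq(compress.p, 1, by = compress.p1),
                         1)))
  list(probs = probs,
       values = quantile(null.stats, probs = probs, names = FALSE))
}

set.seed(3)
null.stats <- rchisq(5000, df = 4)   # stand-in null statistics
cmp <- compress_null_sketch(null.stats)

# Grid points below and above compress.p: the upper 1% gets the denser grid
c(lower = sum(cmp$probs < 0.99), upper = sum(cmp$probs >= 0.99))
```

With the default parameters roughly 990 grid points cover the lower 99% of the distribution, while about 10,000 cover the upper 1%, where observed statistics under dependence tend to fall.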

For large data (n > 100), it is recommended to use `Fast.independence.test`, which is an optimized version of the `hhg.univariate.ind.stat` and `hhg.univariate.ind.combined.test` tests.

Returns a `UnivariateStatistic` class object with the following entries:

| Entry | Description |
|---|---|
| `MinP` | The test statistic when the combining type is `"MinP"`. |
| `MinP.pvalue` | The p-value when the combining type is `"MinP"`. |
| `MinP.m.chosen` | The partition size m for which the p-value was the smallest. |
| `Fisher` | The test statistic when the combining type is `"Fisher"`. |
| `Fisher.pvalue` | The p-value when the combining type is `"Fisher"`. |
| `m.stats` | The statistic for each m in the range `mmin` to `mmax`. |
| `pvalues.of.single.m` | The p-values for each m in the range `mmin` to `mmax`. |
| `generated_null_table` | The null table object. `NULL` if `NullTable` was supplied. |
| `stat.type` | `"Independence-Combined"` |
| `variant` | A character string specifying the partition type used in the test, one of `"ADP"`, `"DDP"`, `"ADP-ML"`, `"ADP-EQP"` or `"ADP-EQP-ML"`. |
| `aggregation.type` | A character string specifying the aggregation type used in the test, one of `"sum"` or `"max"`. |
| `score.type` | A character string specifying the score type used in the test, one of `"LikelihoodRatio"` or `"Pearson"`. |
| `mmax` | The maximum partition size of the ranked observations used for the MinP or Fisher test statistic. |
| `mmin` | The minimum partition size of the ranked observations used for the MinP or Fisher test statistic. |
| `w.sum` | The minimum number of observations in a partition, only relevant for … |
| `w.max` | The minimum number of observations in a partition, only relevant for … |
| `nr.atoms` | The input `nr.atoms` parameter. |
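A short sketch, assuming the HHG package is installed, of reading some of the entries listed above from a returned object; the data set and seed are illustrative.

```r
library(HHG)
set.seed(4)
N <- 35
data <- hhg.example.datagen(N, 'Parabola')
results <- hhg.univariate.ind.combined.test(data[1, ], data[2, ],
                                            nr.perm = 100, mmax = 5)

results$MinP.pvalue           # p-value of the MinP combination
results$MinP.m.chosen         # partition size with the smallest single-m p-value
results$pvalues.of.single.m   # one p-value per m in mmin..mmax
results$m.stats               # the per-m statistics being combined
```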

Barak Brill and Shachar Kaufman.

Heller, R., Heller, Y., Kaufman, S., Brill, B., & Gorfine, M. (2016). Consistent Distribution-Free K-Sample and Independence Tests for Univariate Random Variables. JMLR, 17(29), 1-54.

Brill, B. (2016). Scalable Non-Parametric Tests of Independence (Master's thesis). http://primage.tau.ac.il/libraries/theses/exeng/free/2899741.pdf

```
## Not run:
N = 35
data = hhg.example.datagen(N, 'Parabola')
X = data[1,]
Y = data[2,]
plot(X,Y)
#I) Perform MinP & Fisher Tests - without existing null tables.
#Null tables are generated by the test function.
#using partitions sizes up to 5
results = hhg.univariate.ind.combined.test(X,Y,nr.perm = 100,mmax=5)
results
#The null table can then be accessed.
generated.null.table = results$generated_null_table
#II) Perform MinP & Fisher Tests - with existing null tables.
#create null table for aggregation by summation (on ADP), with partitions sizes up to 5:
ADP.null = hhg.univariate.ind.nulltable(N,mmax=5)
#create a null table, using aggregation by summation over DDP partitions,
#with partitions sizes up to 5, over Pearson scores,
#with 1000 null-table replicates.
DDP.null = hhg.univariate.ind.nulltable(N,mmax = 5,variant = 'DDP',
score.type = 'Pearson', nr.replicates = 1000)
MinP.ADP.existing.null.table = hhg.univariate.ind.combined.test(X,Y, NullTable = ADP.null)
#Results
MinP.ADP.existing.null.table
#using the other null table (DDP variant, with Pearson scores):
MinP.DDP.existing.null.table = hhg.univariate.ind.combined.test(X,Y, NullTable = DDP.null)
MinP.DDP.existing.null.table
# combined test can also be performed by using the test statistic.
ADP.statistic = hhg.univariate.ind.stat(X,Y,mmax=5)
MinP.using.statistic.result = hhg.univariate.ind.combined.test(ADP.statistic,
NullTable = ADP.null)
# same result as above (as MinP.ADP.existing.null.table$MinP.pvalue)
MinP.using.statistic.result$MinP.pvalue
#III) Perform MinP & Fisher Tests - using the efficient variants for large N.
N_Large = 1000
data_Large = hhg.example.datagen(N_Large, 'W')
X_Large = data_Large[1,]
Y_Large = data_Large[2,]
plot(X_Large,Y_Large)
NullTable_for_N_Large_MXM_tables = hhg.univariate.ind.nulltable(N_Large,
variant = 'ADP-EQP',nr.atoms = 30,nr.replicates=200)
NullTable_for_N_Large_MXL_tables = hhg.univariate.ind.nulltable(N_Large,
variant = 'ADP-EQP-ML',nr.atoms = 30,nr.replicates=200)
ADP_EQP_Result = hhg.univariate.ind.combined.test(X_Large,Y_Large,
NullTable = NullTable_for_N_Large_MXM_tables)
ADP_EQP_ML_Result = hhg.univariate.ind.combined.test(X_Large,Y_Large,
NullTable = NullTable_for_N_Large_MXL_tables)
ADP_EQP_Result
ADP_EQP_ML_Result
## End(Not run)
```

