hhg.univariate.ind.combined.test: Distribution-free test of independence
In HHG: Heller-Heller-Gorfine Tests of Independence and Equality of Distributions


Performs distribution-free tests for independence of two univariate random variables.

hhg.univariate.ind.combined.test(X,Y=NULL,NullTable=NULL,mmin=2,
mmax=max(floor(sqrt(length(X))/2),2),variant='ADP',aggregation.type='sum',
score.type='LikelihoodRatio', w.sum = 0, w.max = 2 ,combining.type='MinP',
nr.perm=100,nr.atoms = nr_bins_equipartition(length(X)),
compress=F,compress.p0=0.001,compress.p=0.99,compress.p1=0.000001,
keep.simulation.data=T)

`X`	a numeric vector with observed `X` values, or the test statistic as output from `hhg.univariate.ind.stat`.
`Y`	a numeric vector with observed `Y` values. Leave as `NULL` if the input to `X` is the test statistic.
`NullTable`	The null table of the statistic, which can be downloaded from the software website or computed by the function `hhg.univariate.ind.nulltable`.
`mmin`	The minimum partition size of the ranked observations, default value is 2. Ignored if `NullTable` is non-null.
`mmax`	The maximum partition size of the ranked observations. The default is half the square root of the number of observations, rounded down, and never less than 2. When `aggregation.type` is `"max"`, this parameter cannot exceed 2 for the ADP variant and 4 for the DDP variant. Ignored if `NullTable` is non-null.
`variant`	a character string specifying the partition type, must be one of `"ADP"` (default), `"DDP"`, `"ADP-ML"`, `"ADP-EQP"`, or `"ADP-EQP-ML"`. Ignored if `NullTable` is non-null.
`aggregation.type`	a character string specifying the aggregation type, must be one of `"sum"` (default), `"max"`, or `"both"`. Ignored if `NullTable` is non-null.
`score.type`	a character string specifying the score type, must be one of `"LikelihoodRatio"` (default), `"Pearson"`, or `"both"`. Ignored if `NullTable` is non-null.
`w.sum`	The minimum number of observations in a partition, only relevant for `type="Independence"`, `aggregation.type="sum"` and `score.type="Pearson"`, default value 0. Ignored if `NullTable` is non-null.
`w.max`	The minimum number of observations in a partition, only relevant for `type="Independence"`, `aggregation.type="max"` and `score.type="Pearson"`, default value 2. Ignored if `NullTable` is non-null.
`combining.type`	a character string specifying the combining type, must be one of `"MinP"` (default), `"Fisher"`, or `"both"`.
`nr.perm`	The number of permutations for the null distribution. Ignored if `NullTable` is non-null.
`nr.atoms`	For `"ADP-EQP"` and `"ADP-EQP-ML"` type tests, sets the number of possible split points in the data. Ignored if `NullTable` is non-null. The default value is the minimum of n and 60 + 0.5√n, where n is the number of observations.
`compress`	TRUE or FALSE. If enabled, null tables are compressed: The lower `compress.p` part of the null statistics is kept at a `compress.p0` resolution, while the upper part is kept at a `compress.p1` resolution (which is finer).
`compress.p0`	Parameter for compression. This is the resolution for the lower `compress.p` part of the null distribution.
`compress.p`	Parameter for compression. Part of the null distribution to compress.
`compress.p1`	Parameter for compression. This is the resolution for the upper part of the null distribution, above `compress.p`.
`keep.simulation.data`	TRUE/FALSE. If TRUE, then in addition to the sorted statistics per column, the original matrix of size nr.replicates by mmax-mmin+1 is also stored.
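
The two size-dependent defaults can be checked by hand. The following is a minimal sketch in base R, assuming a hypothetical sample size `n`; the `mmax` expression is copied from the usage line, while the `nr.atoms` expression follows the formula given in the argument description and does not reproduce any rounding that `nr_bins_equipartition` itself may apply.

```r
n <- 1000  # hypothetical sample size, for illustration only

# Default mmax, as written in the usage line.
mmax_default <- max(floor(sqrt(n) / 2), 2)

# Default nr.atoms as described in the text: min(n, 60 + 0.5 * sqrt(n)).
# Assumption: no rounding is applied here; the package function may differ.
nr_atoms_described <- min(n, 60 + 0.5 * sqrt(n))

mmax_default        # 15
nr_atoms_described  # roughly 75.8
```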

Computes the test statistic and p-value of the recommended independence test between two univariate random variables in Heller et al. (2014). The default combining type is the minimum p-value, so the test statistic is the minimum p-value over the range of partition sizes m from mmin to mmax, where the p-value for a fixed partition size m is defined by the aggregation type and score type. The combination is done over the statistics computed by hhg.univariate.ind.stat. The second combination method is a Fisher type statistic, -Σ log(p_m) (with the sum going from mmin to mmax). The returned result may include the test statistic for the MinP combination, the Fisher combination, or both (see combining.type).
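
The two combination rules can be illustrated in base R without the package. The sketch below assumes a hypothetical vector `p_m` of p-values, one per partition size (in the actual function these come from the null table). It shows only how the MinP and Fisher statistics are formed, not how their own p-values are obtained.

```r
# Hypothetical p-values for partition sizes m = 2, ..., 5 (made-up numbers).
mmin <- 2
mmax <- 5
p_m <- c(0.20, 0.03, 0.08, 0.50)
names(p_m) <- mmin:mmax

# MinP combination: the statistic is the smallest p-value over m.
MinP <- min(p_m)
MinP.m.chosen <- as.integer(names(p_m)[which.min(p_m)])

# Fisher combination: minus the sum of the log p-values over m.
Fisher <- -sum(log(p_m))

MinP           # 0.03
MinP.m.chosen  # 3
Fisher
```

Both quantities are themselves test statistics, so each is compared against its own null distribution to obtain the final p-value.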

If the argument NullTable is supplied with a proper null table (constructed using hhg.univariate.ind.nulltable, for the data sample size), test parameters are taken from NullTable (mmax, mmin, variant, aggregation.type, score.type, nr.atoms, ...).

If NullTable is left NULL, a null table is generated by a call to hhg.univariate.ind.nulltable using the arguments supplied to this function. The null table is generated with nr.perm repetitions and is stored in the returned object under generated_null_table. When testing multiple hypotheses, one may generate a single null table (using this function or hhg.univariate.ind.nulltable) and reuse it many times, which substantially reduces computation time. Generated null tables hold the distribution of statistics for both combination types (combining.type=='MinP' and combining.type=='Fisher').
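
Reusing one null table across several hypotheses might look like the sketch below. It assumes the HHG package is installed; the simulated vectors and the names `Y_list`, `null_table` and `pvals` are hypothetical and chosen only for illustration.

```r
library(HHG)

set.seed(1)
N <- 30
X <- rnorm(N)

# Three hypothetical response vectors to test against X.
Y_list <- list(indep  = rnorm(N),
               linear = X + rnorm(N),
               square = X^2 + rnorm(N))

# Build the null table once, for this sample size and these test parameters.
null_table <- hhg.univariate.ind.nulltable(N, mmax = 4)

# Reuse the same table for every hypothesis instead of regenerating it.
pvals <- sapply(Y_list, function(Y) {
  hhg.univariate.ind.combined.test(X, Y, NullTable = null_table)$MinP.pvalue
})
pvals
```

All vectors tested against the same table must have the sample size the table was built for.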

If X is supplied with a statistic (UnivariateStatistic object, returned by hhg.univariate.ind.stat), X must contain the statistics (by m) required by either NullTable or the user-supplied arguments mmin and mmax. If X has a larger mmax argument than the supplied null table object, the m statistics that exceed the null table's mmax are not taken into consideration when computing the combined statistic.

Variant types "ADP-EQP" and "ADP-EQP-ML" are computationally efficient versions of "ADP" and "ADP-ML". EQP type variants reduce calculation time by summing over a subset of partitions, in which a split between cells may be placed only every n/nr.atoms observations. This allows for a complexity of O(nr.atoms^4). These variants are available only for aggregation.type=='sum'.

Null tables may be compressed, using the compress argument. For each of the partition sizes (i.e. m or mXm), the null distribution is held at a compress.p0 resolution up to the compress.p percentile. Beyond that value, the distribution is held at a finer resolution defined by compress.p1 (since higher values are attained when a relation exists in the data, this is required for computing the p-value accurately.)
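
The idea behind compression can be sketched in base R as keeping quantiles of the simulated null statistics on a coarse probability grid below compress.p and on a fine grid above it. This is an illustration of the principle only, under the assumption that quantiles are what is stored; it is not the package's internal implementation, and `null_stats` is a made-up stand-in for one column of a null table.

```r
set.seed(1)
null_stats <- rchisq(1e5, df = 4)  # stand-in for simulated null statistics

compress.p0 <- 0.001     # coarse resolution, lower part
compress.p  <- 0.99      # boundary between lower and upper part
compress.p1 <- 0.000001  # fine resolution, upper part

# Coarse probability grid up to compress.p, fine grid beyond it.
probs <- c(seq(0, compress.p, by = compress.p0),
           seq(compress.p, 1, by = compress.p1))
probs <- sort(unique(pmin(probs, 1)))

# The compressed null distribution keeps only the quantiles on this grid.
compressed <- quantile(null_stats, probs = probs, names = FALSE)

length(null_stats)  # number of simulated statistics
length(compressed)  # number of stored quantiles, far fewer
```

The upper tail is kept at high resolution because small p-values are computed from it.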

For large data (n>100), it is recommended to use Fast.independence.test, which is an optimized version of the hhg.univariate.ind.stat and hhg.univariate.ind.combined.test tests.

Returns a UnivariateStatistic class object, with the following entries:

`MinP`	The test statistic when the combining type is `"MinP"`.
`MinP.pvalue`	The p-value when the combining type is `"MinP"`.
`MinP.m.chosen`	The partition size m for which the p-value was the smallest.
`Fisher`	The test statistic when the combining type is `"Fisher"`.
`Fisher.pvalue`	The p-value when the combining type is `"Fisher"`.
`m.stats`	The statistic for each m in the range `mmin` to `mmax`.
`pvalues.of.single.m`	The p-values for each m in the range `mmin` to `mmax`.
`generated_null_table`	The null table object. `NULL` if `NullTable` is non-null.
`stat.type`	"Independence-Combined"
`variant`	a character string specifying the partition type used in the test, one of `"ADP"` or `"DDP"`.
`aggregation.type`	a character string specifying the aggregation type used in the test, one of `"sum"` or `"max"`.
`score.type`	a character string specifying the score type used in the test, one of `"LikelihoodRatio"` or `"Pearson"`.
`mmax`	The maximum partition size of the ranked observations used for MinP or Fisher test statistic.
`mmin`	The minimum partition size of the ranked observations used for MinP or Fisher test statistic.
`w.sum`	The minimum number of observations in a partition, only relevant for `type="Independence"`, `aggregation.type="sum"` and `score.type="Pearson"`.
`w.max`	The minimum number of observations in a partition, only relevant for `type="Independence"`, `aggregation.type="max"` and `score.type="Pearson"`.
`nr.atoms`	The input `nr.atoms`.

Barak Brill and Shachar Kaufman.

Heller, R., Heller, Y., Kaufman, S., Brill, B., & Gorfine, M. (2016). Consistent Distribution-Free K-Sample and Independence Tests for Univariate Random Variables. JMLR 17(29):1-54.

Brill, B. (2016). Scalable Non-Parametric Tests of Independence (master's thesis). http://primage.tau.ac.il/libraries/theses/exeng/free/2899741.pdf

## Not run: 

N = 35
data = hhg.example.datagen(N, 'Parabola')
X = data[1,]
Y = data[2,]
plot(X,Y)

#I) Perform MinP & Fisher Tests - without existing null tables.
#Null tables are generated by the test function.
#using partition sizes up to 5
results = hhg.univariate.ind.combined.test(X,Y,nr.perm = 100,mmax=5)
results


#The null table can then be accessed.
generated.null.table = results$generated_null_table


#II) Perform MinP & Fisher Tests - with existing null tables. 

#create a null table for aggregation by summation (on ADP), with partition sizes up to 5:
ADP.null = hhg.univariate.ind.nulltable(N,mmax=5)

#create a null table, using aggregation by summation over DDP partitions,
#with partition sizes up to 5, over Pearson scores,
#with 1000 bootstrap repetitions.
DDP.null = hhg.univariate.ind.nulltable(N,mmax = 5,variant = 'DDP',
score.type = 'Pearson', nr.replicates = 1000)

MinP.ADP.existing.null.table = hhg.univariate.ind.combined.test(X,Y, NullTable = ADP.null)
#Results 
MinP.ADP.existing.null.table

#using the other null table (DDP variant, with Pearson scores):
MinP.DDP.existing.null.table = hhg.univariate.ind.combined.test(X,Y, NullTable = DDP.null)

MinP.DDP.existing.null.table

# combined test can also be performed by using the test statistic.
ADP.statistic = hhg.univariate.ind.stat(X,Y,mmax=5)
MinP.using.statistic.result = hhg.univariate.ind.combined.test(ADP.statistic,
NullTable = ADP.null)
# same result as above (as MinP.ADP.existing.null.table$MinP.pvalue)
MinP.using.statistic.result$MinP.pvalue

#III) Perform MinP & Fisher Tests - using the efficient variants for large N. 

N_Large = 1000
data_Large = hhg.example.datagen(N_Large, 'W')
X_Large = data_Large[1,]
Y_Large = data_Large[2,]
plot(X_Large,Y_Large)

NullTable_for_N_Large_MXM_tables = hhg.univariate.ind.nulltable(N_Large,
variant = 'ADP-EQP',nr.atoms = 30,nr.replicates=200)
NullTable_for_N_Large_MXL_tables = hhg.univariate.ind.nulltable(N_Large,
variant = 'ADP-EQP-ML',nr.atoms = 30,nr.replicates=200)

ADP_EQP_Result = hhg.univariate.ind.combined.test(X_Large,Y_Large,
NullTable = NullTable_for_N_Large_MXM_tables)
ADP_EQP_ML_Result = hhg.univariate.ind.combined.test(X_Large,Y_Large,
NullTable = NullTable_for_N_Large_MXL_tables)

ADP_EQP_Result
ADP_EQP_ML_Result


## End(Not run)