
View source: R/HHG_univariate.R

Functions for creating null table objects for the atom-based, omnibus, distribution-free test of independence between two univariate random variables.

```
Fast.independence.test.nulltable(n, mmin=2, mmax=min(10,n),
    variant = 'ADP-EQP-ML', nr.atoms = min(40,n),
    score.type='LikelihoodRatio', nr.perm=200, compress=T,
    compress.p0=0.001, compress.p=0.99, compress.p1=0.000001)
```

| Argument | Description |
|---|---|
| `n` | The sample size. |
| `mmin` | The minimum partition size of the ranked observations. Default value is 2. |
| `mmax` | The maximum partition size of the ranked observations. Default value is the minimum of 10 and the sample size. |
| `variant` | A character string specifying the partition type, must be one of `'ADP-EQP'` or `'ADP-EQP-ML'`. |
| `nr.atoms` | The number of atoms (i.e., possible split points in the data). Ignored if |
| `score.type` | A character string specifying the score type, must be one of |
| `nr.perm` | The number of permutations used to build the null distribution. |
| `compress` | A logical value indicating whether to compress the null tables. If TRUE, null tables are compressed: the lower `compress.p` part of the null distribution is kept at a `compress.p0` resolution, while the values above it are kept at the finer `compress.p1` resolution. |
| `compress.p0` | Parameter for compression. This is the resolution for the lower `compress.p` part of the null distribution. |
| `compress.p` | Parameter for compression. The part of the null distribution to compress. |
| `compress.p1` | Parameter for compression. This is the resolution for the upper values of the null distribution. |

To compute the null distributions of a test statistic (for a specific aggregation and score type, over all partition sizes), the only information needed is the sample size, since the test statistic is distribution-free. The accuracy of the quantiles of the null distribution depends on the number of replicates used to construct the null tables. The accuracy required depends on the threshold used for rejecting the null hypothesis.
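The link between the number of replicates and the attainable accuracy can be sketched in base R. This is only an illustration: it assumes the common Monte Carlo convention p = (1 + count) / (nr.perm + 1), which may differ from what the package does internally, and `min_attainable_p` and `perm_needed` are hypothetical helpers, not HHG functions.

```r
# Smallest p-value a null table built from nr.perm replicates can report,
# under the ASSUMED convention p = (1 + count) / (nr.perm + 1).
min_attainable_p <- function(nr.perm) 1 / (nr.perm + 1)

# Number of replicates at which the smallest attainable p-value
# drops to (roughly) the rejection threshold alpha.
perm_needed <- function(alpha) ceiling(1 / alpha) - 1

min_attainable_p(200)   # resolution with the default nr.perm = 200
perm_needed(0.05)       # replicates needed for a 0.05 threshold
perm_needed(0.0001)     # far more are needed for a strict threshold
```

Under this convention, a stricter rejection threshold (for example after a multiplicity correction) calls for a proportionally larger `nr.perm`.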

This function creates an object for efficiently storing the null distribution of the test statistics. Generated null tables hold the null distribution of statistics for the two combination types, i.e. for `comb.type` value (`'MinP'` and `'Fisher'`), as well as for fixed partition sizes.
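As a rough illustration of the two combination types, the sketch below assumes the usual definitions: `'MinP'` takes the smallest p-value over partition sizes, and `'Fisher'` takes minus the sum of the log p-values. The p-values are invented for the example, and the variable names are hypothetical rather than part of the package.

```r
# Hypothetical p-values, one per partition size m = 2, ..., 5
# (made up for illustration, not produced by HHG).
p_by_m <- c(m2 = 0.20, m3 = 0.04, m4 = 0.01, m5 = 0.03)

minp_stat   <- min(p_by_m)                 # 'MinP' combination (assumed definition)
fisher_stat <- -sum(log(p_by_m))           # 'Fisher' combination (assumed definition)
best_m      <- names(which.min(p_by_m))    # partition size attaining the minimum

minp_stat
fisher_stat
best_m
```

Because each combined statistic has its own null distribution, the table has to store a null distribution for each combination type in addition to those for the fixed partition sizes.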

Variant types `"ADP-EQP"` and `"ADP-EQP-ML"` are the atom-based generalizations of the `"ADP"` and `"ADP-ML"` variants. EQP type variants reduce calculation time by summing over a subset of partitions, where a split between cells may be performed only every *n/nr.atoms* observations. This allows for a complexity of O(nr.atoms^4). These variants are only available for `aggregation.type=='sum'` type aggregation.
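The effect of atoms on the number of candidate split points can be sketched as follows. The exact placement of split points inside the package is an assumption here; the sketch simply allows a split after every (n / nr.atoms)-th ranked observation, and `split_points` is a hypothetical name.

```r
n        <- 1000   # sample size, as in the package example
nr.atoms <- 30     # number of atoms

# ASSUMED placement: splits allowed only at multiples of n / nr.atoms (rounded).
split_points <- unique(round(seq_len(nr.atoms - 1) * n / nr.atoms))

length(split_points)   # candidate split points with atoms
n - 1                  # candidate split points between ranks without atoms

# Ratio of the stated O(nr.atoms^4) cost to a cost that grows like n^4.
nr.atoms^4 / n^4
```

With `n = 1000`, restricting splits to 30 atoms leaves a small fraction of the candidate split points, which is where the reduction in calculation time comes from.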

Null tables may be compressed, using the `compress` argument. For each of the partition sizes, the null distribution is held at a `compress.p0` resolution up to the `compress.p` percentile. Beyond that value, the distribution is held at a finer resolution defined by `compress.p1` (since higher values are attained when a relation exists in the data, this is required for computing the p-value accurately in the tail of the null distribution).
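The two-resolution storage described above can be sketched in base R as a grid of probabilities at which quantiles are kept. This is an illustration of the idea under stated assumptions, not the package's internal format: `lower_grid`, `upper_grid` and `prob_grid` are hypothetical names, and the null sample is a chi-square stand-in rather than an HHG statistic.

```r
compress.p0 <- 0.001      # coarse resolution below compress.p
compress.p  <- 0.99       # percentile at which the resolution changes
compress.p1 <- 0.000001   # fine resolution above compress.p

# Coarse grid up to compress.p, fine grid from compress.p to 1.
lower_grid <- seq(0, compress.p,
                  length.out = round(compress.p / compress.p0) + 1)
upper_grid <- seq(compress.p, 1,
                  length.out = round((1 - compress.p) / compress.p1) + 1)
prob_grid  <- c(lower_grid, upper_grid[-1])   # drop the duplicated compress.p

# Stand-in null sample (NOT an HHG statistic), only to show the storage idea.
set.seed(1)
null_stats <- rchisq(1e5, df = 3)
compressed <- quantile(null_stats, probs = prob_grid, names = FALSE)

length(lower_grid)   # points describing the lower 99% of the distribution
length(upper_grid)   # points describing the upper 1%
```

Most of the stored points sit in the upper tail, which is the region where small p-values are computed.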

| Value | Description |
|---|---|
| `m.stats` | If keep.simulation.data= TRUE, |
| `univariate.object` | A useful format of the null tables for computing p-values efficiently. |

Barak Brill.

Heller, R., Heller, Y., Kaufman, S., Brill, B., & Gorfine, M. (2016). Consistent Distribution-Free K-Sample and Independence Tests for Univariate Random Variables. JMLR 17(29):1-54. https://www.jmlr.org/papers/volume17/14-441/14-441.pdf

Brill, B., Heller, Y., & Heller, R. (2018). Nonparametric Independence Tests and k-sample Tests for Large Sample Sizes Using Package HHG. R Journal 10(1). https://journal.r-project.org/archive/2018/RJ-2018-008/RJ-2018-008.pdf

Brill, B. (2016). Scalable Non-Parametric Tests of Independence (master's thesis). http://primage.tau.ac.il/libraries/theses/exeng/free/2899741.pdf

```
## Not run:
N_Large = 1000
data_Large = hhg.example.datagen(N_Large, 'W')
X_Large = data_Large[1,]
Y_Large = data_Large[2,]
plot(X_Large, Y_Large)

NullTable_for_N_Large_MXM_tables = Fast.independence.test.nulltable(N_Large,
    variant = 'ADP-EQP', nr.atoms = 30, nr.perm = 200)
NullTable_for_N_Large_MXL_tables = Fast.independence.test.nulltable(N_Large,
    variant = 'ADP-EQP-ML', nr.atoms = 30, nr.perm = 200)

ADP_EQP_Result = Fast.independence.test(X_Large, Y_Large,
    NullTable_for_N_Large_MXM_tables)
ADP_EQP_ML_Result = Fast.independence.test(X_Large, Y_Large,
    NullTable_for_N_Large_MXL_tables)
ADP_EQP_Result
ADP_EQP_ML_Result

# The null distribution depends only on the data size (length(X)),
# so the same null table can be used many times.
# For example, with another data set:
data_Large = hhg.example.datagen(N_Large, 'Circle')
X_Large = data_Large[1,]
Y_Large = data_Large[2,]
plot(X_Large, Y_Large)

# You may use Fisher type scores:
ADP_EQP_Result = Fast.independence.test(X_Large, Y_Large,
    NullTable_for_N_Large_MXM_tables, combining.type = 'Fisher')
# or both MinP and Fisher:
ADP_EQP_ML_Result = Fast.independence.test(X_Large, Y_Large,
    NullTable_for_N_Large_MXL_tables, combining.type = 'Both')
ADP_EQP_Result
ADP_EQP_ML_Result
## End(Not run)
```
