
View source: R/HHG_univariate.R

These statistics are used in the omnibus distribution-free test of independence between two univariate random variables, as described in Heller et al. (2016).

```
hhg.univariate.ind.stat(x, y, variant = 'ADP', aggregation.type = 'sum',
  score.type = 'LikelihoodRatio', mmax = max(floor(sqrt(length(x))/2), 2),
  mmin = 2, w.sum = 0, w.max = 2, nr.atoms = nr_bins_equipartition(length(x)))
```

`x` |
a numeric vector with observed |

`y` |
a numeric vector with observed |

`variant` |
a character string specifying the partition type, must be one of |

`aggregation.type` |
a character string specifying the aggregation type, must be one of |

`score.type` |
a character string specifying the score type, must be one of |

`mmax` |
The maximum partition size of the ranked observations. The default is half the square root of the number of observations, rounded down, and at least 2 |

`mmin` |
The minimum partition size of the ranked observations. The default is 2 |

`w.sum` |
The minimum number of observations in a partition, only relevant for |

`w.max` |
The minimum number of observations in a partition, only relevant for |

`nr.atoms` |
For |
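The default for `mmax` in the usage line above can be reproduced in base R. This is a minimal sketch; `default_mmax` is a hypothetical helper name used only for illustration and is not part of HHG.

```r
# Hypothetical helper (not part of HHG): mirrors the default expression
# mmax = max(floor(sqrt(length(x))/2), 2) from the usage line.
default_mmax <- function(n) max(floor(sqrt(n) / 2), 2)

# Default maximum partition size for a few sample sizes
sapply(c(10, 35, 100, 1000), default_mmax)
# 2 2 5 15
```

With the default `mmin = 2`, small samples (such as N = 35) therefore evaluate a single partition size, m = 2.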

For each partition size *m = mmin, …, mmax*, the function computes the scores in each of the partitions (according to the score type) and aggregates all scores according to the aggregation type (see details in Heller et al., 2016). If the score type is one of `"LikelihoodRatio"` or `"Pearson"`, and the aggregation type is one of `"sum"` or `"max"`, then the computed statistic will be in `statistic`; otherwise, the computed statistics will be in the appropriate subset of `sum.chisq`, `sum.lr`, `max.chisq`, and `max.lr`. Note that if the variant is `"ADP"`, all partition sizes are computed together in O(N^4), so the score computational complexity is O(N^4). For `"DDP"` and mmax > 4, the score computational complexity is O(N^4)*(mmax-mmin+1).
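To make the two score types concrete, the following base R sketch computes a likelihood ratio score and a Pearson chi-square score for one fixed partition of the ranks. It is an illustration under stated assumptions, not the HHG implementation: `partition_scores` and its arguments are hypothetical names, the data are assumed to have no ties, and no normalization by the data size is applied.

```r
# Hypothetical helper (not part of HHG): scores for ONE partition of the ranks.
# x_cuts / y_cuts are cut locations placed between consecutive ranks (e.g. 3.5).
partition_scores <- function(x, y, x_cuts, y_cuts) {
  n  <- length(x)
  rx <- rank(x)
  ry <- rank(y)
  # Assign each observation to a cell of the partition
  cx <- cut(rx, breaks = c(0, x_cuts, n + 1))
  cy <- cut(ry, breaks = c(0, y_cuts, n + 1))
  observed <- table(cx, cy)
  # Expected counts under independence: row total * column total / n
  expected <- outer(rowSums(observed), colSums(observed)) / n
  pos <- observed > 0  # empty cells contribute 0 to the likelihood ratio score
  lr      <- sum(observed[pos] * log(observed[pos] / expected[pos]))
  pearson <- sum((observed - expected)^2 / expected)
  c(lr = lr, pearson = pearson)
}

# A perfectly monotone relationship, split into a 3 x 3 partition
s <- partition_scores(1:9, 1:9, x_cuts = c(3.5, 6.5), y_cuts = c(3.5, 6.5))
s
```

For this toy input every observation falls on the diagonal of the 3 x 3 table, so the likelihood ratio score equals 9*log(3) and the Pearson score equals 18.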

For the 'sum' aggregation type (default), the test statistic is the sum of log likelihood (or Pearson chi-square) scores of all partitions of size *m × m* of the data, normalized by the number of partitions and the data size (thus being an estimator of the mutual information). For the 'max' aggregation type, the test statistic is the maximum log likelihood (or Pearson chi-square) score achieved by a partition of the data of size `m`, normalized by the data size. For variant type `"ADP-ML"`, the statistics calculated include not only the sum over *m × m* tables (symmetric tables, with the same number of cells on each axis), but also asymmetric tables (i.e. *m × l* tables).
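The difference between the 'sum' and 'max' aggregation types can be sketched in base R for the smallest case, m = 2, where a partition is one cut on each axis. This is a brute-force illustration, not the HHG algorithm: `lr_score` and `aggregate_2x2` are hypothetical names, ties are assumed absent, and the normalization follows the description above and may differ in detail from what the package computes.

```r
# Hypothetical helper (not part of HHG): likelihood ratio score of a count table.
lr_score <- function(observed) {
  n <- sum(observed)
  expected <- outer(rowSums(observed), colSums(observed)) / n
  pos <- observed > 0
  sum(observed[pos] * log(observed[pos] / expected[pos]))
}

# Hypothetical helper: enumerate all 2 x 2 partitions of the ranks
# (one cut per axis, placed between consecutive ranks) and aggregate.
aggregate_2x2 <- function(x, y) {
  n  <- length(x)
  rx <- rank(x)
  ry <- rank(y)
  cuts <- seq_len(n - 1) + 0.5
  both <- c(FALSE, TRUE)
  scores <- outer(cuts, cuts, Vectorize(function(cx, cy) {
    lr_score(table(factor(rx > cx, levels = both),
                   factor(ry > cy, levels = both)))
  }))
  c(sum = sum(scores) / (length(scores) * n),  # average score per partition and observation
    max = max(scores) / n)                     # best single partition
}

res <- aggregate_2x2(1:8, 1:8)
res
```

For the monotone toy data the best 2 x 2 partition cuts both axes in the middle, so the 'max' value equals log(2), while the 'sum' value averages over all 49 partitions and is smaller.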

Variant types `"ADP-EQP"` and `"ADP-EQP-ML"` are the computationally efficient versions of `"ADP"` and `"ADP-ML"`. EQP-type variants reduce calculation time by summing over a subset of partitions, where a split between cells may be performed only every *n/nr.atoms* observations. This allows for a complexity of O(nr.atoms^4). These variants are only available for `aggregation.type == 'sum'`.
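The effect of restricting splits to atom boundaries can be illustrated with a short base R sketch. The helper `eqp_split_locations` is hypothetical, and the exact placement of atom boundaries inside HHG is an assumption here; the sketch only shows how the number of candidate split locations shrinks.

```r
# Hypothetical helper (not part of HHG): candidate cut locations when the n
# ranked observations are grouped into nr_atoms atoms of roughly equal size.
# Assumption: a cut may fall only on a boundary between two consecutive atoms.
eqp_split_locations <- function(n, nr_atoms) {
  atom_ends <- round(seq_len(nr_atoms - 1) * n / nr_atoms)
  atom_ends + 0.5  # halfway between the last rank of one atom and the next
}

n <- 1000
cuts <- eqp_split_locations(n, nr_atoms = 100)
length(cuts)   # 99 candidate split locations instead of n - 1 = 999
head(cuts, 3)  # 10.5 20.5 30.5

# Ratio of the two complexity bounds quoted above, n^4 versus nr.atoms^4
n^4 / 100^4    # 10000
```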

For large data (n>100), it is recommended to use `Fast.independence.test`, which is an optimized version of the `hhg.univariate.ind.stat` and `hhg.univariate.ind.combined.test` tests.

Returns a `UnivariateStatistic` class object, with the following entries:

`statistic` |
The value of the computed statistic if the score type is one of |

`sum.chisq` |
A vector of size |

`sum.lr` |
A vector of size |

`max.chisq` |
A vector of size |

`max.lr` |
A vector of size |

`type` |
"Independence" |

`stat.type` |
"Independence-Stat" |

`size` |
The sample size |

`score.type` |
The input |

`aggregation.type` |
The input |

`mmin` |
The input |

`mmax` |
The input |

`additional` |
A vector with the input |

`nr.atoms` |
The input |

Barak Brill and Shachar Kaufman.

Heller, R., Heller, Y., Kaufman, S., Brill, B., & Gorfine, M. (2016). Consistent Distribution-Free K-Sample and Independence Tests for Univariate Random Variables. JMLR 17(29):1-54.

Brill, B. (2016). Scalable Non-Parametric Tests of Independence (master's thesis). http://primage.tau.ac.il/libraries/theses/exeng/free/2899741.pdf

```
## Not run:
library(HHG)

N = 35
data = hhg.example.datagen(N, 'Parabola')
X = data[1,]
Y = data[2,]
plot(X, Y)

# I) Computing test statistics with default parameters (ADP statistic):
hhg.univariate.ADP.Likelihood.result = hhg.univariate.ind.stat(X, Y)
hhg.univariate.ADP.Likelihood.result

# II) Computing test statistics with summation over Data Derived Partitions (DDP),
# using Pearson scores and partition sizes up to 5:
hhg.univariate.DDP.Pearson.result = hhg.univariate.ind.stat(X, Y, variant = 'DDP',
  score.type = 'Pearson', mmax = 5)
hhg.univariate.DDP.Pearson.result

# III) Computing test statistics for all M X L tables:
hhg.univariate.ADP.ML.Likelihood.result = hhg.univariate.ind.stat(X, Y,
  variant = 'ADP-ML', mmax = 5)
hhg.univariate.ADP.ML.Likelihood.result

# IV) Computing test statistics using efficient variants (for large data sets).
# Note: for independence testing with n>100, Fast.independence.test is suggested
# rather than hhg.univariate.ind.stat.
N_Large = 1000
data_Large = hhg.example.datagen(N_Large, 'W')
X_Large = data_Large[1,]
Y_Large = data_Large[2,]
plot(X_Large, Y_Large)

hhg.univariate.ADP.EQP.Likelihood.result = hhg.univariate.ind.stat(X_Large, Y_Large,
  variant = 'ADP-EQP', mmax = 20)
hhg.univariate.ADP.EQP.Likelihood.result

# Note how only nr.atoms=76 are used - only 75 possible cell split locations are
# taken into consideration when computing the sum over all possible log likelihood scores.
# This can be changed using the nr.atoms argument:
hhg.univariate.ADP.EQP.Likelihood.result = hhg.univariate.ind.stat(X_Large, Y_Large,
  variant = 'ADP-EQP', mmax = 20, nr.atoms = 100)
hhg.univariate.ADP.EQP.Likelihood.result

# V) Computing the efficient sum over all M X L tables:
hhg.univariate.ADP.EQP.ML.Likelihood.result = hhg.univariate.ind.stat(X_Large, Y_Large,
  variant = 'ADP-EQP-ML', mmax = 5)
hhg.univariate.ADP.EQP.ML.Likelihood.result
## End(Not run)
```

HHG documentation built on Nov. 17, 2017, 7:07 a.m.
