
View source: R/HHG_univariate.R

Performs distribution-free tests for equality of a univariate distribution across K groups.

```
hhg.univariate.ks.combined.test(X, Y = NULL, NullTable = NULL, mmin = 2,
  mmax = ifelse(is.null(Y), 4, max(4, round(min(table(Y))/3))),
  aggregation.type = 'sum', score.type = 'LikelihoodRatio',
  combining.type = 'MinP', nr.perm = 1000, variant = 'KSample-Variant',
  nr.atoms = nr_bins_equipartition(length(X)), compress = F,
  compress.p0 = 0.001, compress.p = 0.99, compress.p1 = 0.000001,
  keep.simulation.data = T)
```

`X` |
A numeric vector of data values (tied observations are broken at random), or the test statistic as output from |

`Y` |
for |

`NullTable` |
The null table of the statistic, which can be downloaded from the software website or computed by the function |

`mmin` |
The minimum partition size of the ranked observations, default value is 2. Ignored if |

`mmax` |
The maximum partition size of the ranked observations. The default value is one third of the number of observations in the smallest group, but at least 4 (and 4 when `Y` is not supplied). Ignored if |

`aggregation.type` |
a character string specifying the aggregation type, must be one of |

`score.type` |
a character string specifying the score type, must be one of |

`combining.type` |
a character string specifying the combining type, must be one of |

`nr.perm` |
The number of permutations for the null distribution. Ignored if |

`variant` |
Default value is |

`nr.atoms` |
If |

`compress` |
a logical variable indicating whether you want to compress the null tables. If TRUE, the lower |

`compress.p0` |
Parameter for compression. This is the resolution for the lower |

`compress.p` |
Parameter for compression. Part of the null distribution to compress. |

`compress.p1` |
Parameter for compression. This is the resolution for the upper value of the null distribution. |

`keep.simulation.data` |
a logical variable indicating whether, in addition to the sorted statistics per column, the original matrix of size nr.replicates by mmax-mmin+1 is also stored. Ignored if |

The function outputs test statistics and p-values of the combined omnibus distribution-free test of equality of distributions among K groups, as described in Heller et al. (2016). The test combines statistics from a range of partition sizes.
The default combining type is the minimum p-value, so the test statistic is the minimum p-value over the range of partition sizes m from `mmin` to `mmax`, where the p-value for a fixed partition size m is defined by the aggregation type and score type. The second combining type is a Fisher-type statistic, *-Σ log(p_m)* (with the sum running over m from `mmin` to `mmax`). The returned result may include the test statistic for the `MinP` combination, the `Fisher` combination, or both (see `combining.type`).
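As a rough illustration of the two combining rules, the sketch below computes both statistics from a made-up vector of per-m p-values. The numbers and variable names are hypothetical and are not produced by the package; the p-value of the combined statistic itself would still have to come from a null table.

```r
# Hypothetical p-values for partition sizes m = 2..6 (made-up numbers).
m_values <- 2:6
p_m <- c(0.20, 0.04, 0.01, 0.03, 0.10)

# MinP combination: the smallest p-value over the range of m,
# together with the partition size at which it is attained.
minp_stat <- min(p_m)
m_chosen <- m_values[which.min(p_m)]

# Fisher-type combination: -sum of log p-values over the same range.
fisher_stat <- -sum(log(p_m))

print(c(MinP = minp_stat, m.chosen = m_chosen, Fisher = fisher_stat))
```

Small p-values at any single m drive the MinP statistic down, whereas the Fisher statistic grows when several partition sizes show moderately small p-values.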

If the argument `NullTable` is supplied with a proper null table (constructed using `hhg.univariate.ks.nulltable` for the sample sizes of the K groups), then the following test parameters are taken from `NullTable`: `mmax`, `mmin`, `variant`, `aggregation.type`, `score.type`, `nr.atoms`, ...

If `NullTable` is left `NULL`, a null table is generated by a call to `hhg.univariate.ks.nulltable` using the arguments supplied to this function. The null table is generated with `nr.perm` repetitions and is stored in the returned object as `generated_null_table`. When testing multiple hypotheses with the same group sample sizes, it is computationally efficient to generate only one null table (using this function or `hhg.univariate.ks.nulltable`) and use it for all hypotheses tested. Generated null tables hold the distribution of statistics for both combining types (`combining.type=='MinP'` and `combining.type=='Fisher'`).
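The reuse pattern described above might look like the following sketch. The data sets, the shift values and the variable names are hypothetical; the two HHG calls follow the usage shown elsewhere on this page, and the block skips them when the package is not installed.

```r
# Sketch: one null table shared by several tests with equal group sizes.
set.seed(1)
group_sizes <- c(30, 30)
Y <- rep(c(0, 1), times = group_sizes)

# Three hypothetical data sets, all with the same group sizes.
datasets <- lapply(c(0, 0.5, 1), function(shift) {
  c(rnorm(group_sizes[1]), rnorm(group_sizes[2], mean = shift))
})

if (requireNamespace("HHG", quietly = TRUE)) {
  # Generate the null table once ...
  shared_null <- HHG::hhg.univariate.ks.nulltable(group_sizes, nr.replicates = 200)
  # ... and pass it to every test instead of regenerating it each time.
  minp_pvalues <- sapply(datasets, function(X) {
    HHG::hhg.univariate.ks.combined.test(X, Y, NullTable = shared_null)$MinP.pvalue
  })
  print(minp_pvalues)
}
```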

If `X` is supplied with a statistic (a `UnivariateStatistic` object, returned by `hhg.univariate.ks.stat`), `X` must contain the statistics (by `m`) required by either `NullTable` or the user-supplied arguments `mmin` and `mmax`. If `X` has a larger `mmax` than the supplied null table object, the statistics that exceed the null table's `mmax` are not taken into consideration when computing the combined statistic.

Variant type `"KSample-Equipartition"` is the atom-based version of the K-sample test. Calculation time is reduced by aggregating over a subset of partitions, where a split between cells may be performed only every *n/nr.atoms* observations. Atom-based tests are available when `aggregation.type` is set to `'sum'` or `'max'`.
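To give a feel for why restricting splits to atom boundaries saves time, the base-R sketch below counts candidate partitions with and without atoms. The values of `n`, `nr_atoms` and `m` are hypothetical, and the exact boundary handling inside the package is an assumption here.

```r
# Illustrative only: candidate split positions under equipartition.
n <- 1000        # total number of observations (hypothetical)
nr_atoms <- 50   # number of atoms (hypothetical value for nr.atoms)
m <- 4           # partition size, i.e. number of cells

# A split is allowed only every n / nr_atoms ranked observations.
atom_width <- n / nr_atoms
allowed_splits <- seq_len(nr_atoms - 1) * atom_width

# A partition into m cells picks m - 1 split points among the available gaps.
partitions_full <- choose(n - 1, m - 1)          # splits allowed anywhere
partitions_atoms <- choose(nr_atoms - 1, m - 1)  # splits only at atom boundaries

print(c(full = partitions_full, atoms = partitions_atoms))
```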

Null tables may be compressed using the `compress` argument. For each of the partition sizes, the null distribution is held at a `compress.p0` resolution up to the `compress.p` percentile. Beyond that value, the distribution is held at a finer resolution defined by `compress.p1`. Higher values are attained when a relation exists in the data, so the finer resolution is required for computing the p-value accurately in the tail of the null distribution.
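One way to picture the two-resolution scheme is as a grid of probabilities that is coarse below `compress.p` and fine above it. The base-R sketch below builds such a grid with the default parameter values and applies it to a simulated stand-in for a null distribution. How the package actually stores compressed tables is not shown on this page, so the grid construction and the chi-square sample are assumptions made for illustration.

```r
# Default compression parameters from the usage section.
compress_p0 <- 0.001     # resolution below the compress_p percentile
compress_p  <- 0.99      # part of the distribution held at coarse resolution
compress_p1 <- 0.000001  # resolution above the compress_p percentile

# Coarse grid on [0, compress_p], fine grid on (compress_p, 1].
n_lower <- round(compress_p / compress_p0)
n_upper <- round((1 - compress_p) / compress_p1)
lower_grid <- (0:n_lower) * compress_p0
upper_grid <- compress_p + (1:n_upper) * compress_p1
grid <- pmin(c(lower_grid, upper_grid), 1)  # guard against rounding above 1

# Hypothetical stand-in for the null distribution of one statistic.
set.seed(1)
null_sample <- rchisq(1e5, df = 3)
compressed <- quantile(null_sample, probs = grid, names = FALSE)

print(length(grid))
```

Most of the grid points sit in the upper 1% of the distribution, which is where small p-values are resolved.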

Returns a `UnivariateStatistic` class object, with the following entries:

`MinP` |
The test statistic when the combining type is |

`MinP.pvalue` |
The p-value when the combining type is |

`MinP.m.chosen` |
The partition size m for which the p-value was the smallest. |

`Fisher` |
The test statistic when the combining type is |

`Fisher.pvalue` |
The p-value when the combining type is |

`m.stats` |
The statistic for each m in the range |

`pvalues.of.single.m` |
The p-values for each m in the range |

`generated_null_table` |
The null table object. Null if |

`stat.type` |
"KSample-Combined" |

`aggregation.type` |
a character string specifying the aggregation type used in the test, one of |

`score.type` |
a character string specifying the score type used in the test, one of |

`mmax` |
The maximum partition size of the ranked observations used for MinP or Fisher test statistic. |

`mmin` |
The minimum partition size of the ranked observations used for MinP or Fisher test statistic. |

`nr.atoms` |
The input |

Barak Brill and Shachar Kaufman.

Heller, R., Heller, Y., Kaufman, S., Brill, B., & Gorfine, M. (2016). Consistent Distribution-Free K-Sample and Independence Tests for Univariate Random Variables. JMLR 17(29):1-54. https://www.jmlr.org/papers/volume17/14-441/14-441.pdf

Brill, B. (2016). Scalable Non-Parametric Tests of Independence (master's thesis). http://primage.tau.ac.il/libraries/theses/exeng/free/2899741.pdf

```
## Not run: 
# Two groups, each from a different normal mixture:
N0 = 30
N1 = 30
X = c(c(rnorm(N0/2, -2, 0.7), rnorm(N0/2, 2, 0.7)),
      c(rnorm(N1/2, -1.5, 0.5), rnorm(N1/2, 1.5, 0.5)))
Y = c(rep(0, N0), rep(1, N1))
plot(Y, X)

# I) Perform MinP & Fisher tests - without existing null tables.
# Null tables are generated by the test function.
results = hhg.univariate.ks.combined.test(X, Y, nr.perm = 100)
results

# The null table can then be accessed.
generated.null.table = results$generated_null_table

# II) Perform MinP & Fisher tests - with existing null tables.
# Null table for aggregation by summation:
sum.nulltable = hhg.univariate.ks.nulltable(c(N0, N1), nr.replicates = 1000)
MinP.Sm.existing.null.table = hhg.univariate.ks.combined.test(X, Y,
  NullTable = sum.nulltable)

# Results
MinP.Sm.existing.null.table

# The combined test can also be performed by using the test statistic.
Sm.statistic = hhg.univariate.ks.stat(X, Y)
MinP.using.statistic = hhg.univariate.ks.combined.test(Sm.statistic,
  NullTable = sum.nulltable)

# Same result as above
MinP.using.statistic$MinP.pvalue

# Null table for aggregation by maximization:
max.nulltable = hhg.univariate.ks.nulltable(c(N0, N1), aggregation.type = 'max',
  score.type = 'LikelihoodRatio', mmin = 2, mmax = 10, nr.replicates = 100)

# Combined test using both "MinP" and "Fisher":
MinPFisher.Mm.result = hhg.univariate.ks.combined.test(X, Y,
  NullTable = max.nulltable, combining.type = 'Both')
MinPFisher.Mm.result

# III) Perform MinP & Fisher tests for extremely large n
# Two groups, each from a different normal mixture, total sample size is 10^4:
N0_large = 5000
N1_large = 5000
X_Large = c(c(rnorm(N0_large/2, -2, 0.7), rnorm(N0_large/2, 2, 0.7)),
            c(rnorm(N1_large/2, -1.5, 0.5), rnorm(N1_large/2, 1.5, 0.5)))
Y_Large = c(rep(0, N0_large), rep(1, N1_large))
plot(Y_Large, X_Large)

# Null tables for the atom-based variant, by summation and by maximization:
Sm.EQP.null.table = hhg.univariate.ks.nulltable(c(N0_large, N1_large),
  nr.replicates = 200, variant = 'KSample-Equipartition', mmax = 30)
Mm.EQP.null.table = hhg.univariate.ks.nulltable(c(N0_large, N1_large),
  nr.replicates = 200, aggregation.type = 'max',
  variant = 'KSample-Equipartition', mmax = 30)

MinPFisher.Sm.EQP.result = hhg.univariate.ks.combined.test(X_Large, Y_Large,
  NullTable = Sm.EQP.null.table, combining.type = 'Both')
MinPFisher.Sm.EQP.result

MinPFisher.Mm.EQP.result = hhg.univariate.ks.combined.test(X_Large, Y_Large,
  NullTable = Mm.EQP.null.table, combining.type = 'Both')
MinPFisher.Mm.EQP.result

## End(Not run)
```
