View source: R/HHG_univariate.R
Description

This function converts a matrix of null test statistics, computed for a range of partition sizes, into the null table object required for efficient computation of p-values.
Usage

hhg.univariate.nulltable.from.mstats(m.stats, minm, maxm, type, variant,
  size, score.type, aggregation.type, w.sum = 0, w.max = 2,
  keep.simulation.data = F, nr.atoms = nr_bins_equipartition(sum(size)),
  compress = F, compress.p0 = 0.001, compress.p = 0.99, compress.p1 = 0.000001)
Arguments

m.stats
A matrix with B rows and maxm - minm + 1 columns, where each row holds the null test statistics of one sampled permutation, for partition sizes minm through maxm.

minm
The minimum partition size of the ranked observations, default value is 2.

maxm
The maximum partition size of the ranked observations.

type
A character string specifying the test type, must be one of 'Independence' or 'KSample'.

variant
A character string specifying the partition type. For the test of independence, must be one of 'ADP' or 'DDP'; for the K-sample problem, 'KSample-Variant' or 'KSample-Equipartition'.

size
The sample size if type is 'Independence'; a vector of group sizes if type is 'KSample'.

score.type
A character string specifying the score type, must be one of 'LikelihoodRatio' or 'Pearson'.

aggregation.type
A character string specifying the aggregation type, must be one of 'sum' or 'max'.

w.sum
The minimum number of observations in a partition, only relevant for score.type = 'Pearson' with aggregation.type = 'sum'. Default value is 0.

w.max
The minimum number of observations in a partition, only relevant for score.type = 'Pearson' with aggregation.type = 'max'. Default value is 2.

keep.simulation.data
TRUE/FALSE. If TRUE, the input m.stats matrix is stored in the returned object.

nr.atoms
For equipartition variants, the number of atoms, i.e. the number of possible split points in the data. Default value is nr_bins_equipartition(sum(size)).

compress
TRUE or FALSE. If enabled, null tables are compressed: the lower compress.p part of the null distribution is held at a compress.p0 resolution, while the upper part is held at a finer compress.p1 resolution.

compress.p0
Parameter for compression. The resolution for the lower part of the null distribution.

compress.p
Parameter for compression. The part of the null distribution to compress, i.e. the quantile up to which the coarser compress.p0 resolution is used.

compress.p1
Parameter for compression. The resolution for the upper part of the null distribution.
Details

For finding multiple quantiles, the null table object is more efficient than a raw matrix of statistics with B rows and maxm - minm + 1 columns, where each row contains the test statistics for partition sizes m from minm to maxm for a single permutation of the input sample.

Null tables may be compressed, using the compress argument. For each of the partition sizes (i.e. m or mXm), the null distribution is held at a compress.p0 resolution up to the compress.p quantile. Beyond that value, the distribution is held at a finer resolution defined by compress.p1 (since higher values are attained when a relation exists in the data, this is required for computing the p-value accurately).

See vignette('HHG') for a section on how to use this function for computing null tables over multiple cores.
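As a minimal sketch of the compression option described above (assuming the HHG package is installed; the null statistics are simulated here with a small B purely for illustration, real null tables should use far more replicates):

```r
library(HHG)

# Simulate a small matrix of null ADP statistics for the independence problem:
# B permutations of the ranks, partition sizes m = 2..mmax, as in the examples below.
set.seed(1)
n <- 30
B <- 100
mmax <- 4
m.stats <- t(replicate(B, hhg.univariate.ind.stat(1:n, sample(1:n),
  variant = 'ADP', aggregation.type = 'sum',
  score.type = 'LikelihoodRatio', mmax = mmax)$statistic))

# Build a compressed null table: the null distribution below the 0.99 quantile
# is held at a 0.001 resolution, the upper tail at a finer 1e-6 resolution.
null.table <- hhg.univariate.nulltable.from.mstats(m.stats, minm = 2,
  maxm = mmax, type = 'Independence', variant = 'ADP', size = n,
  score.type = 'LikelihoodRatio', aggregation.type = 'sum',
  compress = TRUE, compress.p0 = 0.001, compress.p = 0.99,
  compress.p1 = 0.000001)
```

The compressed table occupies less memory than one built with compress = FALSE, while p-values in the upper tail, the region that matters when a relation exists in the data, retain their accuracy.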
Value

m.stats
The input m.stats, returned if keep.simulation.data=TRUE.

univariate.object
A representation of the null tables in a format that allows efficient computation of p-values.
Author(s)

Barak Brill and Shachar Kaufman.
References

Heller, R., Heller, Y., Kaufman, S., Brill, B., & Gorfine, M. (2016). Consistent Distribution-Free K-Sample and Independence Tests for Univariate Random Variables. JMLR 17(29):1-54.

Brill, B. (2016). Scalable Non-Parametric Tests of Independence (master's thesis). http://primage.tau.ac.il/libraries/theses/exeng/free/2899741.pdf
Examples

## Not run:
# 1. Downloading a lookup table from site
# download from site http://www.math.tau.ac.il/~ruheller/Software.html
####################################################################
#using an already ready null table as object (for use in test functions)
#for example, ADP likelihood ratio statistics, for the independence problem,
#for sample size n=300
load('Object-ADP-n_300.Rdata') #=>null.table
#or using a matrix of statistics generated for the null distribution,
#to create your own table.
load('ADP-nullsim-n_300.Rdata') #=>mat
null.table = hhg.univariate.nulltable.from.mstats(m.stats = mat,minm = 2,
maxm = 5,type = 'Independence', variant = 'ADP',size = 300,
score.type = 'LikelihoodRatio',aggregation.type = 'sum')
# 2. generating an independence null table using multiple cores,
#and then compiling to object.
####################################################################
library(parallel)
library(doParallel)
library(foreach)
library(doRNG)
#generate an independence null table
nr.cores = 4 #this is computer dependent
n = 30 #size of independence problem
nr.reps.per.core = 25
mmax =5
score.type = 'LikelihoodRatio'
aggregation.type = 'sum'
variant = 'ADP'
#generating null table of size 4*25
#single core worker function
generate.null.distribution.statistic =function(){
library(HHG)
null.table = matrix(NA,nrow=nr.reps.per.core,ncol = mmax-1)
for(i in 1:nr.reps.per.core){
#note that the statistic is distribution free (based on ranks),
#so creating a null table (for the null distribution)
#is essentially permuting over the ranks
statistic = hhg.univariate.ind.stat(1:n,sample(1:n),
variant = variant,
aggregation.type = aggregation.type,
score.type = score.type,
mmax = mmax)$statistic
null.table[i,]=statistic
}
rownames(null.table)=NULL
return(null.table)
}
#parallelize over cores
cl = makeCluster(nr.cores)
registerDoParallel(cl)
res = foreach(core = 1:nr.cores, .combine = rbind, .packages = 'HHG',
.export=c('variant','aggregation.type','score.type',
'mmax','nr.reps.per.core','n'), .options.RNG=1234) %dorng%
{ generate.null.distribution.statistic() }
stopCluster(cl)
#the null table:
head(res)
#as object to be used:
null.table = hhg.univariate.nulltable.from.mstats(res,minm=2,
maxm = mmax,type = 'Independence',
variant = variant,size = n,score.type = score.type,
aggregation.type = aggregation.type)
#using the null table, checking for dependence in a linear relation
x=rnorm(n)
y=x+rnorm(n)
ADP.test = hhg.univariate.ind.combined.test(x,y,null.table)
ADP.test$MinP.pvalue #pvalue
# 3. generating a k-sample null table using multiple cores
# and then compiling to object.
####################################################################
library(parallel)
library(doParallel)
library(foreach)
library(doRNG)
#generate a k sample null table
nr.cores = 4 #this is computer dependent
n1 = 25 #size of first group
n2 = 25 #size of second group
nr.reps.per.core = 25
mmax =5
score.type = 'LikelihoodRatio'
aggregation.type = 'sum'
#generating null table of size 4*25
#single core worker function
generate.null.distribution.statistic =function(){
library(HHG)
null.table = matrix(NA,nrow=nr.reps.per.core,ncol = mmax-1)
for(i in 1:nr.reps.per.core){
#note that the statistic is distribution free (based on ranks),
#so creating a null table (for the null distribution)
#is essentially permuting over the ranks
statistic = hhg.univariate.ks.stat(1:(n1+n2),sample(c(rep(0,n1),rep(1,n2))),
aggregation.type = aggregation.type,
score.type = score.type,
mmax = mmax)$statistic
null.table[i,]=statistic
}
rownames(null.table)=NULL
return(null.table)
}
#parallelize over cores
cl = makeCluster(nr.cores)
registerDoParallel(cl)
res = foreach(core = 1:nr.cores, .combine = rbind, .packages = 'HHG',
.export=c('n1','n2','aggregation.type','score.type','mmax',
'nr.reps.per.core'), .options.RNG=1234) %dorng%
{generate.null.distribution.statistic()}
stopCluster(cl)
#the null table:
head(res)
#as object to be used:
null.table = hhg.univariate.nulltable.from.mstats(res,minm=2,
maxm = mmax,type = 'KSample',
variant = 'KSample-Variant',size = c(n1,n2),score.type = score.type,
aggregation.type = aggregation.type)
#using the null table, checking for dependence in a case of two distinct samples
x=1:(n1+n2)
y=c(rep(0,n1),rep(1,n2))
Sm.test = hhg.univariate.ks.combined.test(x,y,null.table)
Sm.test$MinP.pvalue #pvalue
## End(Not run)