eqdist.etest | R Documentation |
Performs the nonparametric multisample E-statistic (energy) test for equality of multivariate distributions.
eqdist.etest(x, sizes, distance = FALSE,
method=c("original","discoB","discoF"), R)
eqdist.e(x, sizes, distance = FALSE,
method=c("original","discoB","discoF"))
ksample.e(x, sizes, distance = FALSE,
method=c("original","discoB","discoF"), ix = 1:sum(sizes))
x | data matrix of pooled sample
sizes | vector of sample sizes
distance | logical: if TRUE, first argument is a distance matrix
method | use original (default) or distance components (discoB, discoF)
R | number of bootstrap replicates
ix | a permutation of the row indices of x
The k-sample multivariate \mathcal{E}-test of equal distributions is performed. The statistic is computed from the original pooled samples, stacked in matrix x where each row is a multivariate observation, or the corresponding distance matrix. The first sizes[1] rows of x are the first sample, the next sizes[2] rows of x are the second sample, etc.
The test is implemented by nonparametric bootstrap, an approximate permutation test with R replicates.
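The permutation idea described above can be sketched in base R without the package. This is only an illustration under stated assumptions: the helper names e_stat and perm_test are hypothetical, the statistic follows the pairwise e-distance formulas given further down this page, and the p-value convention (1 + number of replicates at least as large as the observed value) / (R + 1) is a common choice that is assumed here rather than taken from the package source.

```r
# Hypothetical helper: k-sample energy statistic for pooled rows of x,
# where the first sizes[1] rows are sample 1, the next sizes[2] rows sample 2, etc.
e_stat <- function(x, sizes) {
  x <- as.matrix(x)
  stopifnot(sum(sizes) == nrow(x))
  d <- as.matrix(dist(x))                      # Euclidean distances between all rows
  g <- rep(seq_along(sizes), times = sizes)    # sample label of each row
  k <- length(sizes)
  M <- function(i, j) mean(d[g == i, g == j, drop = FALSE])
  total <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      w <- sizes[i] * sizes[j] / (sizes[i] + sizes[j])
      total <- total + w * (2 * M(i, j) - M(i, i) - M(j, j))
    }
  }
  total
}

# Hypothetical helper: approximate permutation test with R replicates.
perm_test <- function(x, sizes, R = 199) {
  x <- as.matrix(x)
  stopifnot(R >= 1)
  observed <- e_stat(x, sizes)
  reps <- vapply(seq_len(R), function(r) {
    ix <- sample(nrow(x))                      # random permutation of the row indices
    e_stat(x[ix, , drop = FALSE], sizes)       # statistic under shuffled sample labels
  }, numeric(1))
  p <- (1 + sum(reps >= observed)) / (R + 1)   # assumed p-value convention
  list(statistic = observed, p.value = p)
}

# Two clearly separated bivariate samples of 20 rows each
set.seed(1)
x <- rbind(matrix(rnorm(40), ncol = 2),
           matrix(rnorm(40, mean = 3), ncol = 2))
res <- perm_test(x, sizes = c(20, 20), R = 99)
res$p.value
```

Shuffling the rows while keeping sizes fixed reassigns observations to samples at random, which is what makes the replicates a reference distribution under the hypothesis of equal distributions.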
The function eqdist.e returns the test statistic only; it simply passes the arguments through to eqdist.etest with R = 0.
The k-sample multivariate \mathcal{E}-statistic for testing equal distributions is returned. The statistic is computed from the original pooled samples, stacked in matrix x where each row is a multivariate observation, or from the distance matrix x of the original data. The first sizes[1] rows of x are the first sample, the next sizes[2] rows of x are the second sample, etc.
The two-sample \mathcal{E}-statistic proposed by Szekely and Rizzo (2004) is the e-distance e(S_i,S_j), defined for two samples S_i, S_j of size n_i, n_j by
e(S_i,S_j)=\frac{n_i n_j}{n_i+n_j}[2M_{ij}-M_{ii}-M_{jj}],
where
M_{ij}=\frac{1}{n_i n_j}\sum_{p=1}^{n_i} \sum_{q=1}^{n_j} \|X_{ip}-X_{jq}\|,
\|\cdot\| denotes Euclidean norm, and X_{ip} denotes the p-th observation in the i-th sample.
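As a rough illustration of the two-sample formula, the following base R sketch computes M_{ij} and e(S_i,S_j) directly. The function names mean_dist and e_dist are hypothetical and are not part of the package; the toy data are chosen so the result can be checked by hand.

```r
# Hypothetical helper: M_ij, the mean Euclidean distance between
# every row of a and every row of b (all n_a * n_b pairs).
mean_dist <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  d <- as.matrix(dist(rbind(a, b)))     # distances among all stacked rows
  ia <- seq_len(nrow(a))                # rows belonging to a
  ib <- nrow(a) + seq_len(nrow(b))      # rows belonging to b
  mean(d[ia, ib, drop = FALSE])
}

# Hypothetical helper: e-distance between two samples, following the formula above.
e_dist <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  na <- nrow(a)
  nb <- nrow(b)
  na * nb / (na + nb) *
    (2 * mean_dist(a, b) - mean_dist(a, a) - mean_dist(b, b))
}

# Univariate toy samples: S_1 = {0, 1}, S_2 = {2, 3}
a <- matrix(c(0, 1), ncol = 1)
b <- matrix(c(2, 3), ncol = 1)
e_dist(a, b)   # M_12 = 2, M_11 = M_22 = 0.5, so e = 1 * (4 - 0.5 - 0.5) = 3
```

Note that M_{ii} averages over all n_i^2 ordered pairs, including the zero distances of each observation to itself, which is why mean_dist(a, a) is 0.5 here rather than 1.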
The original (default method) k-sample \mathcal{E}-statistic is defined by summing the pairwise e-distances over all k(k-1)/2 pairs of samples:
\mathcal{E}=\sum_{1 \leq i < j \leq k} e(S_i,S_j).
Large values of \mathcal{E} are significant.
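The sum over pairs can be sketched in base R from a pooled matrix and a vector of sample sizes, mirroring the x and sizes layout described earlier. The name e_stat is hypothetical, and this is a plain transcription of the formula rather than the package's implementation.

```r
# Hypothetical helper: k-sample statistic as the sum of pairwise e-distances.
e_stat <- function(x, sizes) {
  x <- as.matrix(x)
  stopifnot(sum(sizes) == nrow(x))
  d <- as.matrix(dist(x))                      # Euclidean distances between all rows
  g <- rep(seq_along(sizes), times = sizes)    # sample label of each row
  k <- length(sizes)
  M <- function(i, j) mean(d[g == i, g == j, drop = FALSE])   # M_ij
  total <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      w <- sizes[i] * sizes[j] / (sizes[i] + sizes[j])
      total <- total + w * (2 * M(i, j) - M(i, i) - M(j, j))  # e(S_i, S_j)
    }
  }
  total
}

# Three univariate samples: {0, 1}, {2, 3}, {4, 5}
x <- matrix(0:5, ncol = 1)
e_stat(x, sizes = c(2, 2, 2))   # e-distances 3 + 7 + 3 = 13
```

With the package loaded, the value of e_stat(iris[, 1:4], c(50, 50, 50)) could be compared against eqdist.e on the same input as a sanity check.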
The discoB method computes the between-sample disco statistic. For a one-way analysis, it is related to the original statistic as follows. In the above equation, the weights \frac{n_i n_j}{n_i+n_j} are replaced with
\frac{n_i + n_j}{2N}\frac{n_i n_j}{n_i+n_j} = \frac{n_i n_j}{2N}
where N is the total number of observations: N=n_1+...+n_k.
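The weight substitution described above can be illustrated with a small base R sketch that switches between the two weightings. The function pairwise_stat and its method argument are hypothetical names used only for this illustration; the code transcribes the stated weights and makes no claim about how the package computes the disco components internally.

```r
# Hypothetical helper: sum of weighted pairwise terms with either the
# original weights n_i n_j / (n_i + n_j) or the discoB weights n_i n_j / (2N).
pairwise_stat <- function(x, sizes, method = c("original", "discoB")) {
  method <- match.arg(method)
  x <- as.matrix(x)
  stopifnot(sum(sizes) == nrow(x))
  N <- sum(sizes)                              # total number of observations
  d <- as.matrix(dist(x))
  g <- rep(seq_along(sizes), times = sizes)    # sample label of each row
  k <- length(sizes)
  M <- function(i, j) mean(d[g == i, g == j, drop = FALSE])
  total <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      ni <- sizes[i]
      nj <- sizes[j]
      w <- if (method == "original") ni * nj / (ni + nj) else ni * nj / (2 * N)
      total <- total + w * (2 * M(i, j) - M(i, i) - M(j, j))
    }
  }
  total
}

# Three univariate samples: {0, 1}, {2, 3}, {4, 5}
x <- matrix(0:5, ncol = 1)
orig_stat <- pairwise_stat(x, c(2, 2, 2), method = "original")   # 13
between   <- pairwise_stat(x, c(2, 2, 2), method = "discoB")     # 13 / 3
```

In this balanced example every pair has the same factor (n_i + n_j) / (2N) = 4 / 12, so the discoB value is exactly one third of the original statistic; with unequal sample sizes the factor differs from pair to pair.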
The discoF method is based on the disco F ratio, while the discoB method is based on the between-sample component. See also the disco and disco.between functions.
A list with class htest containing
method | description of test
statistic | observed value of the test statistic
p.value | approximate p-value of the test
data.name | description of data
eqdist.e returns the test statistic only.
The pairwise e-distances between samples can be conveniently computed by the edist function, which returns a dist object.
Maria L. Rizzo mrizzo@bgsu.edu and Gabor J. Szekely
Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal Distributions in High Dimension, InterStat, November (5).
Rizzo, M. L. and Szekely, G. J. (2010) DISCO Analysis: A Nonparametric Extension of Analysis of Variance, Annals of Applied Statistics, Vol. 4, No. 2, 1034-1055. doi:10.1214/09-AOAS245
Szekely, G. J. (2000) Technical Report 03-05: \mathcal{E}-statistics: Energy of Statistical Samples, Department of Mathematics and Statistics, Bowling Green State University.
ksample.e, edist, disco, disco.between, energy.hclust.
library(energy)
data(iris)
## test if the 3 varieties of iris data (d=4) have equal distributions
eqdist.etest(iris[,1:4], c(50,50,50), R = 199)
## example that uses method="disco"
x <- matrix(rnorm(100), nrow=20)
y <- matrix(rnorm(100), nrow=20)
X <- rbind(x, y)
d <- dist(X)
# should match edist default statistic
set.seed(1234)
eqdist.etest(d, sizes=c(20, 20), distance=TRUE, R = 199)
# comparison with edist
edist(d, sizes=c(20, 20), distance=TRUE)
# for comparison
g <- as.factor(rep(1:2, c(20, 20)))
set.seed(1234)
disco(d, factors=g, distance=TRUE, R=199)
# should match statistic in edist method="discoB", above
set.seed(1234)
disco.between(d, factors=g, distance=TRUE, R=199)