
Performs the nonparametric multisample E-statistic (energy) test for equality of multivariate distributions.


| Argument | Description |
|---|---|
| `x` | data matrix of the pooled sample |
| `sizes` | vector of sample sizes |
| `distance` | logical: if TRUE, the first argument is a distance matrix |
| `method` | use original (default) or distance components (discoB, discoF) |
| `R` | number of bootstrap replicates |
| `ix` | a permutation of the row indices of `x` |

The k-sample multivariate *E*-test of equal distributions is performed. The statistic is computed from the original pooled samples, stacked in matrix `x` where each row is a multivariate observation, or from the corresponding distance matrix. The first `sizes[1]` rows of `x` are the first sample, the next `sizes[2]` rows of `x` are the second sample, etc.
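
As a minimal sketch of the layout described above, the pooled matrix can be built by stacking the samples with `rbind` and recording their sizes in the same order. The sample names `s1`, `s2`, `s3` and the simulated data are illustrative assumptions, not part of the package.

```r
# Three hypothetical samples in d = 2 dimensions (illustrative data only)
set.seed(1)
s1 <- matrix(rnorm(10), ncol = 2)   # 5 observations
s2 <- matrix(rnorm(16), ncol = 2)   # 8 observations
s3 <- matrix(rnorm(12), ncol = 2)   # 6 observations

# Pooled sample: rows of s1 first, then s2, then s3
x <- rbind(s1, s2, s3)
sizes <- c(nrow(s1), nrow(s2), nrow(s3))

# Row indices belonging to each sample, recovered from `sizes`
group <- rep(seq_along(sizes), times = sizes)
rows_by_sample <- split(seq_len(nrow(x)), group)
```

Here `x` and `sizes` have the shape expected by the `x` and `sizes` arguments, and `rows_by_sample[[2]]` gives the rows of `x` that hold the second sample.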

The test is implemented by nonparametric bootstrap, an approximate permutation test with `R` replicates.
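
The idea of an approximate permutation test can be sketched in base R as follows. This is not the package's implementation: `perm_test`, `stat_fn`, and the toy statistic `mean_gap` are hypothetical names, and the `(1 + count) / (R + 1)` p-value is a common convention assumed here rather than taken from the package source.

```r
# Generic approximate permutation test for a pooled-sample statistic.
# `stat_fn(x, sizes)` is a hypothetical function returning one number.
perm_test <- function(x, sizes, stat_fn, R = 199) {
  observed <- stat_fn(x, sizes)
  n <- nrow(x)
  replicates <- replicate(R, {
    ix <- sample.int(n)                 # random permutation of row indices
    stat_fn(x[ix, , drop = FALSE], sizes)
  })
  # Count the observed value as one of the R + 1 arrangements
  p_value <- (1 + sum(replicates >= observed)) / (R + 1)
  list(statistic = observed, p.value = p_value, replicates = replicates)
}

# Toy statistic for illustration only: distance between two sample means
mean_gap <- function(x, sizes) {
  g <- rep(seq_along(sizes), times = sizes)
  sqrt(sum((colMeans(x[g == 1, , drop = FALSE]) -
            colMeans(x[g == 2, , drop = FALSE]))^2))
}

set.seed(42)
x <- rbind(matrix(rnorm(40), ncol = 2),
           matrix(rnorm(40, mean = 3), ncol = 2))
res <- perm_test(x, sizes = c(20, 20), stat_fn = mean_gap, R = 199)
```

Permuting the rows of the pooled sample while keeping `sizes` fixed reassigns observations to samples at random, which is what makes the replicates comparable to the observed statistic under the hypothesis of equal distributions.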

The function `eqdist.e` returns the test statistic only; it simply passes the arguments through to `eqdist.etest` with `R = 0`.

The k-sample multivariate *E*-statistic for testing equal distributions is returned. The statistic is computed from the original pooled samples, stacked in matrix `x` where each row is a multivariate observation, or from the distance matrix `x` of the original data. The first `sizes[1]` rows of `x` are the first sample, the next `sizes[2]` rows of `x` are the second sample, etc.

The two-sample *E*-statistic proposed by Szekely and Rizzo (2004) is the e-distance $e(S_i, S_j)$, defined for two samples $S_i, S_j$ of sizes $n_i, n_j$ by

$$e(S_i, S_j) = \frac{n_i n_j}{n_i + n_j} \left[ 2M_{ij} - M_{ii} - M_{jj} \right],$$

where

$$M_{ij} = \frac{1}{n_i n_j} \sum_{p=1}^{n_i} \sum_{q=1}^{n_j} \lVert X_{ip} - X_{jq} \rVert,$$

$\lVert \cdot \rVert$ denotes the Euclidean norm, and $X_{ip}$ denotes the p-th observation in the i-th sample.
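
The definition can be checked numerically with a short base R sketch. The helper name `edist2` and the four toy points are assumptions made for illustration; the weight used is `n_i * n_j / (n_i + n_j)`, and the within-sample means include the zero diagonal, as in the double sum above.

```r
# Two-sample e-distance computed directly from the definition.
# `edist2` is a hypothetical helper name, not a function in the package.
edist2 <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  n_i <- nrow(a)
  n_j <- nrow(b)
  d <- as.matrix(dist(rbind(a, b)))    # Euclidean distances, pooled sample
  ia <- seq_len(n_i)
  ib <- n_i + seq_len(n_j)
  M_ij <- mean(d[ia, ib])              # between-sample mean distance
  M_ii <- mean(d[ia, ia])              # within-sample, includes zero diagonal
  M_jj <- mean(d[ib, ib])
  n_i * n_j / (n_i + n_j) * (2 * M_ij - M_ii - M_jj)
}

a <- matrix(c(0, 0, 0, 1), ncol = 2, byrow = TRUE)   # points (0,0), (0,1)
b <- matrix(c(3, 0, 3, 1), ncol = 2, byrow = TRUE)   # points (3,0), (3,1)
e_ab <- edist2(a, b)   # equals 2 + sqrt(10) for these points
```

For these points the between-sample mean distance is (3 + sqrt(10))/2 and both within-sample means are 1/2, so the bracket equals 2 + sqrt(10) and the weight is 1.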

The original (default method) k-sample *E*-statistic is defined by summing the pairwise e-distances over all $k(k-1)/2$ pairs of samples:

$$\mathcal{E} = \sum_{1 \le i < j \le k} e(S_i, S_j).$$

Large values of $\mathcal{E}$ are significant.
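
A minimal base R sketch of this sum, starting from a distance matrix and a vector of sample sizes, might look as follows. The function name `ksample_stat` and the simulated data are hypothetical, and the code follows the formulas in this section rather than the package source.

```r
# k-sample statistic as the sum of pairwise e-distances, computed from a
# distance matrix and the sample sizes. `ksample_stat` is a hypothetical
# name used only for this sketch.
ksample_stat <- function(dmat, sizes) {
  dmat <- as.matrix(dmat)
  g <- rep(seq_along(sizes), times = sizes)   # sample label of each row
  k <- length(sizes)
  total <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      w <- sizes[i] * sizes[j] / (sizes[i] + sizes[j])
      m_ij <- mean(dmat[g == i, g == j])
      m_ii <- mean(dmat[g == i, g == i])
      m_jj <- mean(dmat[g == j, g == j])
      total <- total + w * (2 * m_ij - m_ii - m_jj)
    }
  }
  total
}

set.seed(7)
x <- rbind(matrix(rnorm(20), ncol = 2),
           matrix(rnorm(20, mean = 1), ncol = 2),
           matrix(rnorm(20, mean = 2), ncol = 2))
E <- ksample_stat(dist(x), sizes = c(10, 10, 10))
```

Working from the distance matrix mirrors the `distance = TRUE` interface: only the pairwise distances and the sample sizes are needed to form the statistic.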

The `discoB` method computes the between-sample disco statistic. For a one-way analysis, it is related to the original statistic as follows. In the e-distance formula above, the weights $n_i n_j/(n_i+n_j)$ are replaced with

$$\frac{n_i + n_j}{2N} \cdot \frac{n_i n_j}{n_i + n_j} = \frac{n_i n_j}{2N},$$

where $N$ is the total number of observations: $N = n_1 + \dots + n_k$.
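
As a worked check of the reweighting, consider two samples with the illustrative sizes $n_1 = n_2 = 20$ (values assumed for this example only), so that $N = 40$:

```latex
\begin{aligned}
\text{original weight:}\quad & \frac{n_1 n_2}{n_1 + n_2} = \frac{400}{40} = 10, \\
\text{discoB weight:}\quad & \frac{n_1 n_2}{2N} = \frac{400}{80} = 5.
\end{aligned}
```

With only two samples, $N = n_1 + n_2$, so by these formulas the `discoB` weight is exactly half the original weight.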

The `discoF` method is based on the disco F ratio, while the `discoB` method is based on the between-sample component.

Also see the `disco` and `disco.between` functions.

A list with class `htest` containing

| Component | Description |
|---|---|
| `method` | description of test |
| `statistic` | observed value of the test statistic |
| `p.value` | approximate p-value of the test |
| `data.name` | description of data |

`eqdist.e` returns the test statistic only.

The pairwise e-distances between samples can be conveniently computed by the `edist` function, which returns a `dist` object.

Maria L. Rizzo mrizzo @ bgsu.edu and Gabor J. Szekely

Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal
Distributions in High Dimension, *InterStat*, November (5).

Rizzo, M. L. and Szekely, G. J. (2010) DISCO Analysis: A Nonparametric Extension of
Analysis of Variance, *Annals of Applied Statistics*,
Vol. 4, No. 2, 1034-1055. http://dx.doi.org/10.1214/09-AOAS245

Szekely, G. J. (2000) Technical Report 03-05:
*E*-statistics: Energy of
Statistical Samples, Department of Mathematics and Statistics, Bowling
Green State University.

`ksample.e`, `edist`, `disco`, `disco.between`, `energy.hclust`.

```
library(energy)

data(iris)
## test if the 3 varieties of iris data (d=4) have equal distributions
eqdist.etest(iris[,1:4], c(50,50,50), R = 199)

## example that uses a distance matrix and method="discoB"
x <- matrix(rnorm(100), nrow=20)
y <- matrix(rnorm(100), nrow=20)
X <- rbind(x, y)
d <- dist(X)

# default method: statistic should match the edist value below
set.seed(1234)
eqdist.etest(d, sizes=c(20, 20), distance=TRUE, R = 199)

# comparison with edist (sizes must describe the two samples of 20 rows)
edist(d, sizes=c(20, 20), distance=TRUE)

# between-sample disco component via eqdist.etest
set.seed(1234)
eqdist.etest(d, sizes=c(20, 20), distance=TRUE, method="discoB", R = 199)

# for comparison
g <- as.factor(rep(1:2, c(20, 20)))
set.seed(1234)
disco(d, factors=g, distance=TRUE, R=199)

# should match the method="discoB" statistic above
set.seed(1234)
disco.between(d, factors=g, distance=TRUE, R=199)
```

mariarizzo/energy documentation built on Oct. 30, 2018, 3:15 p.m.
