# edist: E-distance

In package `energy`: E-Statistics: Multivariate Inference via the Energy of Data

## Description

Returns the E-distances (energy statistics) between clusters.

## Usage

```r
edist(x, sizes, distance = FALSE, ix = 1:sum(sizes), alpha = 1,
      method = c("cluster","discoB"))
```

## Arguments

- `x`: data matrix of the pooled sample, or Euclidean distances
- `sizes`: vector of sample sizes
- `distance`: logical; if TRUE, `x` is a distance matrix
- `ix`: a permutation of the row indices of `x`
- `alpha`: distance exponent in (0,2]
- `method`: how to weight the statistics (`"cluster"` or `"discoB"`)

## Details

A vector containing the pairwise two-sample multivariate E-statistics for comparing clusters or samples is returned. The e-distance between clusters is computed from the original pooled data, stacked in matrix `x` where each row is a multivariate observation, from the distance matrix `x` of the original data, or from a distance object returned by `dist`. The first `sizes[1]` rows of the original data matrix are the first sample, the next `sizes[2]` rows are the second sample, and so on. The permutation vector `ix` may be used to obtain e-distances corresponding to a clustering solution at a given level in the hierarchy.
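To make the role of `sizes` and `ix` concrete, the sketch below groups the row indices of the pooled data into samples: the first `sizes[1]` entries of `ix` form sample 1, the next `sizes[2]` form sample 2, and so on. The helper name `sample_rows` is hypothetical and is not part of the `energy` package; it only illustrates the indexing convention described above.

```r
# Hypothetical helper (not an energy function): split the row indices
# given by ix into consecutive groups of lengths sizes[1], sizes[2], ...
sample_rows <- function(sizes, ix = seq_len(sum(sizes))) {
  stopifnot(length(ix) == sum(sizes))
  # label each position of ix with its sample number, then group by label
  split(ix, rep(seq_along(sizes), times = sizes))
}

# default ix: sample 1 is rows 1, 2 and sample 2 is rows 3, 4, 5
sample_rows(c(2, 3))

# with a permutation, rows 5 and 1 form the first cluster
sample_rows(c(2, 3), ix = c(5, 1, 2, 3, 4))
```

Passing a permutation through `ix` in this way lets the same pooled data be re-partitioned into clusters without reordering `x` itself.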

The default method `cluster` summarizes the e-distances between clusters in a table. The e-distance between two clusters $C_i$, $C_j$ of sizes $n_i$, $n_j$, proposed by Szekely and Rizzo (2005), is defined by

$$
e(C_i, C_j) = \frac{n_i n_j}{n_i + n_j}\left[2M_{ij} - M_{ii} - M_{jj}\right],
$$

where

$$
M_{ij} = \frac{1}{n_i n_j} \sum_{p=1}^{n_i} \sum_{q=1}^{n_j} \|X_{ip} - X_{jq}\|^{a},
$$

Here $\|\cdot\|$ denotes the Euclidean norm, $a$ = `alpha`, and $X_{ip}$ denotes the $p$-th observation in the $i$-th cluster. The exponent `alpha` should be in the interval (0,2].
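The two formulas above can be translated directly into R for a single pair of samples. This is a minimal sketch, assuming `X` and `Y` are numeric matrices (or vectors) with one observation per row; `e_pair` is a hypothetical name, not a function of the `energy` package. Note that the within-sample means $M_{ii}$ and $M_{jj}$ average over all $n_i^2$ pairs, including the zero distances on the diagonal, exactly as the formula for $M_{ij}$ is written with $i = j$.

```r
# Minimal sketch of e(C_i, C_j) from the formulas above (hypothetical
# helper, not the energy implementation). a is the exponent alpha.
e_pair <- function(X, Y, a = 1) {
  X <- as.matrix(X); Y <- as.matrix(Y)
  stopifnot(ncol(X) == ncol(Y), a > 0, a <= 2)
  ni <- nrow(X); nj <- nrow(Y)
  # all pairwise Euclidean distances in the pooled sample, raised to a
  D <- as.matrix(dist(rbind(X, Y)))^a
  i <- seq_len(ni); j <- ni + seq_len(nj)
  Mij <- mean(D[i, j])  # (1/(ni*nj)) * sum of ||X_ip - X_jq||^a
  Mii <- mean(D[i, i])  # within-sample means include the zero diagonal
  Mjj <- mean(D[j, j])
  (ni * nj) / (ni + nj) * (2 * Mij - Mii - Mjj)
}
```

For example, `e_pair(iris[1:50, 1:4], iris[51:100, 1:4])` computes the e-distance between the first two iris species. Two identical samples give an e-distance of zero, since then $M_{ij} = M_{ii} = M_{jj}$.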

The coefficient $n_i n_j/(n_i+n_j)$ is one-half of the harmonic mean of the sample sizes. The `discoB` method is related, but summarizes the pairwise differences between samples differently: the `disco` methods apply the coefficient $n_i n_j/(2N)$, where $N$ is the total number of observations, so that each $(i,j)$ statistic is weighted by the sample sizes relative to $N$. See the `disco` topic for more details.
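The two weightings can be compared with a little arithmetic. The sketch below uses the iris layout from the examples (three samples of 50, so $N = 150$); the helper `harmonic_mean` and the variable names are illustrative only.

```r
# Compare the cluster and disco coefficients for one pair of samples.
harmonic_mean <- function(x, y) 2 / (1 / x + 1 / y)

ni <- 50; nj <- 50; N <- 150       # e.g. the three iris species
w_cluster <- ni * nj / (ni + nj)   # 25, one-half the harmonic mean (50)
w_disco   <- ni * nj / (2 * N)     # 2500 / 300, about 8.33

# the cluster coefficient equals half the harmonic mean of ni and nj
stopifnot(isTRUE(all.equal(w_cluster, harmonic_mean(ni, nj) / 2)))
```

The cluster coefficient depends only on the two samples being compared, while the disco coefficient also shrinks as the total $N$ grows, which is what ties each pairwise term to the overall sample.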

## Value

An object of class `dist` is returned, containing the lower triangle of the e-distance matrix of cluster distances corresponding to the permutation of indices `ix`. The `method` attribute of the distance object is set to a value recording the method type and the index (the exponent `alpha`).
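The following sketch shows how a result of this shape can be assembled for $k$ samples with the default `cluster` weighting: a symmetric $k \times k$ matrix of pairwise e-distances is filled in and converted with `as.dist`, which keeps only the lower triangle. It assumes a full distance matrix (or `dist` object) as input. `edist_sketch` is hypothetical, not the package source, and its `method` attribute string only imitates the "type, index" idea rather than reproducing the package's exact format.

```r
# Hypothetical sketch of the "cluster" method for k samples.
# D: distance matrix or dist object of the pooled data; a: exponent alpha.
edist_sketch <- function(D, sizes, ix = seq_len(sum(sizes)), a = 1) {
  stopifnot(a > 0, a <= 2, length(ix) == sum(sizes))
  D <- as.matrix(D)^a
  k <- length(sizes)
  # consecutive blocks of ix define the samples/clusters
  grp <- split(ix, rep(seq_len(k), times = sizes))
  E <- matrix(0, k, k)
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      ii <- grp[[i]]; jj <- grp[[j]]
      ni <- length(ii); nj <- length(jj)
      w <- ni * nj / (ni + nj)  # one-half the harmonic mean of the sizes
      E[i, j] <- E[j, i] <-
        w * (2 * mean(D[ii, jj]) - mean(D[ii, ii]) - mean(D[jj, jj]))
    }
  }
  out <- as.dist(E)  # keeps the lower triangle only
  attr(out, "method") <- paste0("cluster: index = ", a)
  out
}
```

For instance, `edist_sketch(dist(iris[, 1:4]), c(50, 50, 50))` returns a 3 x 3 `dist` object for the three species; `as.matrix()` recovers the full symmetric matrix if needed.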

## Author(s)

Maria L. Rizzo mrizzo @ bgsu.edu and Gabor J. Szekely

## References

Szekely, G. J. and Rizzo, M. L. (2005) Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method. Journal of Classification 22(2), 151-183. doi: 10.1007/s00357-005-0012-9

Rizzo, M. L. and Szekely, G. J. (2010) DISCO Analysis: A Nonparametric Extension of Analysis of Variance. Annals of Applied Statistics 4(2), 1034-1055. doi: 10.1214/09-AOAS245

Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal Distributions in High Dimension. InterStat, November (5).

Szekely, G. J. (2000) E-statistics: Energy of Statistical Samples. Technical Report 03-05, Department of Mathematics and Statistics, Bowling Green State University.

## See Also

`energy.hclust`, `eqdist.etest`, `ksample.e`, `disco`
## Examples

```r
library(energy)

## compute cluster e-distances for 3 samples of iris data
data(iris)
edist(iris[,1:4], c(50,50,50))

## pairwise disco statistics
edist(iris[,1:4], c(50,50,50), method="discoB")

## compute e-distances from a distance object
data(iris)
edist(dist(iris[,1:4]), c(50, 50, 50), distance=TRUE, alpha = 1)

## compute e-distances from a distance matrix
data(iris)
d <- as.matrix(dist(iris[,1:4]))
edist(d, c(50, 50, 50), distance=TRUE, alpha = 1)
```