# disco: distance components (DISCO)

In energy: E-Statistics: Multivariate Inference via the Energy of Data

## Description

E-statistics DIStance COmponents and tests, analogous to variance components and anova.

## Usage

```
disco(x, factors, distance, index=1.0, R, method=c("disco","discoB","discoF"))
disco.between(x, factors, distance, index=1.0, R)
```

## Arguments

| Argument | Description |
|---|---|
| `x` | data matrix or distance matrix or dist object |
| `factors` | matrix of factor labels or integers (not design matrix) |
| `distance` | logical, TRUE if x is distance matrix |
| `index` | exponent on Euclidean distance in (0,2] |
| `R` | number of replicates for a permutation test |
| `method` | test statistic |

## Details

`disco` calculates the distance components decomposition of total dispersion and, if `R > 0`, tests for significance using either the DISCO "F" ratio (default, `method="disco"`) or the between-component statistic (`method="discoB"`). Both tests are implemented as permutation tests.
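For orientation, the one-way decomposition can be written out as follows. This is a sketch following the DISCO paper listed under References, not a transcription of the package source. The notation is an assumption made here: $A_1,\dots,A_K$ are the samples with sizes $n_1,\dots,n_K$, $A$ is the pooled sample of size $N$, and $\alpha$ corresponds to the `index` argument.

```latex
% Mean pairwise distance between samples A = {a_1,...,a_n} and B = {b_1,...,b_m}
g_\alpha(A, B) = \frac{1}{n m} \sum_{i=1}^{n} \sum_{j=1}^{m} \lVert a_i - b_j \rVert^{\alpha}

% Total dispersion, within-sample component, between-sample component
T_\alpha = \frac{N}{2}\, g_\alpha(A, A), \qquad
W_\alpha = \sum_{j=1}^{K} \frac{n_j}{2}\, g_\alpha(A_j, A_j), \qquad
S_\alpha = T_\alpha - W_\alpha

% DISCO "F" ratio with K - 1 and N - K degrees of freedom
F_\alpha = \frac{S_\alpha / (K - 1)}{W_\alpha / (N - K)}
```

For univariate data with $\alpha = 2$, $T_\alpha$, $S_\alpha$ and $W_\alpha$ reduce to the total, treatment and error sums of squares of one-way ANOVA.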

If `x` is a `dist` object, argument `distance` is ignored. If `x` is a distance matrix, set `distance=TRUE`.

In the current release `disco` computes the decomposition for one-way models only.
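A one-way decomposition is small enough to check by hand in base R. The sketch below assumes the components are built from mean pairwise distances, as described in the DISCO paper under References. `disco_by_hand` is a hypothetical helper written for illustration, not a function of the energy package, and it makes no claim about how `disco` is implemented internally.

```r
# Hand-rolled one-way distance components (illustrative helper, not from energy).
disco_by_hand <- function(x, group, index = 1) {
  group <- as.factor(group)
  d <- as.matrix(dist(x))^index            # pairwise Euclidean distances raised to `index`
  N <- nrow(d)
  K <- nlevels(group)
  total <- sum(d) / (2 * N)                # (N/2) * mean of all N^2 pairwise distances
  within <- sum(sapply(levels(group), function(g) {
    idx <- which(group == g)
    sum(d[idx, idx]) / (2 * length(idx))   # (n_j/2) * mean distance inside group j
  }))
  between <- total - within
  c(between = between, within = within, total = total,
    F = (between / (K - 1)) / (within / (N - K)))
}

# With index = 2 on univariate data the components are the ANOVA sums of squares.
dc <- disco_by_hand(warpbreaks$breaks, warpbreaks$tension, index = 2)
ss <- summary(aov(breaks ~ tension, data = warpbreaks))[[1]][["Sum Sq"]]
print(dc)
print(ss)
```

The `between` and `within` entries of `dc` should agree with the two sums of squares in `ss`, which are the values 2034.259 and 7198.556 shown under Example output.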

## Value

When `method="discoF"`, `disco` returns a list similar to the return value from `anova.lm`, and the `print.disco` method is provided to format the output into a similar table. Details:

`disco` returns a class `disco` object, which is a list containing

| Component | Description |
|---|---|
| `call` | call |
| `method` | method |
| `statistic` | vector of observed statistics |
| `p.value` | vector of p-values |
| `k` | number of factors |
| `N` | number of observations |
| `between` | between-sample distance components |
| `withins` | one-way within-sample distance components |
| `within` | within-sample distance component |
| `total` | total dispersion |
| `Df.trt` | degrees of freedom for treatments |
| `Df.e` | degrees of freedom for error |
| `index` | index (exponent on distance) |
| `factor.names` | factor names |
| `factor.levels` | factor levels |
| `sample.sizes` | sample sizes |
| `stats` | matrix containing decomposition |

When `method="discoB"`, `disco` passes the arguments to `disco.between`, which returns a class `htest` object.

`disco.between` returns a class `htest` object, where the test statistic is the between-sample statistic (proportional to the numerator of the F ratio of the `disco` test).

## Note

The current version does all calculations via matrix arithmetic and the `boot` function. Support for more general additive models and a formula interface is under development.
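To illustrate why working from a distance matrix suits a permutation test, here is a minimal base R sketch. It is not the package's implementation: it uses `sample` rather than `boot`, the helpers `between_stat` and `perm_pvalue` are hypothetical names, and the p-value convention `(1 + #{replicates >= observed}) / (R + 1)` is an assumption chosen for the example.

```r
# Between-sample component for a one-way layout, from a precomputed distance matrix.
# (Illustrative helpers, not part of the energy package.)
between_stat <- function(d, group) {
  N <- nrow(d)
  within <- sum(sapply(split(seq_len(N), group), function(idx) {
    sum(d[idx, idx]) / (2 * length(idx))
  }))
  sum(d) / (2 * N) - within
}

perm_pvalue <- function(x, group, index = 1, R = 99) {
  d <- as.matrix(dist(x))^index
  observed <- between_stat(d, group)
  # Only the labels are shuffled; the distance matrix is computed once.
  replicates <- replicate(R, between_stat(d, sample(group)))
  (1 + sum(replicates >= observed)) / (R + 1)
}

set.seed(1)
p <- perm_pvalue(warpbreaks$breaks, warpbreaks$tension, index = 2, R = 199)
print(p)
```

Because the total dispersion does not change when labels are permuted, ranking permutations by the between component gives the same ordering as ranking them by the F ratio in a one-way layout.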

`disco` methods have been added to the cluster distance summary function `edist`, and energy tests for equality of distribution (see `eqdist.etest`).

## Author(s)

Maria L. Rizzo mrizzo @ bgsu.edu and Gabor J. Szekely

## References

M. L. Rizzo and G. J. Szekely (2010). DISCO Analysis: A Nonparametric Extension of Analysis of Variance, Annals of Applied Statistics, Vol. 4, No. 2, 1034-1055.
doi: 10.1214/09-AOAS245

## See Also

`edist`, `eqdist.e`, `eqdist.etest`, `ksample.e`

## Examples

```
## warpbreaks one-way decompositions
data(warpbreaks)
attach(warpbreaks)
disco(breaks, factors=wool, R=99)

## When index=2 for univariate data, we get ANOVA decomposition
disco(breaks, factors=tension, index=2.0, R=99)
aov(breaks ~ tension)

## Multivariate response
## Example on producing plastic film from Krzanowski (1998, p. 381)
tear <- c(6.5, 6.2, 5.8, 6.5, 6.5, 6.9, 7.2, 6.9, 6.1, 6.3,
          6.7, 6.6, 7.2, 7.1, 6.8, 7.1, 7.0, 7.2, 7.5, 7.6)
gloss <- c(9.5, 9.9, 9.6, 9.6, 9.2, 9.1, 10.0, 9.9, 9.5, 9.4,
           9.1, 9.3, 8.3, 8.4, 8.5, 9.2, 8.8, 9.7, 10.1, 9.2)
opacity <- c(4.4, 6.4, 3.0, 4.1, 0.8, 5.7, 2.0, 3.9, 1.9, 5.7,
             2.8, 4.1, 3.8, 1.6, 3.4, 8.4, 5.2, 6.9, 2.7, 1.9)
Y <- cbind(tear, gloss, opacity)
rate <- factor(gl(2,10), labels=c("Low", "High"))

## test for equal distributions by rate
disco(Y, factors=rate, R=99)
disco(Y, factors=rate, R=99, method="discoB")

## Just extract the decomposition table
disco(Y, factors=rate, R=0)$stats

## Compare eqdist.e methods for rate
## disco between stat is half of original when sample sizes equal
eqdist.e(Y, sizes=c(10, 10), method="original")
eqdist.e(Y, sizes=c(10, 10), method="discoB")

## The between-sample distance component
disco.between(Y, factors=rate, R=0)
```

### Example output

```
disco(x = breaks, factors = wool, R = 99)

Distance Components: index  1.00
Source                 Df   Sum Dist  Mean Dist    F-ratio    p-value
factors                 1   10.77778   10.77778      1.542       0.25
Within                 52  363.55556    6.99145
Total                  53  374.33333
disco(x = breaks, factors = tension, index = 2, R = 99)

Distance Components: index  2.00
Source                 Df   Sum Dist  Mean Dist    F-ratio    p-value
factors                 2 2034.25926 1017.12963      7.206       0.01
Within                 51 7198.55556  141.14815
Total                  53 9232.81481
Call:
aov(formula = breaks ~ tension)

Terms:
                 tension Residuals
Sum of Squares  2034.259  7198.556
Deg. of Freedom        2        51

Residual standard error: 11.88058
Estimated effects may be unbalanced
disco(x = Y, factors = rate, R = 99)

Distance Components: index  1.00
Source                 Df   Sum Dist  Mean Dist    F-ratio    p-value
factors                 1    1.27003    1.27003      0.981       0.35
Within                 18   23.30105    1.29450
Total                  19   24.57108

DISCO (Between-sample)

data:  x
DISCO between statistic = 1.27, p-value = 0.3232

          Trt   Within df1 df2      Stat p-value
[1,] 1.270028 23.30105   1  18 0.9810934      NA
E-statistic
2.540056
 1.270028
 1.270028
```

energy documentation built on Feb. 22, 2021, 5:08 p.m.