# dcov2d: Fast dCor and dCov for bivariate data only

In energy: E-Statistics: Multivariate Inference via the Energy of Data

## Description

For bivariate data only, these are fast O(n log n) implementations of distance correlation and distance covariance statistics. The U-statistic for dcov^2 is unbiased; the V-statistic is the original definition in SRB 2007. These algorithms do not store the distance matrices, so they are suitable for large samples.

## Usage

```r
dcor2d(x, y, type = c("V", "U"))
dcov2d(x, y, type = c("V", "U"), all.stats = FALSE)
```

## Arguments

| Argument | Description |
|---|---|
| `x` | numeric vector |
| `y` | numeric vector |
| `type` | "V" or "U", for V- or U-statistics |
| `all.stats` | logical |

## Details

The unbiased (squared) dcov is documented in `dcovU`, which handles multivariate data in arbitrary, not necessarily equal, dimensions. `dcov2d` and `dcor2d` provide a faster O(n log n) algorithm for bivariate (x, y) data only, that is, when X and Y are both real-valued random variables. The O(n log n) algorithm was proposed by Huo and Szekely (2016). It is faster than the direct computation only above a certain sample size n. Because it does not store the distance matrices, the sample size can be very large.
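For small samples, the quantity targeted by the fast algorithm can be cross-checked against a direct O(n^2) computation. The following is a minimal base-R sketch of the V-statistic built from double-centred distance matrices. The name `dcov2_naive_V` is hypothetical and not part of the energy package, and the code is the naive definition, not the Huo and Szekely algorithm.

```r
# Hypothetical helper (not in the energy package):
# naive O(n^2) V-statistic for squared distance covariance of two numeric vectors.
dcov2_naive_V <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))

  # Double-centre the pairwise distance matrix of z:
  # subtract row and column means, then add back the grand mean.
  dcenter <- function(z) {
    d <- abs(outer(z, z, "-"))   # n x n matrix of |z_k - z_l|
    m <- rowMeans(d)             # row means equal column means (d is symmetric)
    d - outer(m, m, "+") + mean(d)
  }

  A <- dcenter(x)
  B <- dcenter(y)
  mean(A * B)                    # (1 / n^2) * sum over k, l of A_kl * B_kl
}
```

On small inputs the result should agree with `dcov2d(x, y)` up to floating-point error. Unlike the fast version, this sketch allocates full n x n matrices, so it is only practical for modest n.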

## Value

By default, `dcov2d` returns the V-statistic V_n = dCov_n^2(x, y). If `type = "U"`, it returns the U-statistic, which is unbiased for dCov^2(X, Y). The argument `all.stats = TRUE` is used internally when the function is called from `dcor2d`.

By default, `dcor2d` returns dCor_n^2(x, y). If `type = "U"`, it returns a bias-corrected estimator of squared dcor, equivalent to `bcdcor`.
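To make the `type = "U"` return values concrete, here is a base-R sketch of the U-statistic and the bias-corrected squared dCor, following the U-centering construction as described in Szekely and Rizzo (2014). The names `ucenter`, `dcov2_naive_U`, and `bcdcor_naive` are hypothetical, the computation is the naive O(n^2) one, and it requires n > 3.

```r
# Hypothetical helpers (not in the energy package).

# U-centre the pairwise distance matrix of a numeric vector z.
ucenter <- function(z) {
  n <- length(z)
  d <- abs(outer(z, z, "-"))
  rs <- rowSums(d)               # row sums equal column sums (d is symmetric)
  A <- d - outer(rs, rs, "+") / (n - 2) + sum(d) / ((n - 1) * (n - 2))
  diag(A) <- 0                   # diagonal of a U-centred matrix is zero
  A
}

# Naive U-statistic: inner product of the U-centred matrices over k != l.
dcov2_naive_U <- function(x, y) {
  n <- length(x)
  stopifnot(is.numeric(x), is.numeric(y), length(y) == n, n > 3)
  sum(ucenter(x) * ucenter(y)) / (n * (n - 3))
}

# Bias-corrected squared distance correlation; may be negative.
bcdcor_naive <- function(x, y) {
  dcov2_naive_U(x, y) / sqrt(dcov2_naive_U(x, x) * dcov2_naive_U(y, y))
}
```

Under these assumptions, `bcdcor_naive(x, y)` should match `dcor2d(x, y, "U")` on small samples up to rounding, and it is undefined when either vector is constant.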

These functions do not store the distance matrices so they are helpful when sample size is large and the data is bivariate.

## Note

The U-statistic U_n can be negative in the lower tail so the square root of the U-statistic is not applied. Similarly, `dcor2d(x, y, "U")` is bias-corrected and can be negative in the lower tail, so we do not take the square root. The original definitions of dCov and dCor (SRB2007, SR2009) were based on V-statistics, which are non-negative, and defined using the square root of V-statistics.
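The relationship between the returned squared statistics and the original definitions can be summarized as follows. This is a sketch in the notation of this page, where V_n = dCov_n^2 and U_n is the U-statistic. The symbol R*_n for the bias-corrected statistic is an assumption borrowed from the partial distance correlation literature, and the ratios are defined only when the denominators are positive.

```latex
\mathrm{dCov}_n(x, y) = \sqrt{V_n(x, y)}, \qquad
\mathrm{dCor}_n(x, y) = \sqrt{\frac{V_n(x, y)}{\sqrt{V_n(x, x)\, V_n(y, y)}}}, \qquad
R^{*}_n(x, y) = \frac{U_n(x, y)}{\sqrt{U_n(x, x)\, U_n(y, y)}}
```

The first two are well defined because V_n is non-negative. The third is left unrooted because U_n(x, y) can be negative.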

It has been suggested that instead of taking the square root of the U-statistic, one could take the square root of |U_n| and then reapply the sign of U_n, but that introduces more bias than the original dCor, and should never be used.

## Author(s)

Maria L. Rizzo mrizzo @ bgsu.edu and Gabor J. Szekely

## References

Huo, X. and Szekely, G.J. (2016). Fast computing for distance covariance. Technometrics, 58(4), 435-447.

Szekely, G.J. and Rizzo, M.L. (2014). Partial Distance Correlation with Methods for Dissimilarities. Annals of Statistics, 42(6), 2382-2412.

Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007). Measuring and Testing Dependence by Correlation of Distances. Annals of Statistics, 35(6), 2769-2794. doi: 10.1214/009053607000000505

## See Also

`dcov` `dcov.test` `dcor` `dcor.test` (multivariate statistics and permutation test)

## Examples

```r
library(energy)

## these are equivalent, but 2d is faster for n > 50
n <- 100
x <- rnorm(n)
y <- rnorm(n)
all.equal(dcov(x, y)^2, dcov2d(x, y), check.attributes = FALSE)
all.equal(bcdcor(x, y), dcor2d(x, y, "U"), check.attributes = FALSE)

x <- rlnorm(400)
y <- rexp(400)
dcov.test(x, y, R = 199)  # permutation test
dcor.test(x, y, R = 199)
```

### Example output

```
[1] TRUE
[1] TRUE

dCov test of independence

data:  index 1, replicates 199
nV^2 = 0.90987, p-value = 0.945
sample estimates:
dCov
0.04769359

dCor test of independence

data:  index 1, replicates 199
dCor = 0.063131, p-value = 0.96
sample estimates:
[1] 0.04769359 0.06313127 0.91040327 0.62689860
```

energy documentation built on Feb. 22, 2021, 5:08 p.m.