For bivariate data only, these are fast O(n log n) implementations of distance correlation and distance covariance statistics. The U-statistic for dcov^2 is unbiased; the V-statistic is the original definition of Szekely, Rizzo, and Bakirov (2007). These algorithms do not store the distance matrices, so they are suitable for large samples.
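A rough sense of scale for why not storing the distance matrices matters: a dense n x n matrix of doubles takes 8 n^2 bytes, and a direct computation of dcov needs one such matrix for x and one for y. The small helper below (a hypothetical name, for illustration only) does that arithmetic. It ignores R's per-object overhead and the fact that `stats::dist` stores only the lower triangle, which roughly halves the figure.

```r
# Approximate size in GB of one dense n x n matrix of doubles (8 bytes each).
# Illustrative helper only; ignores R object overhead.
distmat_gb <- function(n) n^2 * 8 / 1e9

distmat_gb(c(1e3, 1e4, 1e5))  # roughly 0.008, 0.8 and 80 GB
```

At n = 1e5 the two matrices alone would need on the order of 160 GB, whereas a method that avoids them works with vectors of length n.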
type: "V" or "U", for V- or U-statistics
The unbiased (squared) dcov for multivariate data in arbitrary, not necessarily equal, dimensions is documented in dcovU.
dcov2d and dcor2d provide a faster O(n log n) algorithm for bivariate (x, y) data only, where X and Y are real-valued random variables. The O(n log n) algorithm was proposed by Huo and Szekely (2016). It is faster than the direct O(n^2) computation once the sample size n exceeds a modest threshold (the example below suggests n > 50). Because it does not store the distance matrices, the sample size can be very large.
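One ingredient that makes an O(n log n) computation possible is that the row sums a_i. = sum_j |x_i - x_j| of the distance matrix, which enter the centering, can be obtained from a single sort and cumulative sums, without forming the matrix. The sketch below shows only this step; the cross term sum_ij |x_i - x_j| |y_i - y_j| requires the more involved construction of Huo and Szekely (2016), which is not reproduced here. The function name is hypothetical and is not part of the energy package.

```r
# Row sums a_i. = sum_j |x_i - x_j| in O(n log n), without an n x n matrix.
# Hypothetical helper for illustration; not part of the energy package.
dist_rowsums_fast <- function(x) {
  n <- length(x)
  o <- order(x)            # O(n log n) sort
  xs <- x[o]
  S <- cumsum(xs)          # S[k] = xs[1] + ... + xs[k]
  k <- seq_len(n)
  # For the k-th smallest value:
  #   sum_{j<k} (xs[k] - xs[j]) + sum_{j>k} (xs[j] - xs[k])
  #   = (2k - n) * xs[k] + S[n] - 2 * S[k]
  rs <- numeric(n)
  rs[o] <- (2 * k - n) * xs + S[n] - 2 * S   # map back to original order
  rs
}

# Check against the O(n^2) computation on a small sample (ties are fine)
x <- c(3, -1, 2, 2, 0.5)
dist_rowsums_fast(x)   # c(8.5, 11.5, 5.5, 5.5, 7)
all.equal(dist_rowsums_fast(x), rowSums(abs(outer(x, x, "-"))))
```

The grand sum a_.. is then just sum(dist_rowsums_fast(x)), so all centering constants cost O(n log n) time and O(n) memory.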
dcov2d returns the V-statistic V_n = dCov_n^2(x, y), and if type="U", it returns the U-statistic, unbiased for dCov^2(X,Y). The argument all.stats=TRUE is used internally when the function is called from
dcor2d returns dCor_n^2(x, y), and if type="U", it returns a bias-corrected estimator of squared dcor equivalent to
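For reference, the quantities returned by dcov2d and dcor2d can be written directly in terms of centered distance matrices. The sketch below is a naive O(n^2) version, assuming the standard definitions: the V-statistic uses double-centered matrices and averages over n^2 terms, while the U-statistic uses U-centered matrices (Szekely and Rizzo, 2014) and divides by n(n - 3). The bias-corrected dCor is assumed here to be the ratio of U-statistics. All helper names are hypothetical, and because they build full distance matrices they are meant only for small n. Via `stats::dist` they also accept matrices, so x and y may have different dimensions.

```r
# Naive O(n^2) reference versions; hypothetical helpers, not energy functions.

# Double-centered distance matrix (V-statistic)
double_center <- function(d) {
  d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
}

# U-centered distance matrix (Szekely and Rizzo 2014); needs n > 3
u_center <- function(d) {
  n <- nrow(d)
  u <- d - outer(rowSums(d), colSums(d), "+") / (n - 2) +
    sum(d) / ((n - 1) * (n - 2))
  diag(u) <- 0   # the U-statistic sums over i != j only
  u
}

# Squared distance covariance: V-statistic or U-statistic
dcov2_naive <- function(x, y, type = c("V", "U")) {
  type <- match.arg(type)
  a <- as.matrix(dist(x))   # Euclidean distances between rows (or values)
  b <- as.matrix(dist(y))
  n <- nrow(a)
  if (type == "V") {
    sum(double_center(a) * double_center(b)) / n^2
  } else {
    sum(u_center(a) * u_center(b)) / (n * (n - 3))
  }
}

# Squared distance correlation (V) or its bias-corrected version (U)
dcor2_naive <- function(x, y, type = c("V", "U")) {
  type <- match.arg(type)
  vxy <- dcov2_naive(x, y, type)
  vxx <- dcov2_naive(x, x, type)
  vyy <- dcov2_naive(y, y, type)
  if (vxx * vyy <= 0) return(0)   # degenerate sample: define as 0
  vxy / sqrt(vxx * vyy)
}

set.seed(1)
x <- rnorm(50)
y <- x^2 + rnorm(50, sd = 0.1)    # nonlinear dependence
dcov2_naive(x, y, "V")
dcor2_naive(x, y, "U")

# Under independence the U-statistic takes both signs; the V-statistic does not
u <- replicate(200, dcov2_naive(rnorm(10), rnorm(10), "U"))
any(u < 0)
```

If the energy package is installed, dcov2_naive(x, y, "U") would be expected to agree with dcov2d(x, y, "U") up to rounding for univariate x and y; this sketch has not been checked against the package source.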
These functions do not store the distance matrices, so they are helpful when the sample size is large and the data are bivariate.
The U-statistic U_n can be negative in the lower tail, so the square root of the U-statistic is not applied.

Likewise, dcor2d(x, y, "U") is bias-corrected and can be negative in the lower tail, so we do not take its square root. The original definitions of dCov and dCor (SRB2007, SR2009) were based on V-statistics, which are non-negative, and were defined using the square root of V-statistics.

It has been suggested that, instead of taking the square root of the U-statistic, one could take the square root of |U_n| and then apply the sign of U_n, but that introduces more bias than the original dCor and should never be used.
Maria L. Rizzo mrizzo @ bgsu.edu and Gabor J. Szekely
Huo, X. and Szekely, G.J. (2016). Fast computing for distance covariance. Technometrics, 58(4), 435-447.
Szekely, G.J. and Rizzo, M.L. (2014). Partial distance correlation with methods for dissimilarities. Annals of Statistics, 42(6), 2382-2412.
Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769-2794.
```r
library(energy)

## these are equivalent, but 2d is faster for n > 50
n <- 100
x <- rnorm(n)
y <- rnorm(n)
all.equal(dcov(x, y)^2, dcov2d(x, y), check.attributes = FALSE)
all.equal(bcdcor(x, y), dcor2d(x, y, "U"), check.attributes = FALSE)

x <- rlnorm(400)
y <- rexp(400)
dcov.test(x, y, R = 199)  # permutation test
dcor.test(x, y, R = 199)
```