| dcov2d | R Documentation |
For bivariate data only, these are fast O(n log n) implementations of the distance correlation and distance covariance statistics. The U-statistic for dCov^2 is unbiased; the V-statistic is the original definition of Szekely, Rizzo, and Bakirov (2007). These algorithms do not store the distance matrices, so they are suitable for large samples.
dcor2d(x, y, type = c("V", "U"))
dcov2d(x, y, type = c("V", "U"), all.stats = FALSE)
| x | numeric vector |
| y | numeric vector |
| type | "V" or "U", for V- or U-statistics |
| all.stats | logical; all.stats = TRUE is used internally when dcov2d is called from dcor2d |
The unbiased (squared) dCov for multivariate data in arbitrary, not necessarily equal, dimensions is documented in dcovU. dcov2d and dcor2d provide a faster O(n log n) algorithm for the bivariate case only, where x and y are samples of real-valued random variables X and Y. The O(n log n) algorithm was proposed by Huo and Szekely (2016). It is faster than the O(n^2) computation once the sample size exceeds a moderate n (roughly n > 50), and because it does not store the distance matrices, the sample size can be very large.
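As a rough illustration of why the bivariate case admits an O(n log n) algorithm, the base-R sketch below writes the V-statistic as S1 + S2 - 2 S3, where S2 and S3 depend only on the row sums of the two distance matrices. For real-valued data those row sums can be obtained by sorting plus cumulative sums. The helper names `row_dist_sums` and `dcov2_V_parts` are hypothetical, not part of the package. The cross term S1 is computed naively here in O(n^2); Huo and Szekely (2016) replace that step with a sort-based O(n log n) procedure that this sketch does not reproduce.

```r
# Sketch only: helper names are hypothetical, not from the energy package.

# Row sums a_i. = sum_j |x_i - x_j| in O(n log n) via sorting + cumulative sums.
row_dist_sums <- function(x) {
  n <- length(x)
  o <- order(x)
  xs <- x[o]                              # sorted values
  cs <- cumsum(xs)                        # prefix sums S_k
  k <- seq_len(n)
  below <- xs * (k - 1) - c(0, cs[-n])    # sum over j < k of (xs[k] - xs[j])
  above <- (cs[n] - cs) - xs * (n - k)    # sum over j > k of (xs[j] - xs[k])
  out <- numeric(n)
  out[o] <- below + above                 # put results back in the order of x
  out
}

# V-statistic dCov_n^2 = S1 + S2 - 2 * S3.
dcov2_V_parts <- function(x, y) {
  n <- length(x)
  a_row <- row_dist_sums(x)
  b_row <- row_dist_sums(y)
  # S1 = (1/n^2) sum_{i,j} |x_i - x_j| |y_i - y_j|, computed naively in O(n^2);
  # this is the step the fast algorithm replaces.
  S1 <- mean(abs(outer(x, x, "-")) * abs(outer(y, y, "-")))
  S2 <- (sum(a_row) / n^2) * (sum(b_row) / n^2)   # O(n) once row sums are known
  S3 <- sum(a_row * b_row) / n^3                  # O(n) once row sums are known
  c(V = S1 + S2 - 2 * S3, S1 = S1, S2 = S2, S3 = S3)
}

set.seed(1)
x <- rnorm(200)
y <- x + rnorm(200)
dcov2_V_parts(x, y)
```

If the energy package is installed, the V component should agree with dcov2d(x, y) up to rounding, since dcov2d returns V_n by default.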
By default, dcov2d returns the V-statistic V_n = dCov_n^2(x, y), and if type="U", it returns the U-statistic, unbiased for dCov^2(X, Y). The argument all.stats=TRUE is used internally when the function is called from dcor2d.
By default, dcor2d returns dCor_n^2(x, y), and if type="U", it returns a bias-corrected estimator of squared dcor equivalent to bcdcor.
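For reference, the quantities returned by dcov2d and dcor2d can be written directly in terms of the full distance matrices. The base-R sketch below uses double centering for the V-statistic and U-centering (Szekely and Rizzo, 2014) for the unbiased U-statistic; the squared dCor versions divide by the square root of the product of the corresponding distance variances. All helper names are hypothetical, and the code takes O(n^2) time and memory, which is exactly what the 2d functions avoid, so it is only meant as a check on small samples.

```r
# Sketch only: O(n^2) reference versions; helper names are hypothetical.

# Double-centered distance matrix: a_ij - rowmean_i - colmean_j + grand mean.
dcenter <- function(x) {
  a <- abs(outer(x, x, "-"))
  a - outer(rowMeans(a), colMeans(a), "+") + mean(a)
}

# U-centered distance matrix (requires n > 3), with zero diagonal.
ucenter <- function(x) {
  n <- length(x)
  a <- abs(outer(x, x, "-"))
  u <- a - outer(rowSums(a), colSums(a), "+") / (n - 2) +
    sum(a) / ((n - 1) * (n - 2))
  diag(u) <- 0
  u
}

# V-statistic dCov_n^2(x, y): mean of products of double-centered distances.
dcov2_V <- function(x, y) mean(dcenter(x) * dcenter(y))

# U-statistic for dCov^2(X, Y): sum over i != j of products, divided by n(n - 3).
dcov2_U <- function(x, y) {
  n <- length(x)
  sum(ucenter(x) * ucenter(y)) / (n * (n - 3))
}

# Squared dCor: "V" gives dCor_n^2, "U" gives the bias-corrected version.
dcor2_ref <- function(x, y, type = c("V", "U")) {
  type <- match.arg(type)
  f <- if (type == "V") dcov2_V else dcov2_U
  den <- sqrt(f(x, x) * f(y, y))
  if (den > 0) f(x, y) / den else 0
}

set.seed(2)
x <- rnorm(30)
y <- exp(x) + rnorm(30)
dcov2_V(x, y); dcov2_U(x, y)
dcor2_ref(x, y, "V"); dcor2_ref(x, y, "U")
```

On small samples, these values can be compared against dcov2d(x, y, type) and dcor2d(x, y, type).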
These functions do not store the distance matrices so they are helpful when sample size is large and the data is bivariate.
The U-statistic U_n can be negative in the lower tail, so the square root of the U-statistic is not applied. Similarly, dcor2d(x, y, "U") is bias-corrected and can be negative in the lower tail, so its square root is not taken either. The original definitions of dCov and dCor (SRB2007, SR2009) were based on V-statistics, which are non-negative, and dCov and dCor were defined as square roots of V-statistics.
It has been suggested that, instead of taking the square root of the U-statistic, one could take the square root of |U_n| and then reattach the sign of U_n. That introduces more bias than the original dCor, however, and should never be used.
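To see why the square root is left off, here is a small simulation sketch in base R. It assumes independent standard normal samples, for which dCov^2(X, Y) = 0, and uses hypothetical O(n^2) helpers (`dcov2_V` via double centering, `dcov2_U` via U-centering) rather than the package's fast code. Because the U-statistic is unbiased for 0 here, it falls below zero in some replicates, where sqrt() would return NaN, while the V-statistic from the same samples stays non-negative.

```r
# Sketch only: O(n^2) helpers with hypothetical names.
dist_mat <- function(x) abs(outer(x, x, "-"))

# V-statistic via double centering (always >= 0).
dcov2_V <- function(x, y) {
  dc <- function(a) a - outer(rowMeans(a), colMeans(a), "+") + mean(a)
  mean(dc(dist_mat(x)) * dc(dist_mat(y)))
}

# U-statistic via U-centering (unbiased, can be negative); needs n > 3.
dcov2_U <- function(x, y) {
  n <- length(x)
  uc <- function(a) {
    u <- a - outer(rowSums(a), colSums(a), "+") / (n - 2) +
      sum(a) / ((n - 1) * (n - 2))
    diag(u) <- 0
    u
  }
  sum(uc(dist_mat(x)) * uc(dist_mat(y))) / (n * (n - 3))
}

set.seed(123)
n <- 30
sims <- replicate(500, {
  x <- rnorm(n); y <- rnorm(n)    # independent, so dCov^2(X, Y) = 0
  c(V = dcov2_V(x, y), U = dcov2_U(x, y))
})
mean(sims["U", ] < 0)   # share of negative U-statistics
min(sims["V", ])        # smallest V-statistic, never below 0
```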
Maria L. Rizzo mrizzo@bgsu.edu and Gabor J. Szekely
Huo, X. and Szekely, G.J. (2016). Fast computing for distance covariance. Technometrics, 58(4), 435-447.
Szekely, G.J. and Rizzo, M.L. (2014). Partial Distance Correlation with Methods for Dissimilarities. Annals of Statistics, 42(6), 2382-2412.
Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007). Measuring and Testing Dependence by Correlation of Distances. Annals of Statistics, 35(6), 2769-2794. doi:10.1214/009053607000000505
dcov, dcov.test, dcor, dcor.test (multivariate statistics and permutation tests)
library(energy)
## these are equivalent, but the 2d versions are faster for n > 50
n <- 100
x <- rnorm(n)
y <- rnorm(n)
all.equal(dcov(x, y)^2, dcov2d(x, y), check.attributes = FALSE)
all.equal(bcdcor(x, y), dcor2d(x, y, "U"), check.attributes = FALSE)

## permutation tests of independence (multivariate methods)
x <- rlnorm(400)
y <- rexp(400)
dcov.test(x, y, R = 199)  # permutation test based on dCov
dcor.test(x, y, R = 199)  # permutation test based on dCor