distance_corr | R Documentation |
Computes all pairwise distance correlations using the unbiased U-statistic estimator for the numeric columns of a matrix or data frame, via a high-performance 'C++' backend ('OpenMP'-parallelised). Distance correlation detects general (including non-linear and non-monotonic) dependence between variables; unlike Pearson or Spearman, it is zero (in population) if and only if the variables are independent.
Prints a summary of the distance correlation matrix with optional truncation for large objects.
Generates a ggplot2 heatmap of the distance correlation matrix. Distance correlation is non-negative; the fill scale spans [0, 1].
distance_corr(data)
## S3 method for class 'distance_corr'
print(x, digits = 4, max_rows = NULL, max_cols = NULL, ...)
## S3 method for class 'distance_corr'
plot(
  x,
  title = "Distance correlation heatmap",
  low_color = "white",
  high_color = "steelblue1",
  value_text_size = 4,
  ...
)
data |
A numeric matrix or a data frame with at least two numeric
columns. All non-numeric columns are dropped. Columns must be numeric
and contain no |
x |
An object of class distance_corr. |
digits |
Integer; number of decimal places to print. |
max_rows |
Optional integer; maximum number of rows to display.
If |
max_cols |
Optional integer; maximum number of columns to display.
If |
... |
Additional arguments passed to |
title |
Plot title. Default is "Distance correlation heatmap". |
low_color |
Colour for zero correlation. Default is "white". |
high_color |
Colour for strong correlation. Default is "steelblue1". |
value_text_size |
Font size for displaying values. Default is 4. |
Let $x \in \mathbb{R}^n$ and let $D^{(x)}$ be the pairwise distance matrix with zero diagonal: $D^{(x)}_{ii} = 0$, $D^{(x)}_{ij} = |x_i - x_j|$ for $i \neq j$. Define the row sums $r^{(x)}_i = \sum_{k \neq i} D^{(x)}_{ik}$ and the grand sum $S^{(x)} = \sum_{i \neq k} D^{(x)}_{ik}$. The U-centred matrix is

$$
A^{(x)}_{ij} =
\begin{cases}
D^{(x)}_{ij} - \dfrac{r^{(x)}_i + r^{(x)}_j}{n - 2}
+ \dfrac{S^{(x)}}{(n - 1)(n - 2)}, & i \neq j,\\[6pt]
0, & i = j.
\end{cases}
$$

For two variables $x, y$, the unbiased distance covariance and variances are

$$
\widehat{\mathrm{dCov}}^2_u(x,y) = \frac{2}{n(n-3)} \sum_{i<j} A^{(x)}_{ij} A^{(y)}_{ij}
\;=\; \frac{1}{n(n-3)} \sum_{i \neq j} A^{(x)}_{ij} A^{(y)}_{ij},
$$

with $\widehat{\mathrm{dVar}}^2_u(x)$ defined analogously from $A^{(x)}$. The unbiased distance correlation is

$$
\widehat{\mathrm{dCor}}_u(x,y) =
\frac{\widehat{\mathrm{dCov}}_u(x,y)}
{\sqrt{\widehat{\mathrm{dVar}}_u(x)\,\widehat{\mathrm{dVar}}_u(y)}} \in [0,1].
$$
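The definitions above can be checked by hand with a short base-R sketch. It is illustrative only and is not the package's 'C++' implementation: the helper names u_center, dcov2_u and dcor2_u are hypothetical, and the final step (truncating the ratio of squared quantities at zero and taking its square root to reach [0, 1]) is an assumption about how the stated range is obtained, not something this page confirms.

```r
# Base-R sketch of the U-centred estimator; helper names are hypothetical,
# not part of the package API.
u_center <- function(x) {
  n <- length(x)
  stopifnot(is.numeric(x), n >= 4)
  D <- abs(outer(x, x, "-"))            # D_ij = |x_i - x_j|, zero diagonal
  r <- rowSums(D)                       # r_i = sum_{k != i} D_ik
  S <- sum(D)                           # S = sum_{i != k} D_ik
  A <- D - outer(r, r, "+") / (n - 2) + S / ((n - 1) * (n - 2))
  diag(A) <- 0                          # A_ii = 0 by definition
  A
}

# Unbiased squared distance covariance: sum over i != j, scaled by n (n - 3).
dcov2_u <- function(x, y) {
  n <- length(x)
  stopifnot(length(y) == n)
  sum(u_center(x) * u_center(y)) / (n * (n - 3))
}

# Ratio of the squared quantities; it can be slightly negative in finite samples.
dcor2_u <- function(x, y) {
  dcov2_u(x, y) / sqrt(dcov2_u(x, x) * dcov2_u(y, y))
}

set.seed(42)
x <- rnorm(200)
y <- x^2 + rnorm(200, sd = 0.2)
r2 <- dcor2_u(x, y)
# Assumption: truncate at zero and take the square root to land in [0, 1].
sqrt(max(r2, 0))
```

A useful sanity check is that every row of the U-centred matrix sums to zero and that dcor2_u(x, x) equals 1. The sketch builds full n-by-n matrices, so it is meant for small n only.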
A symmetric numeric matrix where the (i, j) entry is the unbiased distance correlation between the i-th and j-th numeric columns. The object has class distance_corr with attributes method = "distance_correlation", description, and package = "matrixCorr".
Invisibly returns x.
A ggplot object representing the heatmap.
Requires $n \ge 4$. Columns with (near) zero unbiased distance variance yield NA in their row/column. Computation is $O(n^2)$ per pair.
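The constraints in this note can be checked before calling the function. The sketch below is a hypothetical pre-flight helper (preflight is not part of the package): it keeps the numeric columns, tests the sample-size requirement, flags exactly constant columns (which have zero distance variance; the package's threshold for "near" zero is not stated here, so it is not reproduced), and gives a rough count of pairwise distance evaluations, assuming one O(n^2) pass per unique pair of columns.

```r
# Hypothetical pre-flight check; not part of the package.
preflight <- function(data) {
  df  <- as.data.frame(data)
  num <- df[vapply(df, is.numeric, logical(1))]   # keep numeric columns only
  n <- nrow(num)
  p <- ncol(num)
  # A constant column has all pairwise distances equal to zero.
  is_const <- vapply(num, function(col) length(unique(col)) == 1L, logical(1))
  list(
    n_ok           = n >= 4,                 # estimator divides by n - 3
    constant_cols  = names(num)[is_const],   # expected to yield NA
    n_pairs        = choose(p, 2),           # unique column pairs
    distance_evals = choose(p, 2) * n^2      # rough O(n^2)-per-pair cost
  )
}

preflight(data.frame(a = c(1, 2, 3, 4, 5), b = rep(3, 5), g = letters[1:5]))
```

For the five-column, 150-row example in the examples section, this counts 10 pairs and 225,000 distance evaluations.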
Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769–2794.
Székely, G. J., & Rizzo, M. L. (2013). The distance correlation t-test of independence. Journal of Multivariate Analysis, 117, 193–213.
## Independent variables -> dCor ~ 0
set.seed(1)
X <- cbind(a = rnorm(200), b = rnorm(200))
D <- distance_corr(X)
print(D, digits = 3)
## Non-linear dependence: Pearson ~ 0, but unbiased dCor > 0
set.seed(42)
n <- 200
x <- rnorm(n)
y <- x^2 + rnorm(n, sd = 0.2)
XY <- cbind(x = x, y = y)
D2 <- distance_corr(XY)
# Compare Pearson vs unbiased distance correlation
round(c(pearson = cor(XY)[1, 2], dcor = D2["x", "y"]), 3)
plot(D2, title = "Unbiased distance correlation (non-linear example)")
## Small AR(1) multivariate normal example
set.seed(7)
p <- 5; n <- 150; rho <- 0.6
Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
X3 <- MASS::mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
colnames(X3) <- paste0("V", seq_len(p))
D3 <- distance_corr(X3)
print(D3[1:3, 1:3], digits = 2)