distance_corr: Pairwise Distance Correlation (dCor)

View source: R/distance_corr.R

distance_corrR Documentation

Pairwise Distance Correlation (dCor)

Description

Computes all pairwise distance correlations using the unbiased U-statistic estimator for the numeric columns of a matrix or data frame, via a high-performance 'C++' backend ('OpenMP'-parallelised). Distance correlation detects general (including non-linear and non-monotonic) dependence between variables; unlike Pearson or Spearman, it is zero (in population) if and only if the variables are independent.

Prints a summary of the distance correlation matrix with optional truncation for large objects.

Generates a ggplot2 heatmap of the distance correlation matrix. Distance correlation is non-negative; the fill scale spans [0, 1].

Usage

distance_corr(data)

## S3 method for class 'distance_corr'
print(x, digits = 4, max_rows = NULL, max_cols = NULL, ...)

## S3 method for class 'distance_corr'
plot(
  x,
  title = "Distance correlation heatmap",
  low_color = "white",
  high_color = "steelblue1",
  value_text_size = 4,
  ...
)

Arguments

data

A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns are dropped. Columns must be numeric and contain no NAs.

x

An object of class distance_corr.

digits

Integer; number of decimal places to print.

max_rows

Optional integer; maximum number of rows to display. If NULL, all rows are shown.

max_cols

Optional integer; maximum number of columns to display. If NULL, all columns are shown.

...

Additional arguments passed to ggplot2::theme() or other ggplot2 layers.

title

Plot title. Default is "Distance correlation heatmap".

low_color

Colour for zero correlation. Default is "white".

high_color

Colour for strong correlation. Default is "steelblue1".

value_text_size

Font size for displaying values. Default is 4.

Details

Let x \in \mathbb{R}^n and D^{(x)} be the pairwise distance matrix with zero diagonal: D^{(x)}_{ii} = 0, D^{(x)}_{ij} = |x_i - x_j| for i \neq j. Define row sums r^{(x)}_i = \sum_{k \neq i} D^{(x)}_{ik} and grand sum S^{(x)} = \sum_{i \neq k} D^{(x)}_{ik}. The U-centred matrix is

A^{(x)}_{ij} = \begin{cases} D^{(x)}_{ij} - \dfrac{r^{(x)}_i + r^{(x)}_j}{n - 2} + \dfrac{S^{(x)}}{(n - 1)(n - 2)}, & i \neq j,\\[6pt] 0, & i = j~. \end{cases}

For two variables x,y, the unbiased distance covariance and variances are

\widehat{\mathrm{dCov}}^2_u(x,y) = \frac{2}{n(n-3)} \sum_{i<j} A^{(x)}_{ij} A^{(y)}_{ij} \;=\; \frac{1}{n(n-3)} \sum_{i \neq j} A^{(x)}_{ij} A^{(y)}_{ij},

with \widehat{\mathrm{dVar}}^2_u(x) defined analogously from A^{(x)}. The unbiased distance correlation is

\widehat{\mathrm{dCor}}_u(x,y) = \frac{\widehat{\mathrm{dCov}}_u(x,y)} {\sqrt{\widehat{\mathrm{dVar}}_u(x)\,\widehat{\mathrm{dVar}}_u(y)}} \in [0,1].

Value

A symmetric numeric matrix where the (i, j) entry is the unbiased distance correlation between the i-th and j-th numeric columns. The object has class distance_corr with attributes method = "distance_correlation", description, and package = "matrixCorr".

Invisibly returns x.

A ggplot object representing the heatmap.

Note

Requires n \ge 4. Columns with (near) zero unbiased distance variance yield NA in their row/column. Computation is O(n^2) per pair.

References

Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769–2794.

Székely, G. J., & Rizzo, M. L. (2013). The distance correlation t-test of independence. Journal of Multivariate Analysis, 117, 193-213.

Examples

##Independent variables -> dCor ~ 0
set.seed(1)
X <- cbind(a = rnorm(200), b = rnorm(200))
D <- distance_corr(X)
print(D, digits = 3)

## Non-linear dependence: Pearson ~ 0, but unbiased dCor > 0
set.seed(42)
n <- 200
x <- rnorm(n)
y <- x^2 + rnorm(n, sd = 0.2)
XY <- cbind(x = x, y = y)
D2 <- distance_corr(XY)
# Compare Pearson vs unbiased distance correlation
round(c(pearson = cor(XY)[1, 2], dcor = D2["x", "y"]), 3)
plot(D2, title = "Unbiased distance correlation (non-linear example)")

## Small AR(1) multivariate normal example
set.seed(7)
p <- 5; n <- 150; rho <- 0.6
Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
X3 <- MASS::mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
colnames(X3) <- paste0("V", seq_len(p))
D3 <- distance_corr(X3)
print(D3[1:3, 1:3], digits = 2)


matrixCorr documentation built on Aug. 26, 2025, 5:07 p.m.