dcor: Pairwise Distance Correlation (dCor)
In matrixCorr: Collection of Correlation and Association Estimators

dcor	R Documentation

Description

Computes pairwise distance correlations for the numeric columns of a matrix or data frame using a high-performance 'C++' backend. Distance correlation detects general dependence, including non-linear relationships. Optional p-values are available via the bias-corrected distance-correlation t-test.

Usage

dcor(
  data,
  na_method = c("error", "pairwise"),
  p_value = FALSE,
  n_threads = getOption("matrixCorr.threads", 1L),
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0,
  diag = TRUE,
  ...
)

## S3 method for class 'dcor'
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'dcor'
plot(
  x,
  title = "Distance correlation heatmap",
  low_color = "white",
  high_color = "steelblue1",
  value_text_size = 4,
  show_value = TRUE,
  ...
)

## S3 method for class 'dcor'
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.dcor'
print(
  x,
  digits = NULL,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

Arguments

`data`	A numeric matrix or a data frame. Non-numeric columns are dropped before computation, and at least two numeric columns must remain.
`na_method`	Character scalar controlling missing-data handling. `"error"` rejects missing, `NaN`, and infinite values. `"pairwise"` recomputes each association on its own pairwise complete-case overlap.
`p_value`	Logical (default `FALSE`). If `TRUE`, attach pairwise p-values, test statistics, and degrees of freedom from the distance-correlation t-test of independence.
`n_threads`	Integer `>= 1`. Number of OpenMP threads. Defaults to `getOption("matrixCorr.threads", 1L)`.
`output`	Output representation for the computed estimates. `"matrix"` (default): full dense matrix; best when you need matrix algebra, dense heatmaps, or full compatibility with existing code. `"sparse"`: sparse matrix from Matrix containing only retained entries; best when many values are dropped by thresholding. `"edge_list"`: long-form data frame with columns `row`, `col`, `value`; convenient for filtering, joins, and network-style workflows.
`threshold`	Non-negative absolute-value filter for non-matrix outputs: keep entries with `abs(value) >= threshold`. Use `threshold > 0` when you want only stronger associations (typically with `output = "sparse"` or `"edge_list"`). Keep `threshold = 0` to retain all values. Must be `0` when `output = "matrix"`.
`diag`	Logical; whether to include diagonal entries in `"sparse"` and `"edge_list"` outputs.
`...`	Additional arguments passed to `ggplot2::theme()` or other `ggplot2` layers.
`x`	An object of class `dcor` (for the `print` and `plot` methods) or of class `summary.dcor` (for its `print` method).
`digits`	Integer; number of decimal places to print.
`n`	Optional row threshold for compact preview output.
`topn`	Optional number of leading/trailing rows to show when truncated.
`max_vars`	Optional maximum number of visible columns; `NULL` derives this from console width.
`width`	Optional display width; defaults to `getOption("width")`.
`show_ci`	One of `"yes"` or `"no"`.
`title`	Plot title. Default is `"Distance correlation heatmap"`.
`low_color`	Colour for zero correlation. Default is `"white"`.
`high_color`	Colour for strong correlation. Default is `"steelblue1"`.
`value_text_size`	Font size for displaying values. Default is `4`.
`show_value`	Logical; if `TRUE` (default), overlay numeric values on the heatmap tiles.
`object`	An object of class `dcor`.
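
The interaction of `output = "edge_list"`, `threshold`, and `diag` can be illustrated in base R. The sketch below is not the package's implementation: `to_edge_list()` is a hypothetical helper, and keeping only the upper triangle is an assumption made here to avoid listing each symmetric pair twice. It only shows what the filter `abs(value) >= threshold` does to a dense symmetric matrix.

```r
# Hypothetical helper (not part of matrixCorr): turn a dense symmetric matrix
# into a long-form edge list, keeping entries with abs(value) >= threshold.
# Assumption: only the upper triangle is listed, so each pair appears once.
to_edge_list <- function(m, threshold = 0, diag = TRUE) {
  stopifnot(is.matrix(m), is.numeric(m), threshold >= 0)
  keep <- upper.tri(m, diag = diag) & !is.na(m) & abs(m) >= threshold
  idx <- which(keep, arr.ind = TRUE)   # two-column matrix of (row, col) indices
  data.frame(
    row = rownames(m)[idx[, "row"]],
    col = colnames(m)[idx[, "col"]],
    value = m[idx],
    stringsAsFactors = FALSE
  )
}

m <- matrix(c(1.0, 0.8, 0.1,
              0.8, 1.0, 0.4,
              0.1, 0.4, 1.0), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Keep only off-diagonal pairs with association of at least 0.3
edges <- to_edge_list(m, threshold = 0.3, diag = FALSE)
```

With this toy matrix the pair (a, c) is dropped because 0.1 is below the threshold, which is the situation where a sparse or long-form output saves space over the dense matrix.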

Details

Let x \in \mathbb{R}^n and D^{(x)} be the pairwise distance matrix with zero diagonal: D^{(x)}_{ii} = 0, D^{(x)}_{ij} = |x_i - x_j| for i \neq j. Define row sums r^{(x)}_i = \sum_{k \neq i} D^{(x)}_{ik} and grand sum S^{(x)} = \sum_{i \neq k} D^{(x)}_{ik}. The U-centred matrix is

A^{(x)}_{ij} = \begin{cases} D^{(x)}_{ij} - \dfrac{r^{(x)}_i + r^{(x)}_j}{n - 2} + \dfrac{S^{(x)}}{(n - 1)(n - 2)}, & i \neq j,\\[6pt] 0, & i = j~. \end{cases}
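
The U-centring step can be written directly in base R. This is a minimal O(n^2) sketch of the formula above, not the package's C++ code; the function name `u_center()` is introduced here for illustration only.

```r
# U-centre the pairwise distance matrix of a numeric vector.
# Needs n >= 3 because of the (n - 2) denominator.
u_center <- function(x) {
  n <- length(x)
  stopifnot(is.numeric(x), n >= 3, all(is.finite(x)))
  D <- abs(outer(x, x, "-"))   # D_ij = |x_i - x_j|, zero on the diagonal
  r <- rowSums(D)              # row sums r_i (the diagonal adds 0)
  S <- sum(D)                  # grand sum over i != k
  A <- D - outer(r, r, "+") / (n - 2) + S / ((n - 1) * (n - 2))
  diag(A) <- 0                 # the diagonal is defined to be zero
  A
}

set.seed(1)
A <- u_center(rnorm(8))
```

A useful sanity check is that every row of the U-centred matrix sums to zero (up to floating-point error), which follows from the definition.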

For two variables x,y, the unbiased distance covariance and variances are

\widehat{\mathrm{dCov}}^2_u(x,y) = \frac{2}{n(n-3)} \sum_{i<j} A^{(x)}_{ij} A^{(y)}_{ij} \;=\; \frac{1}{n(n-3)} \sum_{i \neq j} A^{(x)}_{ij} A^{(y)}_{ij},

with \widehat{\mathrm{dVar}}^2_u(x) defined analogously from A^{(x)}. The unbiased distance correlation is

\widehat{\mathrm{dCor}}_u(x,y) = \frac{\widehat{\mathrm{dCov}}_u(x,y)} {\sqrt{\widehat{\mathrm{dVar}}_u(x)\,\widehat{\mathrm{dVar}}_u(y)}} \in [0,1].
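
The estimator can be sketched in base R as follows. This is an illustrative O(n^2) version, not the package's C++ backend. The names `u_center()`, `dcov2_u()`, and `dcor_ratio()` are hypothetical, and the tolerance `tol` is an arbitrary choice. The sketch returns the signed ratio of the \widehat{\mathrm{dCov}}^2_u terms; whether the package applies any further transformation or clipping before returning its matrix is not assumed here.

```r
# U-centred distance matrix of a numeric vector (see the definition above)
u_center <- function(x) {
  n <- length(x)
  D <- abs(outer(x, x, "-"))
  r <- rowSums(D)
  A <- D - outer(r, r, "+") / (n - 2) + sum(D) / ((n - 1) * (n - 2))
  diag(A) <- 0
  A
}

# Unbiased squared distance covariance: sum_{i != j} A_ij B_ij / (n (n - 3))
dcov2_u <- function(x, y) {
  n <- length(x)
  stopifnot(length(y) == n, n >= 4)
  sum(u_center(x) * u_center(y)) / (n * (n - 3))
}

# Signed ratio; NA when either variance term is (near) zero
dcor_ratio <- function(x, y, tol = 1e-12) {
  vx <- dcov2_u(x, x)
  vy <- dcov2_u(y, y)
  if (vx <= tol || vy <= tol) return(NA_real_)
  dcov2_u(x, y) / sqrt(vx * vy)
}

set.seed(42)
x <- rnorm(200)
y <- x^2 + rnorm(200, sd = 0.2)   # non-linear dependence on x
z <- rnorm(200)                   # generated independently of x
r_xy <- dcor_ratio(x, y)
r_xz <- dcor_ratio(x, z)
r_xx <- dcor_ratio(x, x)
```

Because the unbiased covariance term can be slightly negative for independent data, `r_xz` may fall below zero; this is the signed quantity that the inference step works with.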

Computation. All heavy lifting (distance matrices, U-centering, and unbiased scaling) is implemented in C++ (ustat_dcor_matrix_cpp), so the R wrapper only validates and coerces the input. OpenMP parallelises the upper-triangular loops. Pairwise terms are dispatched to a Huo-Szekely style univariate O(n \log n) algorithm, with an exact unbiased O(n^2) fallback retained for robustness in small-sample or non-finite-path cases. No external dependencies are used.

Inference. When p_value = TRUE, the package computes the bias-corrected distance-correlation t-test of independence of Szekely and Rizzo (2013). Let \widehat{\mathrm{dCor}}^\ast(x,y) denote the signed bias-corrected distance correlation used internally by the test (that is, the same ratio before the package's usual clipping to [0,1]). With

M = \frac{n(n-3)}{2},

the test statistic is

T = \sqrt{M - 1}\; \frac{\widehat{\mathrm{dCor}}^\ast(x,y)} {\sqrt{1 - \{\widehat{\mathrm{dCor}}^\ast(x,y)\}^2}},

referenced to a Student t-distribution with M - 1 degrees of freedom. The reported p-value is the upper-tail probability P(t_{M-1} \ge T). This inference payload is attached as metadata only when p_value = TRUE; the main returned matrix is the same in either case.
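
The test statistic and p-value follow mechanically from the signed ratio and the sample size. The sketch below is a plain transcription of the formulas above using `stats::pt()`. The function name `dcor_t_test()` and the shape of its return value are illustrative assumptions, not the package's interface.

```r
# Distance-correlation t-test from a signed bias-corrected ratio r_star
# and sample size n (hypothetical helper, for illustration only).
dcor_t_test <- function(r_star, n) {
  stopifnot(n >= 4, abs(r_star) < 1)
  M  <- n * (n - 3) / 2
  df <- M - 1
  t_stat <- sqrt(df) * r_star / sqrt(1 - r_star^2)
  list(
    statistic = t_stat,
    parameter = df,
    p_value   = pt(t_stat, df = df, lower.tail = FALSE)  # upper tail
  )
}

res0 <- dcor_t_test(0,    n = 10)   # M = 35, so 34 degrees of freedom
res1 <- dcor_t_test(0.3,  n = 50)
res2 <- dcor_t_test(-0.2, n = 20)
```

Because the test is one-sided, a negative ratio gives a p-value above 0.5, so only positive values count as evidence against independence.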

Value

A symmetric numeric matrix where the (i, j) entry is the unbiased distance correlation between the i-th and j-th numeric columns. The object has class dcor with attributes method = "distance_correlation", description, and package = "matrixCorr". When p_value = TRUE, the object also carries an inference attribute with matrices estimate, statistic, parameter, and p_value, plus attr(x, "diagnostics")$n_complete. The main returned matrix remains the usual non-negative unbiased distance-correlation estimate.

The print methods invisibly return x.

The plot method returns a ggplot object representing the heatmap.

Note

Requires n \ge 4. Columns with (near) zero unbiased distance variance yield NA in their row/column. Each pair typically costs O(n \log n) via the fast path, with the O(n^2) fallback used when needed.

Author(s)

Thiago de Paula Oliveira

References

Szekely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769-2794.

Szekely, G. J., & Rizzo, M. L. (2013). The distance correlation t-test of independence. Journal of Multivariate Analysis, 117, 193-213.

Rizzo, M. L., & Szekely, G. J. (2024). energy: E-statistics (energy statistics). R package version 1.7-12.

Examples

## Independent variables -> dCor ~ 0
set.seed(1)
X <- cbind(a = rnorm(200), b = rnorm(200))
D <- dcor(X)
print(D, digits = 3)
summary(D)

## Non-linear dependence: Pearson ~ 0, but unbiased dCor > 0
set.seed(42)
n <- 200
x <- rnorm(n)
y <- x^2 + rnorm(n, sd = 0.2)
XY <- cbind(x = x, y = y)
D2 <- dcor(XY)
# Compare Pearson vs unbiased distance correlation
round(c(pearson = cor(XY)[1, 2], dcor = D2["x", "y"]), 3)
summary(D2)
plot(D2, title = "Unbiased distance correlation (non-linear example)")

## Small AR(1) multivariate normal example
set.seed(7)
p <- 5; n <- 150; rho <- 0.6
Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
X3 <- MASS::mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
colnames(X3) <- paste0("V", seq_len(p))
D3 <- dcor(X3)
print(D3[1:3, 1:3], digits = 2)

## Optional inference
D4 <- dcor(XY, p_value = TRUE)
summary(D4)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(D)
}

matrixCorr documentation built on April 18, 2026, 5:06 p.m.