spearman_rho: Pairwise Spearman's rank correlation
In matrixCorr: Collection of Correlation and Association Estimators

spearman_rho

R Documentation

Pairwise Spearman's rank correlation

Description

Computes pairwise Spearman's rank correlations for the numeric columns of a matrix or data frame using a high-performance 'C++' backend. Optional confidence intervals are available via a jackknife Euclidean-likelihood method.

Usage

spearman_rho(
  data,
  na_method = c("error", "pairwise"),
  ci = FALSE,
  conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L),
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0,
  diag = TRUE,
  ...
)

## S3 method for class 'spearman_rho'
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  ci_digits = 3,
  show_ci = NULL,
  ...
)

## S3 method for class 'spearman_rho'
plot(
  x,
  title = "Spearman's rank correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  ci_text_size = 3,
  show_value = TRUE,
  ...
)

## S3 method for class 'spearman_rho'
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  ci_digits = 3,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.spearman_rho'
print(
  x,
  digits = NULL,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

Arguments

`data`	A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns will be excluded. Each column must have at least two non-missing values.
`na_method`	Character scalar controlling missing-data handling. `"error"` rejects missing, `NaN`, and infinite values. `"pairwise"` recomputes each correlation on its own pairwise complete-case overlap.
`ci`	Logical (default `FALSE`). If `TRUE`, attach jackknife Euclidean-likelihood confidence intervals for the off-diagonal Spearman correlations.
`conf_level`	Confidence level used when `ci = TRUE`. Default is `0.95`.
`n_threads`	Integer `\geq 1`. Number of OpenMP threads. Defaults to `getOption("matrixCorr.threads", 1L)`.
`output`	Output representation for the computed estimates. `"matrix"` (default): full dense matrix; best when you need matrix algebra, dense heatmaps, or full compatibility with existing code. `"sparse"`: sparse matrix from Matrix containing only retained entries; best when many values are dropped by thresholding. `"edge_list"`: long-form data frame with columns `row`, `col`, `value`; convenient for filtering, joins, and network-style workflows.
`threshold`	Non-negative absolute-value filter for non-matrix outputs: keep entries with `abs(value) >= threshold`. Use `threshold > 0` when you want only stronger associations (typically with `output = "sparse"` or `"edge_list"`). Keep `threshold = 0` to retain all values. Must be `0` when `output = "matrix"`.
`diag`	Logical; whether to include diagonal entries in `"sparse"` and `"edge_list"` outputs.
`...`	Additional arguments passed to `ggplot2::theme()` or other `ggplot2` layers.
`x`	An object of class `summary.spearman_rho`.
`digits`	Integer; number of decimal places to print.
`n`	Optional row threshold for compact preview output.
`topn`	Optional number of leading/trailing rows to show when truncated.
`max_vars`	Optional maximum number of visible columns; `NULL` derives this from console width.
`width`	Optional display width; defaults to `getOption("width")`.
`ci_digits`	Integer; digits for Spearman confidence limits in the pairwise summary.
`show_ci`	One of `"yes"` or `"no"`.
`title`	Plot title. Default is `"Spearman's rank correlation heatmap"`.
`low_color`	Color for the minimum rho value. Default is `"indianred1"`.
`high_color`	Color for the maximum rho value. Default is `"steelblue1"`.
`mid_color`	Color for zero correlation. Default is `"white"`.
`value_text_size`	Font size for displaying correlation values. Default is `4`.
`ci_text_size`	Text size for confidence intervals in the heatmap.
`show_value`	Logical; if `TRUE` (default), overlay numeric values on the heatmap tiles.
`object`	An object of class `spearman_rho`.

Details

For each column j=1,\ldots,p, let R_{\cdot j} \in \{1,\ldots,n\}^n denote the (mid-)ranks of X_{\cdot j}, assigning average ranks to ties. The mean rank is \bar R_j = (n+1)/2 regardless of ties. Define the centred rank vectors \tilde R_{\cdot j} = R_{\cdot j} - \bar R_j \mathbf{1}, where \mathbf{1}\in\mathbb{R}^n is the all-ones vector. The Spearman correlation between columns i and j is the Pearson correlation of their rank vectors:

\rho_S(i,j) \;=\; \frac{\sum_{k=1}^n (R_{ki}-\bar R_i)(R_{kj}-\bar R_j)} {\sqrt{\sum_{k=1}^n (R_{ki}-\bar R_i)^2}\; \sqrt{\sum_{k=1}^n (R_{kj}-\bar R_j)^2}}.

In matrix form, with R=[R_{\cdot 1},\ldots,R_{\cdot p}], \mu=(n+1)\mathbf{1}_p/2 for \mathbf{1}_p\in\mathbb{R}^p, and S_R=\bigl(R-\mathbf{1}\mu^\top\bigr)^\top \bigl(R-\mathbf{1}\mu^\top\bigr)/(n-1), the Spearman correlation matrix is

\widehat{\rho}_S \;=\; D^{-1/2} S_R D^{-1/2}, \qquad D \;=\; \mathrm{diag}(\mathrm{diag}(S_R)).

When there are no ties, the familiar rank-difference formula obtains

\rho_S(i,j) \;=\; 1 - \frac{6}{n(n^2-1)} \sum_{k=1}^n d_k^2, \quad d_k \;=\; R_{ki}-R_{kj},

but this expression does not hold under ties; computing Pearson on mid-ranks (as above) is the standard tie-robust approach. Without ties, \mathrm{Var}(R_{\cdot j})=(n^2-1)/12; with ties, the variance is smaller.

\rho_S(i,j) \in [-1,1] and \widehat{\rho}_S is symmetric positive semi-definite by construction (up to floating-point error). The implementation symmetrises the result to remove round-off asymmetry. Spearman's correlation is invariant to strictly monotone transformations applied separately to each variable.

Computation. Each column is ranked (mid-ranks) to form R. The product R^\top R is computed via a 'BLAS' symmetric rank update ('SYRK'), and centred using

(R-\mathbf{1}\mu^\top)^\top (R-\mathbf{1}\mu^\top) \;=\; R^\top R \;-\; n\,\mu\mu^\top,

avoiding an explicit centred copy. Division by n-1 yields the sample covariance of ranks; standardising by D^{-1/2} gives \widehat{\rho}_S. Columns with zero rank variance (all values equal) are returned as NA along their row/column; the corresponding diagonal entry is also NA.

When na_method = "pairwise", each (i,j) estimate is recomputed on the pairwise complete-case overlap of columns i and j. When ci = TRUE, confidence intervals are computed in 'C++' using the jackknife Euclidean-likelihood method of de Carvalho and Marques (2012). For a pairwise estimate U = \hat\rho_S, delete-one jackknife pseudo-values are formed as

Z_i = nU - (n-1)U_{(-i)}, \qquad i = 1,\ldots,n,

where U_{(-i)} is the Spearman correlation after removing observation i. The confidence limits solve

\frac{n(U-\theta)^2}{n^{-1}\sum_{i=1}^n (Z_i - \theta)^2} = \chi^2_{1,\;\texttt{conf\_level}}.

Ranking costs O\!\bigl(p\,n\log n\bigr); forming and normalising R^\top R costs O\!\bigl(n p^2\bigr) with O(p^2) additional memory. The optional jackknife Euclidean-likelihood confidence intervals add per-pair delete-one recomputation work and are intended for inference rather than raw-matrix throughput.

Value

A symmetric numeric matrix where the (i, j)-th element is the Spearman correlation between the i-th and j-th numeric columns of the input. When ci = TRUE, the object also carries a ci attribute with elements est, lwr.ci, upr.ci, and conf.level. When pairwise-complete evaluation is used, pairwise sample sizes are stored in attr(x, "diagnostics")$n_complete.

Invisibly returns the spearman_rho object.

A ggplot object representing the heatmap.

Note

Missing values are rejected when na_method = "error". Columns with fewer than two usable observations are excluded.

Author(s)

Thiago de Paula Oliveira

References

Spearman, C. (1904). The proof and measurement of association between two things. International Journal of Epidemiology, 39(5), 1137-1150.

de Carvalho, M., & Marques, F. (2012). Jackknife Euclidean likelihood-based inference for Spearman's rho. North American Actuarial Journal, 16(4), 487-492.

Examples

## Monotone transformation invariance (Spearman is rank-based)
set.seed(123)
n <- 400; p <- 6; rho <- 0.6
Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
L <- chol(Sigma)
X <- matrix(rnorm(n * p), n, p) %*% L
colnames(X) <- paste0("V", seq_len(p))

X_mono <- X
X_mono[, 1] <- exp(X_mono[, 1])
X_mono[, 2] <- log1p(exp(X_mono[, 2]))
X_mono[, 3] <- X_mono[, 3]^3

sp_X <- spearman_rho(X)
sp_m <- spearman_rho(X_mono)
summary(sp_X)
round(max(abs(sp_X - sp_m)), 3)
plot(sp_X)

## Confidence intervals
sp_ci <- spearman_rho(X[, 1:3], ci = TRUE)
print(sp_ci, show_ci = "yes")
summary(sp_ci)

## Ties handled via mid-ranks
tied <- cbind(
  a = rep(1:5, each = 20),
  b = rep(5:1, each = 20) + rnorm(100, sd = 0.1),
  c = as.numeric(gl(10, 10))
)
sp_tied <- spearman_rho(tied, ci = TRUE)
print(sp_tied, digits = 2, show_ci = "yes")

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(sp_X)
}

matrixCorr documentation built on April 18, 2026, 5:06 p.m.