| spearman_rho | R Documentation |
Computes pairwise Spearman's rank correlations for the numeric columns of a matrix or data frame using a high-performance 'C++' backend. Optional confidence intervals are available via a jackknife Euclidean-likelihood method.
spearman_rho(
data,
na_method = c("error", "pairwise"),
ci = FALSE,
conf_level = 0.95,
n_threads = getOption("matrixCorr.threads", 1L),
output = c("matrix", "sparse", "edge_list"),
threshold = 0,
diag = TRUE,
...
)
## S3 method for class 'spearman_rho'
print(
x,
digits = 4,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
ci_digits = 3,
show_ci = NULL,
...
)
## S3 method for class 'spearman_rho'
plot(
x,
title = "Spearman's rank correlation heatmap",
low_color = "indianred1",
high_color = "steelblue1",
mid_color = "white",
value_text_size = 4,
ci_text_size = 3,
show_value = TRUE,
...
)
## S3 method for class 'spearman_rho'
summary(
object,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
ci_digits = 3,
show_ci = NULL,
...
)
## S3 method for class 'summary.spearman_rho'
print(
x,
digits = NULL,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
data |
A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns will be excluded. Each column must have at least two non-missing values. |
na_method |
Character scalar controlling missing-data handling.
|
ci |
Logical (default |
conf_level |
Confidence level used when |
n_threads |
Integer |
output |
Output representation for the computed estimates.
|
threshold |
Non-negative absolute-value filter for non-matrix outputs:
keep entries with |
diag |
Logical; whether to include diagonal entries in
|
... |
Additional arguments passed to |
x |
An object of class |
digits |
Integer; number of decimal places to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
ci_digits |
Integer; digits for Spearman confidence limits in the pairwise summary. |
show_ci |
One of |
title |
Plot title. Default is "Spearman's rank correlation heatmap". |
low_color |
Color for the minimum rho value. Default is "indianred1". |
high_color |
Color for the maximum rho value. Default is "steelblue1". |
mid_color |
Color for zero correlation. Default is "white". |
value_text_size |
Font size for displaying correlation values. Default is 4. |
ci_text_size |
Text size for confidence intervals in the heatmap. |
show_value |
Logical; if |
object |
An object of class |
For each column j=1,\ldots,p, let
R_{\cdot j} \in [1,n]^n denote the (mid-)ranks of
X_{\cdot j}, assigning average ranks to ties. The mean rank is
\bar R_j = (n+1)/2 regardless of ties. Define the centred rank
vectors \tilde R_{\cdot j} = R_{\cdot j} - \bar R_j \mathbf{1},
where \mathbf{1}\in\mathbb{R}^n is the all-ones vector. The
Spearman correlation between columns i and j is the Pearson
correlation of their rank vectors:
\rho_S(i,j) \;=\;
\frac{\sum_{k=1}^n (R_{ki}-\bar R_i)(R_{kj}-\bar R_j)}
{\sqrt{\sum_{k=1}^n (R_{ki}-\bar R_i)^2}\;
\sqrt{\sum_{k=1}^n (R_{kj}-\bar R_j)^2}}.
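As a quick check of the definition above, the following base-R sketch computes the Pearson correlation of mid-ranks by hand and compares it with stats::cor(method = "spearman"). It does not call spearman_rho(); the vectors x and y are made-up toy data.

```r
# Toy data containing ties (illustrative values only)
x <- c(1, 1, 2, 3, 5, 5, 8)
y <- c(2, 1, 2, 4, 3, 6, 7)
n <- length(x)

# Mid-ranks: tied observations share the average of their ranks
rx <- rank(x, ties.method = "average")
ry <- rank(y, ties.method = "average")

# Centre at the mean rank, which is (n + 1) / 2 even with ties
cx <- rx - (n + 1) / 2
cy <- ry - (n + 1) / 2

# Pearson correlation of the rank vectors
rho_manual <- sum(cx * cy) / sqrt(sum(cx^2) * sum(cy^2))

# Reference value from base R
rho_base <- cor(x, y, method = "spearman")

isTRUE(all.equal(rho_manual, rho_base))
```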
In matrix form, with R=[R_{\cdot 1},\ldots,R_{\cdot p}],
\mu=(n+1)\mathbf{1}_p/2 for \mathbf{1}_p\in\mathbb{R}^p, and
S_R=\bigl(R-\mathbf{1}\mu^\top\bigr)^\top
\bigl(R-\mathbf{1}\mu^\top\bigr)/(n-1),
the Spearman correlation matrix is
\widehat{\rho}_S \;=\; D^{-1/2} S_R D^{-1/2}, \qquad
D \;=\; \mathrm{diag}(\mathrm{diag}(S_R)).
When there are no ties, the familiar rank-difference formula obtains
\rho_S(i,j) \;=\; 1 - \frac{6}{n(n^2-1)} \sum_{k=1}^n d_k^2,
\quad d_k \;=\; R_{ki}-R_{kj},
but this expression does not hold under ties; computing Pearson on
mid-ranks (as above) is the standard tie-robust approach. Without ties,
the rank variance with divisor n is
\mathrm{Var}(R_{\cdot j})=(n^2-1)/12 (equivalently n(n+1)/12 with
divisor n-1, as used in S_R); with ties, the variance is smaller.
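The effect of ties on the rank-difference formula can be seen in a small base-R sketch. The helper names rank_diff_rho and pop_var are hypothetical and exist only for this illustration; pop_var uses divisor n so that it matches (n^2-1)/12 in the untied case.

```r
# Rank-difference formula: valid only when there are no ties
rank_diff_rho <- function(x, y) {
  n <- length(x)
  d <- rank(x) - rank(y)          # rank() returns mid-ranks by default
  1 - 6 * sum(d^2) / (n * (n^2 - 1))
}

# Variance with divisor n (not n - 1)
pop_var <- function(r) mean((r - mean(r))^2)

# No ties: both routes agree
x <- c(10, 20, 30, 40, 50)
y <- c(12,  9, 33, 31, 60)
rank_diff_rho(x, y)               # 0.8
cor(x, y, method = "spearman")    # 0.8
pop_var(rank(x))                  # 2, i.e. (5^2 - 1) / 12

# Ties: the shortcut drifts away from Pearson-on-mid-ranks
xt <- c(1, 1, 2, 3)
yt <- c(1, 2, 3, 3)
rank_diff_rho(xt, yt)             # 0.9
cor(xt, yt, method = "spearman")  # 0.8889 (= 4 / 4.5)
pop_var(rank(xt))                 # 1.125, below (4^2 - 1) / 12 = 1.25
```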
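\rho_S(i,j) \in [-1,1]. When computed from complete data,
\widehat{\rho}_S is symmetric positive semi-definite by construction
(up to floating-point error); with na_method = "pairwise" each entry
is based on a different subset of rows, so positive semi-definiteness is
not guaranteed. The implementation symmetrises the result to remove
round-off asymmetry.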
Spearman's correlation is invariant to strictly monotone transformations
applied separately to each variable.
Computation. Each column is ranked (mid-ranks) to form R.
The product R^\top R is computed via a 'BLAS' symmetric rank update
('SYRK'), and centred using
(R-\mathbf{1}\mu^\top)^\top (R-\mathbf{1}\mu^\top)
\;=\; R^\top R \;-\; n\,\mu\mu^\top,
avoiding an explicit centred copy. Division by n-1 yields the sample
covariance of ranks; standardising by D^{-1/2} gives \widehat{\rho}_S.
Columns with zero rank variance (all values equal) are returned as NA
along their row/column; the corresponding diagonal entry is also NA.
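The computation described above can be sketched in plain R. This is an illustration of the centring identity only, not the package's 'C++' code: crossprod() stands in for the 'SYRK' call, the function name spearman_via_crossprod is hypothetical, and the zero-variance tolerance is an assumption of this sketch.

```r
spearman_via_crossprod <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)

  # Rank each column with mid-ranks
  R <- apply(X, 2, rank, ties.method = "average")

  # Every column has mean rank (n + 1) / 2
  mu <- rep((n + 1) / 2, ncol(R))

  # R'R - n * mu mu' equals the centred cross-product,
  # so no centred copy of R is needed
  S <- (crossprod(R) - n * tcrossprod(mu)) / (n - 1)

  # Flag zero rank variance (constant columns) as NA
  d <- diag(S)
  d[d < sqrt(.Machine$double.eps)] <- NA_real_

  # Standardise: rho_ij = S_ij / sqrt(d_i * d_j)
  rho <- S / sqrt(tcrossprod(d))

  # Remove round-off asymmetry
  (rho + t(rho)) / 2
}

set.seed(1)
X <- matrix(rnorm(200), nrow = 50, ncol = 4)
all.equal(spearman_via_crossprod(X), cor(X, method = "spearman"),
          check.attributes = FALSE)

# A constant column yields NA along its row and column, diagonal included
X_const <- cbind(X, 7)
spearman_via_crossprod(X_const)[5, ]
```

Centring through the identity saves an n-by-p temporary, which matters mostly when n is large relative to available memory.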
When na_method = "pairwise", each (i,j) estimate is recomputed
on the pairwise complete-case overlap of columns i and j. When
ci = TRUE, confidence intervals are computed in 'C++' using the
jackknife Euclidean-likelihood method of de Carvalho and Marques (2012).
For a pairwise estimate U = \hat\rho_S, delete-one jackknife
pseudo-values are formed as
Z_i = nU - (n-1)U_{(-i)}, \qquad i = 1,\ldots,n,
where U_{(-i)} is the Spearman correlation after removing observation
i. The confidence limits solve
\frac{n(U-\theta)^2}{n^{-1}\sum_{i=1}^n (Z_i - \theta)^2}
= \chi^2_{1,\;\texttt{conf\_level}}.
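For a single pair of variables, the equation above is a quadratic in \theta and can be solved in closed form. The following base-R sketch follows the formulas exactly as written here; it is not the package's 'C++' implementation, the function name jel_spearman_ci is hypothetical, and it applies no clipping of the limits to [-1, 1] (whether the package does so is not stated above).

```r
jel_spearman_ci <- function(x, y, conf_level = 0.95) {
  n <- length(x)
  crit <- qchisq(conf_level, df = 1)
  stopifnot(length(y) == n, n > crit)

  # Full-sample estimate U and delete-one estimates U_(-i)
  U <- cor(x, y, method = "spearman")
  U_del <- vapply(seq_len(n), function(i) {
    cor(x[-i], y[-i], method = "spearman")
  }, numeric(1))

  # Jackknife pseudo-values Z_i = n U - (n - 1) U_(-i)
  Z <- n * U - (n - 1) * U_del

  # n (U - theta)^2 / mean((Z - theta)^2) = crit rearranges to
  # a * theta^2 + b * theta + cc = 0 with the coefficients below
  a  <- n^2 - crit * n
  b  <- -2 * (n^2 * U - crit * sum(Z))
  cc <- n^2 * U^2 - crit * sum(Z^2)

  roots <- (-b + c(-1, 1) * sqrt(b^2 - 4 * a * cc)) / (2 * a)
  c(est = U, lwr.ci = roots[1], upr.ci = roots[2])
}

set.seed(42)
x <- rnorm(40)
y <- 0.5 * x + rnorm(40)
jel_spearman_ci(x, y)
```

Because a > 0 whenever n exceeds the chi-square quantile and the left-hand side is zero at \theta = U, the two roots always bracket the point estimate.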
Ranking costs
O\!\bigl(p\,n\log n\bigr); forming and normalising
R^\top R costs O\!\bigl(n p^2\bigr) with O(p^2) additional
memory. The optional jackknife Euclidean-likelihood confidence intervals add
per-pair delete-one recomputation work and are intended for inference rather
than raw-matrix throughput.
A symmetric numeric matrix where the (i, j)-th element is
the Spearman correlation between the i-th and j-th
numeric columns of the input. When ci = TRUE, the object also
carries a ci attribute with elements est, lwr.ci,
upr.ci, and conf.level. When pairwise-complete evaluation is
used, pairwise sample sizes are stored in attr(x, "diagnostics")$n_complete.
Invisibly returns the spearman_rho object.
A ggplot object representing the heatmap.
Missing values are rejected when na_method = "error". Columns
with fewer than two usable observations are excluded.
Thiago de Paula Oliveira
Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15(1), 72-101. Reprinted (2010) in International Journal of Epidemiology, 39(5), 1137-1150.
de Carvalho, M., & Marques, F. (2012). Jackknife Euclidean likelihood-based inference for Spearman's rho. North American Actuarial Journal, 16(4), 487-492.
print.spearman_rho, plot.spearman_rho
## Monotone transformation invariance (Spearman is rank-based)
set.seed(123)
n <- 400; p <- 6; rho <- 0.6
Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
L <- chol(Sigma)
X <- matrix(rnorm(n * p), n, p) %*% L
colnames(X) <- paste0("V", seq_len(p))
X_mono <- X
X_mono[, 1] <- exp(X_mono[, 1])
X_mono[, 2] <- log1p(exp(X_mono[, 2]))
X_mono[, 3] <- X_mono[, 3]^3
sp_X <- spearman_rho(X)
sp_m <- spearman_rho(X_mono)
summary(sp_X)
# Strictly increasing maps leave the ranks unchanged, so this should be 0
round(max(abs(sp_X - sp_m)), 3)
plot(sp_X)
## Confidence intervals
sp_ci <- spearman_rho(X[, 1:3], ci = TRUE)
print(sp_ci, show_ci = "yes")
summary(sp_ci)
## Ties handled via mid-ranks
tied <- cbind(
a = rep(1:5, each = 20),
b = rep(5:1, each = 20) + rnorm(100, sd = 0.1),
c = as.numeric(gl(10, 10))
)
sp_tied <- spearman_rho(tied, ci = TRUE)
print(sp_tied, digits = 2, show_ci = "yes")
# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
view_corr_shiny(sp_X)
}