| pearson_corr | R Documentation |
Computes pairwise Pearson correlations for the numeric columns of a matrix or data frame using a high-performance 'C++' backend. Optional Fisher-z confidence intervals are available.
pearson_corr(
data,
na_method = c("error", "pairwise"),
ci = FALSE,
conf_level = 0.95,
n_threads = getOption("matrixCorr.threads", 1L),
output = c("matrix", "sparse", "edge_list"),
threshold = 0,
diag = TRUE,
...
)
## S3 method for class 'pearson_corr'
print(
x,
digits = 4,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
ci_digits = 3,
show_ci = NULL,
...
)
## S3 method for class 'pearson_corr'
plot(
x,
title = "Pearson correlation heatmap",
low_color = "indianred1",
high_color = "steelblue1",
mid_color = "white",
value_text_size = 4,
ci_text_size = 3,
show_value = TRUE,
...
)
## S3 method for class 'pearson_corr'
summary(
object,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
ci_digits = 3,
show_ci = NULL,
...
)
## S3 method for class 'summary.pearson_corr'
print(
x,
digits = NULL,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
data |
A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns will be excluded. Each column must have at least two non-missing values. |
na_method |
Character scalar controlling missing-data handling.
|
ci |
Logical (default |
conf_level |
Confidence level used when |
n_threads |
Integer |
output |
Output representation for the computed estimates.
|
threshold |
Non-negative absolute-value filter for non-matrix outputs:
keep entries with |
diag |
Logical; whether to include diagonal entries in
|
... |
Additional arguments passed to |
x |
An object of class |
digits |
Integer; number of decimal places to print for the correlation estimates. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
ci_digits |
Integer; digits for Pearson confidence limits in the pairwise summary. |
show_ci |
One of |
title |
Plot title. Default is |
low_color |
Color for the minimum correlation. Default is
|
high_color |
Color for the maximum correlation. Default is
|
mid_color |
Color for zero correlation. Default is |
value_text_size |
Font size for displaying correlation values. Default
is |
ci_text_size |
Text size for confidence intervals in the heatmap. |
show_value |
Logical; if |
object |
An object of class |
Let X \in \mathbb{R}^{n \times p} be a
numeric matrix with rows as observations and columns as variables, and let
\mathbf{1} \in \mathbb{R}^n denote the all-ones vector. Define the column
means \mu = (1/n)\,\mathbf{1}^\top X and the centred cross-product
matrix
S \;=\; (X - \mathbf{1}\mu)^\top (X - \mathbf{1}\mu)
\;=\; X^\top \!\Big(I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top\Big) X
\;=\; X^\top X \;-\; n\,\mu^\top \mu.
The (unbiased) sample covariance is
\widehat{\Sigma} \;=\; \tfrac{1}{n-1}\,S,
and the sample standard deviations are s_i = \sqrt{\widehat{\Sigma}_{ii}}.
The Pearson correlation matrix is obtained by standardising \widehat{\Sigma}:
R \;=\; D^{-1/2}\,\widehat{\Sigma}\,D^{-1/2}, \qquad
D \;=\; \mathrm{diag}(\widehat{\Sigma}_{11},\ldots,\widehat{\Sigma}_{pp}),
equivalently, entrywise R_{ij} = \widehat{\Sigma}_{ij}/(s_i s_j) for
i \neq j and R_{ii} = 1. With 1/(n-1) scaling,
\widehat{\Sigma} is unbiased for the covariance; the induced
correlations are biased in finite samples.
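As a minimal sketch of the definitions above, the following base R code builds S, \widehat{\Sigma} and R by hand on simulated data and compares the result with stats::cor(). It is illustrative only and does not use the package's C++ backend; the data and variable names are assumptions made for the example.

```r
set.seed(1)
n <- 50; p <- 4
X <- matrix(rnorm(n * p), n, p)

mu <- colMeans(X)               # column means (length p)
Xc <- sweep(X, 2, mu)           # subtract each column's mean
S  <- crossprod(Xc)             # centred cross-product matrix
Sigma_hat <- S / (n - 1)        # unbiased sample covariance
s  <- sqrt(diag(Sigma_hat))     # sample standard deviations
R  <- Sigma_hat / outer(s, s)   # R_ij = Sigma_ij / (s_i * s_j)

all.equal(R, cor(X))            # compare with base R
```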
The implementation forms X^\top X via a BLAS
symmetric rank-k update (SYRK) on the upper triangle, then applies the
rank-1 correction -\,n\,\mu^\top\mu to obtain S without
explicitly materialising X - \mathbf{1}\mu. After scaling by
1/(n-1), triangular normalisation by D^{-1/2} yields R,
which is then symmetrised to remove round-off asymmetry. Tiny negative values
on the covariance diagonal due to floating-point rounding are truncated to
zero before taking square roots.
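The steps in this paragraph can be mimicked in base R. The sketch below is an assumption-laden analogue, not the package's C++ code: crossprod() stands in for the SYRK call, tcrossprod(mu) forms the p x p rank-1 term, and pmax() plays the role of the truncation of negative variances.

```r
set.seed(2)
n <- 200; p <- 5
X <- matrix(rnorm(n * p, mean = 3), n, p)

mu  <- colMeans(X)
XtX <- crossprod(X)                   # uncentred cross-product X'X
S   <- XtX - n * tcrossprod(mu)       # rank-1 correction, no centred copy of X
Sigma_hat <- S / (n - 1)

d <- pmax(diag(Sigma_hat), 0)         # truncate tiny negative variances to zero
inv_sd <- 1 / sqrt(d)
R <- Sigma_hat * tcrossprod(inv_sd)   # scale rows and columns by 1 / s_i
R <- (R + t(R)) / 2                   # symmetrise away round-off asymmetry
diag(R) <- 1                          # all variances are positive in this example

all.equal(R, cor(X))                  # compare with base R
```

Avoiding the centred copy of X saves an n x p allocation, at the price of some cancellation error when the column means are large relative to the standard deviations.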
If a variable has zero variance (s_i = 0), the corresponding row and
column of R are set to NA. When
na_method = "pairwise", each (i,j) correlation is recomputed on
the pairwise complete-case overlap of columns i and j.
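To illustrate both situations without relying on the package, the sketch below uses base R's cor(..., use = "pairwise.complete.obs") as a stand-in and counts the pairwise overlaps n_{ij} directly. Whether base R agrees with pearson_corr() entry for entry (for instance on the diagonal of a constant column) is not confirmed here; the column names are illustrative.

```r
set.seed(3)
X <- cbind(a = rnorm(30), b = rnorm(30), const = 5)
X[c(2, 7), "a"] <- NA                 # two missing values in column a

# Base R analogue of pairwise-complete evaluation; the constant column
# triggers a zero-standard-deviation warning, suppressed here
R_pw <- suppressWarnings(cor(X, use = "pairwise.complete.obs"))

# Pairwise complete-case sample sizes n_ij
ok   <- !is.na(X)
n_ij <- crossprod(ok + 0)             # rows observed in both columns i and j

R_pw
n_ij
```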
When ci = TRUE, Fisher-z confidence intervals are computed from
the observed pairwise Pearson correlation r_{ij} and the pairwise
complete-case sample size n_{ij}:
z_{ij} = \operatorname{atanh}(r_{ij}), \qquad
\operatorname{SE}(z_{ij}) = \frac{1}{\sqrt{n_{ij} - 3}}.
With z_{1-\alpha/2} = \Phi^{-1}(1 - \alpha/2), the confidence limits are
\tanh\!\bigl(z_{ij} - z_{1-\alpha/2}\operatorname{SE}(z_{ij})\bigr)
\;\;\text{and}\;\;
\tanh\!\bigl(z_{ij} + z_{1-\alpha/2}\operatorname{SE}(z_{ij})\bigr).
Confidence intervals are reported only when n_{ij} > 3.
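The interval formulas above translate directly into a few lines of R. The helper name fisher_ci() is hypothetical and is not part of the package; it is a sketch of the computation for a single pair.

```r
# Hypothetical helper: Fisher-z confidence interval for one correlation
fisher_ci <- function(r, n, conf_level = 0.95) {
  stopifnot(n > 3)                          # the interval is undefined otherwise
  z  <- atanh(r)                            # Fisher z-transform
  se <- 1 / sqrt(n - 3)                     # standard error on the z scale
  q  <- qnorm(1 - (1 - conf_level) / 2)     # normal quantile z_{1 - alpha/2}
  c(lwr = tanh(z - q * se), upr = tanh(z + q * se))
}

fisher_ci(r = 0.5, n = 50)
```

For complete data this is the same construction that stats::cor.test() uses for its Pearson confidence interval, so the two can be compared as a sanity check.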
Computational complexity. The dominant cost is O(n p^2) flops
with O(p^2) memory.
A symmetric numeric matrix where the (i, j)-th element is
the Pearson correlation between the i-th and j-th
numeric columns of the input. When ci = TRUE, the object also
carries a ci attribute with elements est, lwr.ci,
upr.ci, and conf.level. When pairwise-complete evaluation is
used, pairwise sample sizes are stored in attr(x, "diagnostics")$n_complete.
The print method invisibly returns the pearson_corr object.
The plot method returns a ggplot object representing the heatmap.
Missing values are rejected when na_method = "error". Columns
with fewer than two usable observations are excluded.
Thiago de Paula Oliveira
Pearson, K. (1895). "Note on regression and inheritance in the case of two parents". Proceedings of the Royal Society of London, 58, 240-242.
print.pearson_corr, plot.pearson_corr
## MVN with AR(1) correlation
set.seed(123)
p <- 6; n <- 300; rho <- 0.5
# true correlation
Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
L <- chol(Sigma)
# MVN(n, 0, Sigma)
X <- matrix(rnorm(n * p), n, p) %*% L
colnames(X) <- paste0("V", seq_len(p))
pr <- pearson_corr(X)
print(pr, digits = 2)
summary(pr)
plot(pr)
## Compare the sample estimate to the truth
Rhat <- cor(X)
# estimated
round(Rhat[1:4, 1:4], 2)
# true
round(Sigma[1:4, 1:4], 2)
off <- upper.tri(Sigma, diag = FALSE)
# MAE on off-diagonals
mean(abs(Rhat[off] - Sigma[off]))
## Larger n reduces sampling error
n2 <- 2000
X2 <- matrix(rnorm(n2 * p), n2, p) %*% L
Rhat2 <- cor(X2)
# MAE on off-diagonals (reuses `off` from above)
mean(abs(Rhat2[off] - Sigma[off]))
# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
view_corr_shiny(pr)
}