View source: R/partial_correlation.R
partial_correlation | R Documentation |
Computes the Gaussian partial correlation matrix from a numeric data frame or matrix. The covariance matrix can be estimated using:
Unbiased sample covariance: the standard empirical covariance estimator.
Ridge-regularised covariance: adds a positive ridge penalty to improve stability when the covariance matrix is near-singular.
OAS shrinkage to a scaled identity: recommended when
p \gg n
, as it reduces estimation error by shrinking towards a
scaled identity matrix.
The method uses a high-performance 'C++' backend.
Prints only the partial correlation matrix (no attribute spam), with an optional one-line header stating the estimator used.
Produces a ggplot2-based heatmap of the partial correlation matrix
stored in x$pcor
. Optionally masks the diagonal and/or reorders
variables via hierarchical clustering of 1 - |pcor|
.
partial_correlation(
data,
method = c("oas", "ridge", "sample"),
lambda = 0.001,
return_cov_precision = FALSE
)
## S3 method for class 'partial_correlation'
print(x, digits = 3, show_method = TRUE, max_rows = NULL, max_cols = NULL, ...)
## S3 method for class 'partial_corr'
plot(
x,
title = NULL,
low_color = "indianred1",
high_color = "steelblue1",
mid_color = "white",
value_text_size = 4,
mask_diag = TRUE,
reorder = FALSE,
...
)
data |
A numeric matrix or data frame with at least two numeric columns. Non-numeric columns are ignored. |
method |
Character; one of |
lambda |
Numeric |
return_cov_precision |
Logical; if |
x |
An object of class |
digits |
Integer; number of decimal places for display (default 3). |
show_method |
Logical; print a one-line header with |
max_rows , max_cols |
Optional integer limits for display; if provided, the printed matrix is truncated with a note about omitted rows/cols. |
... |
Additional arguments passed to |
title |
Plot title. By default, constructed from the estimator in
|
low_color |
Colour for low (negative) values. Default
|
high_color |
Colour for high (positive) values. Default
|
mid_color |
Colour for zero. Default |
value_text_size |
Font size for cell labels. Default |
mask_diag |
Logical; if |
reorder |
Logical; if |
Statistical overview. Given an n \times p
data matrix X
(rows are observations, columns are variables), the routine estimates a
partial correlation matrix via the precision (inverse covariance)
matrix. Let \mu
be the vector of column means and
S = (X - \mathbf{1}\mu)^\top (X - \mathbf{1}\mu)
be the centred cross-product matrix computed without forming a centred copy
of X
. Two conventional covariance scalings are formed:
\hat\Sigma_{\mathrm{MLE}} = S/n, \qquad
\hat\Sigma_{\mathrm{unb}} = S/(n-1).
Sample: \Sigma = \hat\Sigma_{\mathrm{unb}}
.
Ridge: \Sigma = \hat\Sigma_{\mathrm{unb}} + \lambda I_p
with user-supplied \lambda \ge 0
(diagonal inflation).
OAS (Oracle Approximating Shrinkage):
shrink \hat\Sigma_{\mathrm{MLE}}
towards a scaled identity
target \mu_I I_p
, where \mu_I = \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})/p
.
The data-driven weight \rho \in [0,1]
is
\rho = \min\!\left\{1,\max\!\left(0,\;
\frac{(1-\tfrac{2}{p})\,\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}^2)
+ \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})^2}
{(n + 1 - \tfrac{2}{p})
\left[\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}^2)
- \tfrac{\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})^2}{p}\right]}
\right)\right\},
and
\Sigma = (1-\rho)\,\hat\Sigma_{\mathrm{MLE}} + \rho\,\mu_I I_p.
The method then ensures positive definiteness of \Sigma
(adding a very
small diagonal jitter only if necessary) and computes the precision
matrix \Theta = \Sigma^{-1}
. Partial correlations are obtained by
standardising the off-diagonals of \Theta
:
\mathrm{pcor}_{ij} \;=\;
-\,\frac{\theta_{ij}}{\sqrt{\theta_{ii}\,\theta_{jj}}}, \qquad
\mathrm{pcor}_{ii}=1.
Interpretation. For Gaussian data, \mathrm{pcor}_{ij}
equals
the correlation between residuals from regressing variable i
and
variable j
on all the remaining variables; equivalently, it encodes
conditional dependence in a Gaussian graphical model, where
\mathrm{pcor}_{ij}=0
if variables i
and j
are
conditionally independent given the others. Partial correlations are
invariant to separate rescalings of each
variable; in particular, multiplying \Sigma
by any positive scalar
leaves the partial correlations unchanged.
Why shrinkage/regularisation? When p \ge n
, the sample
covariance is singular and inversion is ill-posed. Ridge and OAS both yield
well-conditioned \Sigma
. Ridge adds a fixed \lambda
on the
diagonal, whereas OAS shrinks adaptively towards \mu_I I_p
with a
weight chosen to minimise (approximately) the Frobenius risk under a
Gaussian model, often improving mean–square accuracy in high dimension.
Computational notes. The implementation forms S
using 'BLAS'
syrk
when available and constructs partial correlations by traversing
only the upper triangle with 'OpenMP' parallelism. Positive definiteness is
verified via a Cholesky factorisation; if it fails, a tiny diagonal jitter is
increased geometrically up to a small cap, at which point the routine
signals an error.
An object of class "partial_corr"
(a list) with elements:
pcor
: p \times p
partial correlation matrix.
cov
(if requested): covariance matrix used.
precision
(if requested): precision matrix \Theta
.
method
: the estimator used ("oas"
, "ridge"
,
or "sample"
).
lambda
: ridge penalty (or NA_real_
).
rho
: OAS shrinkage weight in [0,1]
(or NA_real_
).
jitter
: diagonal jitter added (if any) to ensure positive
definiteness.
Invisibly returns x
.
A ggplot
object.
Chen, Y., Wiesel, A., & Hero, A. O. III (2011). Robust Shrinkage Estimation of High-dimensional Covariance Matrices. IEEE Transactions on Signal Processing.
Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365–411.
Schafer, J., & Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4(1), Article 32.
## Structured MVN with known partial correlations
set.seed(42)
p <- 12; n <- 1000
## Build a tri-diagonal precision (Omega) so the true partial correlations
## are sparse
phi <- 0.35
Omega <- diag(p)
for (j in 1:(p - 1)) {
Omega[j, j + 1] <- Omega[j + 1, j] <- -phi
}
## Strict diagonal dominance
diag(Omega) <- 1 + 2 * abs(phi) + 0.05
Sigma <- solve(Omega)
## Upper Cholesky
L <- chol(Sigma)
Z <- matrix(rnorm(n * p), n, p)
X <- Z %*% L
colnames(X) <- sprintf("V%02d", seq_len(p))
pc <- partial_correlation(X, method = "oas")
## True partial correlation from Omega
pcor_true <- -Omega / sqrt(diag(Omega) %o% diag(Omega))
diag(pcor_true) <- 1
## Quick visual check (first 5x5 block)
round(pc$pcor[1:5, 1:5], 2)
round(pcor_true[1:5, 1:5], 2)
## Plot method
plot(pc)
## High-dimensional case p >> n
set.seed(7)
n <- 60; p <- 120
ar_block <- function(m, rho = 0.6) rho^abs(outer(seq_len(m), seq_len(m), "-"))
## Two AR(1) blocks on the diagonal
if (requireNamespace("Matrix", quietly = TRUE)) {
Sigma_hd <- as.matrix(Matrix::bdiag(ar_block(60, 0.6), ar_block(60, 0.6)))
} else {
Sigma_hd <- rbind(
cbind(ar_block(60, 0.6), matrix(0, 60, 60)),
cbind(matrix(0, 60, 60), ar_block(60, 0.6))
)
}
L <- chol(Sigma_hd)
X_hd <- matrix(rnorm(n * p), n, p) %*% L
colnames(X_hd) <- paste0("G", seq_len(p))
pc_oas <-
partial_correlation(X_hd, method = "oas", return_cov_precision = TRUE)
pc_ridge <-
partial_correlation(X_hd, method = "ridge", lambda = 1e-2,
return_cov_precision = TRUE)
pc_samp <-
partial_correlation(X_hd, method = "sample", return_cov_precision = TRUE)
## Show how much diagonal regularisation was used
c(oas_jitter = pc_oas$jitter,
ridge_lambda = pc_ridge$lambda,
sample_jitter = pc_samp$jitter)
## Compare conditioning of the estimated covariance matrices
c(kappa_oas = kappa(pc_oas$cov),
kappa_ridge = kappa(pc_ridge$cov),
kappa_sample = kappa(pc_samp$cov))
## Simple conditional-dependence graph from partial correlations
pcor <- pc_oas$pcor
vals <- abs(pcor[upper.tri(pcor, diag = FALSE)])
thresh <- quantile(vals, 0.98) # top 2%
edges <- which(abs(pcor) > thresh & upper.tri(pcor), arr.ind = TRUE)
head(data.frame(i = colnames(pcor)[edges[,1]],
j = colnames(pcor)[edges[,2]],
pcor = round(pcor[edges], 2)))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.