partial_correlation: Partial correlation matrix (sample / ridge / OAS)

View source: R/partial_correlation.R


Partial correlation matrix (sample / ridge / OAS)

Description

Computes the Gaussian partial correlation matrix from a numeric data frame or matrix. The covariance matrix can be estimated using:

  • Unbiased sample covariance: the standard empirical covariance estimator.

  • Ridge-regularised covariance: adds a positive ridge penalty to improve stability when the covariance matrix is near-singular.

  • OAS shrinkage to a scaled identity: recommended when p \gg n, as it reduces estimation error by shrinking towards a scaled identity matrix.

The method uses a high-performance 'C++' backend.

The print method displays only the partial correlation matrix (suppressing attribute clutter), with an optional one-line header stating the estimator used.

The plot method produces a ggplot2-based heatmap of the partial correlation matrix stored in x$pcor, optionally masking the diagonal and/or reordering variables via hierarchical clustering of 1 - |pcor|.

Usage

partial_correlation(
  data,
  method = c("oas", "ridge", "sample"),
  lambda = 0.001,
  return_cov_precision = FALSE
)

## S3 method for class 'partial_corr'
print(x, digits = 3, show_method = TRUE, max_rows = NULL, max_cols = NULL, ...)

## S3 method for class 'partial_corr'
plot(
  x,
  title = NULL,
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  mask_diag = TRUE,
  reorder = FALSE,
  ...
)

Arguments

data

A numeric matrix or data frame with at least two numeric columns. Non-numeric columns are ignored.

method

Character; one of "oas", "ridge", "sample". Default "oas".

lambda

Numeric \ge 0; ridge penalty added to the covariance diagonal when method = "ridge". Ignored otherwise. Default 1e-3.

return_cov_precision

Logical; if TRUE, also return the covariance (cov) and precision (precision) matrices used to form the partial correlations. Defaults to FALSE.

x

An object of class partial_corr.

digits

Integer; number of decimal places for display (default 3).

show_method

Logical; print a one-line header with method (and lambda/rho if available). Default TRUE.

max_rows, max_cols

Optional integer limits for display; if provided, the printed matrix is truncated with a note about omitted rows/cols.

...

Additional arguments passed to ggplot2::theme() or other ggplot2 layers.

title

Plot title. By default, constructed from the estimator in x$method.

low_color

Colour for low (negative) values. Default "indianred1".

high_color

Colour for high (positive) values. Default "steelblue1".

mid_color

Colour for zero. Default "white".

value_text_size

Font size for cell labels. Default 4.

mask_diag

Logical; if TRUE, the diagonal is masked (set to NA) and not labelled. Default TRUE.

reorder

Logical; if TRUE, variables are reordered by hierarchical clustering of 1 - |pcor|. Default FALSE.

Details

Statistical overview. Given an n \times p data matrix X (rows are observations, columns are variables), the routine estimates a partial correlation matrix via the precision (inverse covariance) matrix. Let \mu be the vector of column means and

S = (X - \mathbf{1}\mu)^\top (X - \mathbf{1}\mu)

be the centred cross-product matrix computed without forming a centred copy of X. Two conventional covariance scalings are formed:

\hat\Sigma_{\mathrm{MLE}} = S/n, \qquad \hat\Sigma_{\mathrm{unb}} = S/(n-1).
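In plain R, the centred cross-product can be illustrated without materialising X - \mathbf{1}\mu, using the identity S = X^\top X - n\,\mu\mu^\top (a sketch, not the package's C++ code):

    mu <- colMeans(X)
    S  <- crossprod(X) - nrow(X) * tcrossprod(mu)  ## equals t(X - 1 mu) %*% (X - 1 mu)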

  • Sample: \Sigma = \hat\Sigma_{\mathrm{unb}}.

  • Ridge: \Sigma = \hat\Sigma_{\mathrm{unb}} + \lambda I_p with user-supplied \lambda \ge 0 (diagonal inflation).

  • OAS (Oracle Approximating Shrinkage): shrink \hat\Sigma_{\mathrm{MLE}} towards a scaled identity target \mu_I I_p, where \mu_I = \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})/p. The data-driven weight \rho \in [0,1] is

    \rho = \min\!\left\{1,\max\!\left(0,\; \frac{(1-\tfrac{2}{p})\,\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}^2) + \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})^2} {(n + 1 - \tfrac{2}{p}) \left[\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}^2) - \tfrac{\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})^2}{p}\right]} \right)\right\},

    and

    \Sigma = (1-\rho)\,\hat\Sigma_{\mathrm{MLE}} + \rho\,\mu_I I_p.
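For concreteness, the OAS weight and shrunken covariance can be sketched in plain R directly from the formulas above (illustrative only; the package computes this in its C++ backend):

    oas_cov <- function(X) {
      n <- nrow(X); p <- ncol(X)
      mu    <- colMeans(X)
      S_mle <- (crossprod(X) - n * tcrossprod(mu)) / n  ## MLE covariance S/n
      tr_S  <- sum(diag(S_mle))
      tr_S2 <- sum(S_mle^2)                             ## tr(S_mle^2); S_mle is symmetric
      num   <- (1 - 2 / p) * tr_S2 + tr_S^2
      den   <- (n + 1 - 2 / p) * (tr_S2 - tr_S^2 / p)
      rho   <- min(1, max(0, num / den))
      (1 - rho) * S_mle + rho * (tr_S / p) * diag(p)    ## shrink towards mu_I * I_p
    }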

The method then ensures positive definiteness of \Sigma (adding a very small diagonal jitter only if necessary) and computes the precision matrix \Theta = \Sigma^{-1}. Partial correlations are obtained by standardising the off-diagonals of \Theta:

\mathrm{pcor}_{ij} \;=\; -\,\frac{\theta_{ij}}{\sqrt{\theta_{ii}\,\theta_{jj}}}, \qquad \mathrm{pcor}_{ii}=1.
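In plain R this standardisation is a one-liner via cov2cor() (a sketch, equivalent to the formula above):

    prec_to_pcor <- function(Theta) {
      pcor <- -cov2cor(Theta)  ## -theta_ij / sqrt(theta_ii * theta_jj)
      diag(pcor) <- 1
      pcor
    }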

Interpretation. For Gaussian data, \mathrm{pcor}_{ij} equals the correlation between residuals from regressing variable i and variable j on all the remaining variables; equivalently, it encodes conditional dependence in a Gaussian graphical model, where \mathrm{pcor}_{ij}=0 if variables i and j are conditionally independent given the others. Partial correlations are invariant to separate rescalings of each variable; in particular, multiplying \Sigma by any positive scalar leaves the partial correlations unchanged.
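The residual characterisation is easy to verify in plain R (a sketch on simulated data; the two values agree up to floating point):

    set.seed(1)
    X  <- matrix(rnorm(500 * 4), 500, 4)
    r1 <- resid(lm(X[, 1] ~ X[, -(1:2)]))  ## variable 1 on the remaining variables
    r2 <- resid(lm(X[, 2] ~ X[, -(1:2)]))  ## variable 2 on the remaining variables
    Theta <- solve(cov(X))                 ## sample precision matrix
    c(residual_cor = cor(r1, r2),
      pcor_12      = -Theta[1, 2] / sqrt(Theta[1, 1] * Theta[2, 2]))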

Why shrinkage/regularisation? When p \ge n, the sample covariance is singular and inversion is ill-posed. Ridge and OAS both yield a well-conditioned \Sigma: ridge adds a fixed \lambda to the diagonal, whereas OAS shrinks adaptively towards \mu_I I_p with a weight chosen to (approximately) minimise the Frobenius risk under a Gaussian model, often improving mean-squared accuracy in high dimensions.

Computational notes. The implementation forms S using 'BLAS' syrk when available and constructs partial correlations by traversing only the upper triangle with 'OpenMP' parallelism. Positive definiteness is verified via a Cholesky factorisation; if it fails, a tiny diagonal jitter is increased geometrically up to a small cap, at which point the routine signals an error.
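The jitter escalation can be sketched as follows (the starting value, growth factor, and cap below are illustrative assumptions, not the package's exact constants):

    ensure_pd <- function(Sigma, jitter0 = 1e-10, grow = 10, cap = 1e-4) {
      jitter <- 0
      repeat {
        ok <- tryCatch({ chol(Sigma + jitter * diag(nrow(Sigma))); TRUE },
                       error = function(e) FALSE)
        if (ok) return(list(Sigma = Sigma + jitter * diag(nrow(Sigma)),
                            jitter = jitter))
        jitter <- if (jitter == 0) jitter0 else jitter * grow
        if (jitter > cap) stop("Sigma could not be made positive definite")
      }
    }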

Value

An object of class "partial_corr" (a list) with elements:

  • pcor: p \times p partial correlation matrix.

  • cov (if requested): covariance matrix used.

  • precision (if requested): precision matrix \Theta.

  • method: the estimator used ("oas", "ridge", or "sample").

  • lambda: ridge penalty (or NA_real_).

  • rho: OAS shrinkage weight in [0,1] (or NA_real_).

  • jitter: diagonal jitter added (if any) to ensure positive definiteness.

The print method returns x invisibly. The plot method returns a ggplot object.

References

Chen, Y., Wiesel, A., & Hero, A. O. III (2011). Robust Shrinkage Estimation of High-dimensional Covariance Matrices. IEEE Transactions on Signal Processing.

Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365–411.

Schäfer, J., & Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4(1), Article 32.

Examples

## Structured MVN with known partial correlations
set.seed(42)
p <- 12; n <- 1000

## Build a tri-diagonal precision (Omega) so the true partial correlations
## are sparse
phi <- 0.35
Omega <- diag(p)
for (j in 1:(p - 1)) {
  Omega[j, j + 1] <- Omega[j + 1, j] <- -phi
}
## Strict diagonal dominance
diag(Omega) <- 1 + 2 * abs(phi) + 0.05
Sigma <- solve(Omega)

## chol() returns the upper-triangular factor R with R'R = Sigma,
## so Z %*% R has covariance Sigma
R <- chol(Sigma)
Z <- matrix(rnorm(n * p), n, p)
X <- Z %*% R
colnames(X) <- sprintf("V%02d", seq_len(p))

pc <- partial_correlation(X, method = "oas")

## True partial correlation from Omega
pcor_true <- -Omega / sqrt(diag(Omega) %o% diag(Omega))
diag(pcor_true) <- 1

## Quick visual check (first 5x5 block)
round(pc$pcor[1:5, 1:5], 2)
round(pcor_true[1:5, 1:5], 2)

## Plot method
plot(pc)
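
## Print options: truncated display with a one-line method header
## (arguments as documented above)
print(pc, digits = 2, max_rows = 6, max_cols = 6)

## Plot options: reorder variables by hierarchical clustering of 1 - |pcor|
plot(pc, reorder = TRUE)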

## High-dimensional case p >> n
set.seed(7)
n <- 60; p <- 120

ar_block <- function(m, rho = 0.6) rho^abs(outer(seq_len(m), seq_len(m), "-"))

## Two AR(1) blocks on the diagonal
if (requireNamespace("Matrix", quietly = TRUE)) {
  Sigma_hd <- as.matrix(Matrix::bdiag(ar_block(60, 0.6), ar_block(60, 0.6)))
} else {
  Sigma_hd <- rbind(
    cbind(ar_block(60, 0.6), matrix(0, 60, 60)),
    cbind(matrix(0, 60, 60), ar_block(60, 0.6))
  )
}

R_hd <- chol(Sigma_hd)  ## upper Cholesky factor, as above
X_hd <- matrix(rnorm(n * p), n, p) %*% R_hd
colnames(X_hd) <- paste0("G", seq_len(p))

pc_oas   <- partial_correlation(X_hd, method = "oas",
                                return_cov_precision = TRUE)
pc_ridge <- partial_correlation(X_hd, method = "ridge", lambda = 1e-2,
                                return_cov_precision = TRUE)
pc_samp  <- partial_correlation(X_hd, method = "sample",
                                return_cov_precision = TRUE)

## Show how much diagonal regularisation was used
c(oas_jitter = pc_oas$jitter,
  ridge_lambda = pc_ridge$lambda,
  sample_jitter = pc_samp$jitter)

## Compare conditioning of the estimated covariance matrices
c(kappa_oas = kappa(pc_oas$cov),
  kappa_ridge = kappa(pc_ridge$cov),
  kappa_sample = kappa(pc_samp$cov))

## Simple conditional-dependence graph from partial correlations
pcor <- pc_oas$pcor
vals <- abs(pcor[upper.tri(pcor, diag = FALSE)])
thresh <- quantile(vals, 0.98)  # top 2%
edges  <- which(abs(pcor) > thresh & upper.tri(pcor), arr.ind = TRUE)
head(data.frame(i = colnames(pcor)[edges[,1]],
                j = colnames(pcor)[edges[,2]],
                pcor = round(pcor[edges], 2)))

