pcorr: Partial correlation matrix (sample / ridge / OAS / graphical lasso)

View source: R/pcorr.R

pcorr {matrixCorr}    R Documentation

Partial correlation matrix (sample / ridge / OAS / graphical lasso)

Description

Computes Gaussian partial correlations for the numeric columns of a matrix or data frame using a high-performance 'C++' backend. Covariance estimation is available via the classical sample estimator, ridge regularisation, OAS shrinkage, or graphical lasso. Optional p-values and Fisher-z confidence intervals are available for the classical sample estimator in the ordinary low-dimensional setting.

Usage

pcorr(
  data,
  method = c("sample", "oas", "ridge", "glasso"),
  ci = FALSE,
  conf_level = 0.95,
  return_cov_precision = FALSE,
  return_p_value = FALSE,
  lambda = 0.001,
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0,
  diag = TRUE
)

## S3 method for class 'partial_corr'
print(
  x,
  digits = 3,
  show_method = TRUE,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'partial_corr'
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.partial_corr'
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'partial_corr'
plot(
  x,
  title = NULL,
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  show_value = TRUE,
  mask_diag = TRUE,
  reorder = FALSE,
  ...
)

Arguments

data

A numeric matrix or data frame with at least two numeric columns. Non-numeric columns are ignored.

method

Character; one of "sample", "oas", "ridge", or "glasso". Default "sample".

ci

Logical (default FALSE). If TRUE, attach Fisher-z confidence intervals for the off-diagonal partial correlations. This option is available only for the classical method = "sample" estimator in the ordinary low-dimensional setting.

conf_level

Confidence level used when ci = TRUE. Default is 0.95.

return_cov_precision

Logical; if TRUE, also return the covariance (cov) and precision (precision) matrices used to form the partial correlations. Defaults to FALSE.

return_p_value

Logical; if TRUE, also return the matrix of two-sided p-values for testing whether each sample partial correlation is zero. This option is available only for method = "sample" and requires n > p. Defaults to FALSE.

lambda

Numeric \ge 0; regularisation strength. For method = "ridge", this is the penalty added to the covariance diagonal. For method = "glasso", this is the off-diagonal precision-matrix \ell_1 penalty. Ignored otherwise. Default 1e-3.

output

Output representation for the computed estimates.

  • "matrix" (default): full dense matrix; best when you need matrix algebra, dense heatmaps, or full compatibility with existing code.

  • "sparse": sparse matrix from Matrix containing only retained entries; best when many values are dropped by thresholding.

  • "edge_list": long-form data frame with columns row, col, value; convenient for filtering, joins, and network-style workflows.

threshold

Non-negative absolute-value filter for non-matrix outputs: keep entries with abs(value) >= threshold. Use threshold > 0 when you want only stronger associations (typically with output = "sparse" or "edge_list"). Keep threshold = 0 to retain all values. Must be 0 when output = "matrix".

diag

Logical; whether to include diagonal entries in "sparse" and "edge_list" outputs.

x

An object of class partial_corr.

digits

Integer; number of decimal places for display (default 3).

show_method

Logical; print a one-line header with method (and lambda/rho if available). Default TRUE.

n

Optional row threshold for compact preview output.

topn

Optional number of leading/trailing rows to show when truncated.

max_vars

Optional maximum number of visible columns; NULL derives this from console width.

width

Optional display width; defaults to getOption("width").

show_ci

One of "yes" or "no".

...

Additional arguments passed to ggplot2::theme() or other ggplot2 layers.

object

An object of class partial_corr.

title

Plot title. By default, constructed from the estimator in x$method.

low_color

Colour for low (negative) values. Default "indianred1".

high_color

Colour for high (positive) values. Default "steelblue1".

mid_color

Colour for zero. Default "white".

value_text_size

Font size for cell labels. Default 4.

show_value

Logical; if TRUE (default), overlay numeric values on the heatmap tiles.

mask_diag

Logical; if TRUE, the diagonal is masked (set to NA) and not labelled. Default TRUE.

reorder

Logical; if TRUE, variables are reordered by hierarchical clustering of 1 - |pcor|. Default FALSE.

Details

Statistical overview. Given an n \times p data matrix X (rows are observations, columns are variables), the routine estimates a partial correlation matrix via the precision (inverse covariance) matrix. Let \mu be the vector of column means and

S = (X - \mathbf{1}\mu)^\top (X - \mathbf{1}\mu)

be the centred cross-product matrix computed without forming a centred copy of X. Two conventional covariance scalings are formed:

\hat\Sigma_{\mathrm{MLE}} = S/n, \qquad \hat\Sigma_{\mathrm{unb}} = S/(n-1).

  • Sample: \Sigma = \hat\Sigma_{\mathrm{unb}}.

  • Ridge: \Sigma = \hat\Sigma_{\mathrm{unb}} + \lambda I_p with user-supplied \lambda \ge 0 (diagonal inflation).

  • OAS (Oracle Approximating Shrinkage): shrink \hat\Sigma_{\mathrm{MLE}} towards a scaled identity target \mu_I I_p, where \mu_I = \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})/p. The data-driven weight \rho \in [0,1] is

    \rho = \min\!\left\{1,\max\!\left(0,\; \frac{(1-\tfrac{2}{p})\,\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}^2) + \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})^2} {(n + 1 - \tfrac{2}{p}) \left[\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}^2) - \tfrac{\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})^2}{p}\right]} \right)\right\},

    and

    \Sigma = (1-\rho)\,\hat\Sigma_{\mathrm{MLE}} + \rho\,\mu_I I_p.

  • Graphical lasso: estimate a sparse precision matrix \Theta by maximising

    \log\det(\Theta) - \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}\Theta) - \lambda\sum_{i \ne j} |\theta_{ij}|,

    with \lambda \ge 0. The returned covariance matrix is \Sigma = \Theta^{-1}.
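The ridge and OAS estimates above can be written out directly in plain R. The following is a sketch of the formulas only (not the package's C++ path); the helper name oas_ridge_sketch is illustrative:

```r
## Sketch of the ridge and OAS covariance estimates (plain R, not the C++ backend)
oas_ridge_sketch <- function(X, lambda = 1e-3) {
  n <- nrow(X); p <- ncol(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)
  S_mle <- crossprod(Xc) / n            # Sigma_hat_MLE = S / n
  S_unb <- crossprod(Xc) / (n - 1)      # Sigma_hat_unb = S / (n - 1)

  ## Ridge: diagonal inflation of the unbiased estimate
  Sigma_ridge <- S_unb + lambda * diag(p)

  ## OAS: shrink the MLE towards mu_I * I_p with data-driven weight rho
  tr_S  <- sum(diag(S_mle))
  tr_S2 <- sum(S_mle^2)                 # tr(S %*% S) for symmetric S
  mu_I  <- tr_S / p
  num <- (1 - 2 / p) * tr_S2 + tr_S^2
  den <- (n + 1 - 2 / p) * (tr_S2 - tr_S^2 / p)
  rho <- min(1, max(0, num / den))
  Sigma_oas <- (1 - rho) * S_mle + rho * mu_I * diag(p)

  list(ridge = Sigma_ridge, oas = Sigma_oas, rho = rho)
}
```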

The method then ensures positive definiteness of \Sigma (adding a very small diagonal jitter only if necessary) and computes the precision matrix \Theta = \Sigma^{-1}. Partial correlations are obtained by standardising the off-diagonals of \Theta:

\mathrm{pcor}_{ij} \;=\; -\,\frac{\theta_{ij}}{\sqrt{\theta_{ii}\,\theta_{jj}}}, \qquad \mathrm{pcor}_{ii}=1.
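This standardisation step amounts to a sign-flipped cov2cor() applied to the precision matrix; a minimal base-R sketch (the helper name pcor_from_cov is illustrative):

```r
## Partial correlations from a covariance matrix (sketch of the formula above)
pcor_from_cov <- function(Sigma) {
  Theta <- solve(Sigma)     # precision matrix
  P <- -cov2cor(Theta)      # -theta_ij / sqrt(theta_ii * theta_jj)
  diag(P) <- 1
  P
}
```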

If return_p_value = TRUE, the function also reports the classical two-sided test p-values for the sample partial correlations, using

t_{ij} = \mathrm{pcor}_{ij} \sqrt{\frac{n - p}{1 - \mathrm{pcor}_{ij}^2}}

with n - p degrees of freedom. These p-values are returned only for method = "sample", where they match the standard full-model partial correlation test.
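Under these conventions the p-value computation can be sketched directly in base R (assuming n > p; the helper name pcor_pvalues is illustrative):

```r
## Two-sided p-values for sample partial correlations (sketch, requires n > p)
pcor_pvalues <- function(P, n) {
  p <- ncol(P)
  df <- n - p                          # degrees of freedom
  tstat <- P * sqrt(df / (1 - P^2))
  pval <- 2 * pt(-abs(tstat), df = df)
  diag(pval) <- NA                     # diagonal entries are not tested
  pval
}
```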

When ci = TRUE, the function reports Fisher-z confidence intervals for the sample partial correlations. For a partial correlation r_{xy \cdot Z} conditioning on c variables, the transformed statistic is z = \operatorname{atanh}(r_{xy \cdot Z}) with standard error

\operatorname{SE}(z) = \frac{1}{\sqrt{n - 3 - c}},

where n is the effective complete-case sample size used for the estimate. The two-sided normal-theory interval is formed on the transformed scale using conf_level and then mapped back with tanh(). In the full matrix path implemented here, each off-diagonal entry conditions on all remaining variables, so c = p - 2 and the classical CI requires n > p + 1. This inference is only supported for method = "sample" without positive-definiteness repair; in unsupported or numerically singular settings, CI bounds are returned as NA with an informative cli warning or the request is rejected.
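For a single entry, the Fisher-z interval takes only a few lines. A sketch for the full-matrix path, where c = p - 2 so the standard error uses n - 3 - (p - 2) = n - p - 1 (the helper name pcor_ci is illustrative):

```r
## Fisher-z confidence interval for one partial correlation (sketch)
pcor_ci <- function(r, n, p, conf_level = 0.95) {
  c_vars <- p - 2                      # variables conditioned on
  se <- 1 / sqrt(n - 3 - c_vars)       # = 1 / sqrt(n - p - 1), needs n > p + 1
  z  <- atanh(r)
  zq <- qnorm(1 - (1 - conf_level) / 2)
  c(lwr = tanh(z - zq * se), upr = tanh(z + zq * se))
}
```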

Interpretation. For Gaussian data, \mathrm{pcor}_{ij} equals the correlation between residuals from regressing variable i and variable j on all the remaining variables; equivalently, it encodes conditional dependence in a Gaussian graphical model, where \mathrm{pcor}_{ij}=0 if variables i and j are conditionally independent given the others. Partial correlations are invariant to separate rescalings of each variable; in particular, multiplying \Sigma by any positive scalar leaves the partial correlations unchanged.
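The residual characterisation is easy to verify numerically: regress two columns on all the others and correlate the residuals. A base-R sketch using lm():

```r
## Verify: pcor_ij equals the correlation of regression residuals (sketch)
set.seed(1)
X <- matrix(rnorm(200 * 4), 200, 4)
Theta <- solve(cov(X))
pc12 <- -Theta[1, 2] / sqrt(Theta[1, 1] * Theta[2, 2])

r1 <- resid(lm(X[, 1] ~ X[, -c(1, 2)]))
r2 <- resid(lm(X[, 2] ~ X[, -c(1, 2)]))
all.equal(pc12, cor(r1, r2))  # agree up to numerical tolerance
```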

Why shrinkage/regularisation? When p \ge n, the sample covariance is singular and inversion is ill-posed. Ridge and OAS both yield well-conditioned \Sigma. Ridge adds a fixed \lambda on the diagonal, whereas OAS shrinks adaptively towards \mu_I I_p with a weight chosen to minimise (approximately) the Frobenius risk under a Gaussian model, often improving mean-square accuracy in high dimension.

Why glasso? Glasso is useful when the goal is not just to stabilise a covariance estimate, but to recover a manageable network of direct relationships rather than a dense matrix of overall associations. In Gaussian models, zeros in the precision matrix correspond to conditional independences, so glasso can suppress indirect associations that are explained by the other variables and return a smaller, more interpretable conditional-dependence graph. This is especially practical in high-dimensional settings, where the sample covariance may be unstable or singular. Glasso yields a positive-definite precision estimate and supports edge selection, graph recovery, and downstream network analysis.

Computational notes. The implementation forms S using 'BLAS' syrk when available and constructs partial correlations by traversing only the upper triangle with 'OpenMP' parallelism. Positive definiteness is verified via a Cholesky factorisation; if it fails, a tiny diagonal jitter is increased geometrically up to a small cap, at which point the routine signals an error.

Value

An object of class "partial_corr" (a list) with elements:

  • pcor: p \times p partial correlation matrix.

  • cov (if requested): covariance matrix used.

  • precision (if requested): precision matrix \Theta.

  • p_value (if requested): matrix of two-sided p-values for the sample partial correlations.

  • ci (if requested): a list with elements est, lwr.ci, upr.ci, conf.level, and ci.method.

  • diagnostics: metadata used for inference, including the effective complete-case sample size and number of conditioned variables.

  • method: the estimator used ("oas", "ridge", "sample", or "glasso").

  • lambda: ridge or graphical-lasso penalty (or NA_real_).

  • rho: OAS shrinkage weight in [0,1] (or NA_real_).

  • jitter: diagonal jitter added (if any) to ensure positive definiteness.

The print methods invisibly return x. The summary method returns a compact object of class summary.partial_corr; the plot method returns a ggplot object.

References

Chen, Y., Wiesel, A., & Hero, A. O. III (2011). Robust Shrinkage Estimation of High-dimensional Covariance Matrices. IEEE Transactions on Signal Processing.

Friedman, J., Hastie, T., & Tibshirani, R. (2007). Sparse inverse covariance estimation with the graphical lasso. Biostatistics.

Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365-411.

Schafer, J., & Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4(1), Article 32.

Examples

## Structured MVN with known partial correlations
set.seed(42)
p <- 12; n <- 1000

## Build a tri-diagonal precision (Omega) so the true partial correlations
## are sparse
phi <- 0.35
Omega <- diag(p)
for (j in 1:(p - 1)) {
  Omega[j, j + 1] <- Omega[j + 1, j] <- -phi
}
## Strict diagonal dominance
diag(Omega) <- 1 + 2 * abs(phi) + 0.05
Sigma <- solve(Omega)

## chol() returns the upper-triangular factor U with Sigma = t(U) %*% U,
## so Z %*% U has covariance Sigma
U <- chol(Sigma)
Z <- matrix(rnorm(n * p), n, p)
X <- Z %*% U
colnames(X) <- sprintf("V%02d", seq_len(p))

pc <- pcorr(X)
summary(pc)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(pc)
}

## True partial correlation from Omega
pcor_true <- -Omega / sqrt(diag(Omega) %o% diag(Omega))
diag(pcor_true) <- 1

## Quick visual check (first 5x5 block)
round(pc$pcor[1:5, 1:5], 2)
round(pcor_true[1:5, 1:5], 2)

## Plot method
plot(pc)

## Graphical-lasso example
set.seed(100)
p <- 20; n <- 250
Theta_g <- diag(p)
Theta_g[cbind(1:5, 2:6)] <- -0.25
Theta_g[cbind(2:6, 1:5)] <- -0.25
Theta_g[cbind(8:11, 9:12)] <- -0.20
Theta_g[cbind(9:12, 8:11)] <- -0.20
diag(Theta_g) <- rowSums(abs(Theta_g)) + 0.2

Sigma_g <- solve(Theta_g)
X_g <- matrix(rnorm(n * p), n, p) %*% chol(Sigma_g)
colnames(X_g) <- paste0("Node", seq_len(p))

gfit_1 <- pcorr(X_g, method = "glasso", lambda = 0.02,
                return_cov_precision = TRUE)
gfit_2 <- pcorr(X_g, method = "glasso", lambda = 0.08,
                return_cov_precision = TRUE)

## Larger lambda gives a sparser conditional-dependence graph
edge_count <- function(M, tol = 1e-8) {
  sum(abs(M[upper.tri(M, diag = FALSE)]) > tol)
}

c(edges_lambda_002 = edge_count(gfit_1$precision),
  edges_lambda_008 = edge_count(gfit_2$precision))

## Inspect strongest estimated conditional associations
pcor_g <- gfit_1$pcor
idx <- which(upper.tri(pcor_g), arr.ind = TRUE)
ord <- order(abs(pcor_g[idx]), decreasing = TRUE)
head(data.frame(
  i = rownames(pcor_g)[idx[ord, 1]],
  j = colnames(pcor_g)[idx[ord, 2]],
  pcor = round(pcor_g[idx][ord], 2)
))

## High-dimensional case p >> n
set.seed(7)
n <- 60; p <- 120

ar_block <- function(m, rho = 0.6) rho^abs(outer(seq_len(m), seq_len(m), "-"))

## Two AR(1) blocks on the diagonal
if (requireNamespace("Matrix", quietly = TRUE)) {
  Sigma_hd <- as.matrix(Matrix::bdiag(ar_block(60, 0.6), ar_block(60, 0.6)))
} else {
  Sigma_hd <- rbind(
    cbind(ar_block(60, 0.6), matrix(0, 60, 60)),
    cbind(matrix(0, 60, 60), ar_block(60, 0.6))
  )
}

U <- chol(Sigma_hd)
X_hd <- matrix(rnorm(n * p), n, p) %*% U
colnames(X_hd) <- paste0("G", seq_len(p))

pc_oas    <- pcorr(X_hd, method = "oas", return_cov_precision = TRUE)
pc_ridge  <- pcorr(X_hd, method = "ridge", lambda = 1e-2,
                   return_cov_precision = TRUE)
pc_samp   <- pcorr(X_hd, method = "sample", return_cov_precision = TRUE)
pc_glasso <- pcorr(X_hd, method = "glasso", lambda = 5e-3,
                   return_cov_precision = TRUE)

## Show how much diagonal regularisation was used
c(oas_jitter = pc_oas$jitter,
  ridge_lambda = pc_ridge$lambda,
  sample_jitter = pc_samp$jitter,
  glasso_lambda = pc_glasso$lambda)

## Compare conditioning of the estimated covariance matrices
c(kappa_oas = kappa(pc_oas$cov),
  kappa_ridge = kappa(pc_ridge$cov),
  kappa_sample = kappa(pc_samp$cov))

## Simple conditional-dependence graph from partial correlations
pcor <- pc_oas$pcor
vals <- abs(pcor[upper.tri(pcor, diag = FALSE)])
thresh <- quantile(vals, 0.98)  # top 2%
edges  <- which(abs(pcor) > thresh & upper.tri(pcor), arr.ind = TRUE)
head(data.frame(i = colnames(pcor)[edges[,1]],
                j = colnames(pcor)[edges[,2]],
                pcor = round(pcor[edges], 2)))


matrixCorr documentation built on April 18, 2026, 5:06 p.m.