pcorr: Partial correlation matrix (sample / ridge / OAS / graphical lasso)

View source: R/pcorr.R

pcorr {matrixCorr}    R Documentation

Partial correlation matrix (sample / ridge / OAS / graphical lasso)

Description

Computes Gaussian partial correlations for the numeric columns of a matrix or data frame using a high-performance 'C++' backend. Covariance estimation is available via the classical sample estimator, ridge regularisation, OAS shrinkage, or graphical lasso. Optional p-values and Fisher-z confidence intervals are available for the classical sample estimator in the ordinary low-dimensional setting.

Usage

pcorr(
  data,
  method = c("sample", "oas", "ridge", "glasso"),
  ci = FALSE,
  conf_level = 0.95,
  return_cov_precision = FALSE,
  return_p_value = FALSE,
  lambda = 0.001,
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0,
  diag = TRUE
)

## S3 method for class 'partial_corr'
print(
  x,
  digits = 3,
  show_method = TRUE,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'partial_corr'
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.partial_corr'
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'partial_corr'
plot(
  x,
  title = NULL,
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  show_value = TRUE,
  mask_diag = TRUE,
  reorder = FALSE,
  ...
)

Arguments

data

A numeric matrix or data frame with at least two numeric columns. Non-numeric columns are ignored.

method

Character; one of "sample", "oas", "ridge", or "glasso". Default "sample".

ci

Logical (default FALSE). If TRUE, attach Fisher-z confidence intervals for the off-diagonal partial correlations. This option is available only for the classical method = "sample" estimator in the ordinary low-dimensional setting.

conf_level

Confidence level used when ci = TRUE. Default is 0.95.

return_cov_precision

Logical; if TRUE, also return the covariance (cov) and precision (precision) matrices used to form the partial correlations. Defaults to FALSE.

return_p_value

Logical; if TRUE, also return the matrix of two-sided p-values for testing whether each sample partial correlation is zero. This option is available only for method = "sample" and requires n > p. Defaults to FALSE.

lambda

Numeric \ge 0; regularisation strength. For method = "ridge", this is the penalty added to the covariance diagonal. For method = "glasso", this is the off-diagonal precision-matrix \ell_1 penalty. Ignored otherwise. Default 1e-3.

output

Output representation for the computed estimates.

  • "matrix" (default): full dense matrix; best when you need matrix algebra, dense heatmaps, or full compatibility with existing code.

  • "sparse": sparse matrix from Matrix containing only retained entries; best when many values are dropped by thresholding.

  • "edge_list": long-form data frame with columns row, col, value; convenient for filtering, joins, and network-style workflows.

threshold

Non-negative absolute-value filter for non-matrix outputs: keep entries with abs(value) >= threshold. Use threshold > 0 when you want only stronger associations (typically with output = "sparse" or "edge_list"). Keep threshold = 0 to retain all values. Must be 0 when output = "matrix".

diag

Logical; whether to include diagonal entries in "sparse" and "edge_list" outputs.

x

An object of class partial_corr.

digits

Integer; number of decimal places for display (default 3).

show_method

Logical; print a one-line header with method (and lambda/rho if available). Default TRUE.

n

Optional row threshold for compact preview output.

topn

Optional number of leading/trailing rows to show when truncated.

max_vars

Optional maximum number of visible columns; NULL derives this from console width.

width

Optional display width; defaults to getOption("width").

show_ci

One of "yes" or "no".

...

Additional arguments passed to ggplot2::theme() or other ggplot2 layers.

object

An object of class partial_corr.

title

Plot title. By default, constructed from the estimator in x$method.

low_color

Colour for low (negative) values. Default "indianred1".

high_color

Colour for high (positive) values. Default "steelblue1".

mid_color

Colour for zero. Default "white".

value_text_size

Font size for cell labels. Default 4.

show_value

Logical; if TRUE (default), overlay numeric values on the heatmap tiles.

mask_diag

Logical; if TRUE, the diagonal is masked (set to NA) and not labelled. Default TRUE.

reorder

Logical; if TRUE, variables are reordered by hierarchical clustering of 1 - |pcor|. Default FALSE.

Details

Statistical overview. Given an n \times p data matrix X (rows are observations, columns are variables), the routine estimates a partial correlation matrix via the precision (inverse covariance) matrix. Let \mu be the vector of column means and

S = (X - \mathbf{1}\mu)^\top (X - \mathbf{1}\mu)

be the centred cross-product matrix computed without forming a centred copy of X. Two conventional covariance scalings are formed:

\hat\Sigma_{\mathrm{MLE}} = S/n, \qquad \hat\Sigma_{\mathrm{unb}} = S/(n-1).

  • Sample: \Sigma = \hat\Sigma_{\mathrm{unb}}.

  • Ridge: \Sigma = \hat\Sigma_{\mathrm{unb}} + \lambda I_p with user-supplied \lambda \ge 0 (diagonal inflation).

  • OAS (Oracle Approximating Shrinkage): shrink \hat\Sigma_{\mathrm{MLE}} towards a scaled identity target \mu_I I_p, where \mu_I = \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})/p. The data-driven weight \rho \in [0,1] is

    \rho = \min\!\left\{1,\max\!\left(0,\; \frac{(1-\tfrac{2}{p})\,\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}^2) + \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})^2} {(n + 1 - \tfrac{2}{p}) \left[\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}^2) - \tfrac{\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})^2}{p}\right]} \right)\right\},

    and

    \Sigma = (1-\rho)\,\hat\Sigma_{\mathrm{MLE}} + \rho\,\mu_I I_p.

  • Graphical lasso: estimate a sparse precision matrix \Theta by maximising

    \log\det(\Theta) - \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}\Theta) - \lambda\sum_{i \ne j} |\theta_{ij}|,

    with \lambda \ge 0. The returned covariance matrix is \Sigma = \Theta^{-1}.
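The ridge and OAS estimates above can be written out directly in plain R. The following is a sketch of the formulas only (not the package's C++ path); the helper name oas_ridge_sketch is illustrative:

```r
## Sketch of the ridge and OAS covariance estimates (plain R, not the C++ backend)
oas_ridge_sketch <- function(X, lambda = 1e-3) {
  n <- nrow(X); p <- ncol(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)
  S_mle <- crossprod(Xc) / n            # Sigma_hat_MLE = S / n
  S_unb <- crossprod(Xc) / (n - 1)      # Sigma_hat_unb = S / (n - 1)

  ## Ridge: diagonal inflation of the unbiased estimate
  Sigma_ridge <- S_unb + lambda * diag(p)

  ## OAS: shrink the MLE towards mu_I * I_p with data-driven weight rho
  tr_S  <- sum(diag(S_mle))
  tr_S2 <- sum(S_mle^2)                 # tr(S %*% S) for symmetric S
  mu_I  <- tr_S / p
  num <- (1 - 2 / p) * tr_S2 + tr_S^2
  den <- (n + 1 - 2 / p) * (tr_S2 - tr_S^2 / p)
  rho <- min(1, max(0, num / den))
  Sigma_oas <- (1 - rho) * S_mle + rho * mu_I * diag(p)

  list(ridge = Sigma_ridge, oas = Sigma_oas, rho = rho)
}
```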

The method then ensures positive definiteness of \Sigma (adding a very small diagonal jitter only if necessary) and computes the precision matrix \Theta = \Sigma^{-1}. Partial correlations are obtained by standardising the off-diagonals of \Theta:

\mathrm{pcor}_{ij} \;=\; -\,\frac{\theta_{ij}}{\sqrt{\theta_{ii}\,\theta_{jj}}}, \qquad \mathrm{pcor}_{ii}=1.
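This standardisation step amounts to a sign-flipped cov2cor() applied to the precision matrix; a minimal base-R sketch (the helper name pcor_from_cov is illustrative):

```r
## Partial correlations from a covariance matrix (sketch of the formula above)
pcor_from_cov <- function(Sigma) {
  Theta <- solve(Sigma)     # precision matrix
  P <- -cov2cor(Theta)      # -theta_ij / sqrt(theta_ii * theta_jj)
  diag(P) <- 1
  P
}
```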

If return_p_value = TRUE, the function also reports the classical two-sided test p-values for the sample partial correlations, using

t_{ij} = \mathrm{pcor}_{ij} \sqrt{\frac{n - p}{1 - \mathrm{pcor}_{ij}^2}}

with n - p degrees of freedom. These p-values are returned only for method = "sample", where they match the standard full-model partial correlation test.
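Under these conventions the p-value computation can be sketched directly in base R (assuming n > p; the helper name pcor_pvalues is illustrative):

```r
## Two-sided p-values for sample partial correlations (sketch, requires n > p)
pcor_pvalues <- function(P, n) {
  p <- ncol(P)
  df <- n - p                          # degrees of freedom
  tstat <- P * sqrt(df / (1 - P^2))
  pval <- 2 * pt(-abs(tstat), df = df)
  diag(pval) <- NA                     # diagonal entries are not tested
  pval
}
```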

When ci = TRUE, the function reports Fisher-z confidence intervals for the sample partial correlations. For a partial correlation r_{xy \cdot Z} conditioning on c variables, the transformed statistic is z = \operatorname{atanh}(r_{xy \cdot Z}) with standard error

\operatorname{SE}(z) = \frac{1}{\sqrt{n - 3 - c}},

where n is the effective complete-case sample size used for the estimate. The two-sided normal-theory interval is formed on the transformed scale using conf_level and then mapped back with tanh(). In the full matrix path implemented here, each off-diagonal entry conditions on all remaining variables, so c = p - 2 and the classical CI requires n > p + 1. This inference is only supported for method = "sample" without positive-definiteness repair; in unsupported or numerically singular settings, CI bounds are returned as NA with an informative cli warning or the request is rejected.
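For a single entry, the Fisher-z interval takes only a few lines. A sketch for the full-matrix path, where c = p - 2 so the standard error uses n - 3 - (p - 2) = n - p - 1 (the helper name pcor_ci is illustrative):

```r
## Fisher-z confidence interval for one partial correlation (sketch)
pcor_ci <- function(r, n, p, conf_level = 0.95) {
  c_vars <- p - 2                      # variables conditioned on
  se <- 1 / sqrt(n - 3 - c_vars)       # = 1 / sqrt(n - p - 1), needs n > p + 1
  z  <- atanh(r)
  zq <- qnorm(1 - (1 - conf_level) / 2)
  c(lwr = tanh(z - zq * se), upr = tanh(z + zq * se))
}
```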

Interpretation. For Gaussian data, \mathrm{pcor}_{ij} equals the correlation between residuals from regressing variable i and variable j on all the remaining variables; equivalently, it encodes conditional dependence in a Gaussian graphical model, where \mathrm{pcor}_{ij}=0 if variables i and j are conditionally independent given the others. Partial correlations are invariant to separate rescalings of each variable; in particular, multiplying \Sigma by any positive scalar leaves the partial correlations unchanged.
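The residual characterisation is easy to verify numerically: regress two columns on all the others and correlate the residuals. A base-R sketch using lm():

```r
## Verify: pcor_ij equals the correlation of regression residuals (sketch)
set.seed(1)
X <- matrix(rnorm(200 * 4), 200, 4)
Theta <- solve(cov(X))
pc12 <- -Theta[1, 2] / sqrt(Theta[1, 1] * Theta[2, 2])

r1 <- resid(lm(X[, 1] ~ X[, -c(1, 2)]))
r2 <- resid(lm(X[, 2] ~ X[, -c(1, 2)]))
all.equal(pc12, cor(r1, r2))  # agree up to numerical tolerance
```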

Why shrinkage/regularisation? When p \ge n, the sample covariance is singular and inversion is ill-posed. Ridge and OAS both yield well-conditioned \Sigma. Ridge adds a fixed \lambda on the diagonal, whereas OAS shrinks adaptively towards \mu_I I_p with a weight chosen to minimise (approximately) the Frobenius risk under a Gaussian model, often improving mean-square accuracy in high dimension.

Why glasso? Glasso is useful when the goal is not just to stabilise a covariance estimate, but to recover a manageable network of direct relationships rather than a dense matrix of overall associations. In Gaussian models, zeros in the precision matrix correspond to conditional independences, so glasso can suppress indirect associations that are explained by the other variables and return a smaller, more interpretable conditional-dependence graph. This is especially practical in high-dimensional settings, where the sample covariance may be unstable or singular. Glasso yields a positive-definite precision estimate and supports edge selection, graph recovery, and downstream network analysis.

Computational notes. The implementation forms S using 'BLAS' syrk when available and constructs partial correlations by traversing only the upper triangle with 'OpenMP' parallelism. Positive definiteness is verified via a Cholesky factorisation; if it fails, a tiny diagonal jitter is increased geometrically up to a small cap, at which point the routine signals an error.

Value

An object of class "partial_corr" (a list) with elements:

  • pcor: p \times p partial correlation matrix.

  • cov (if requested): covariance matrix used.

  • precision (if requested): precision matrix \Theta.

  • p_value (if requested): matrix of two-sided p-values for the sample partial correlations.

  • ci (if requested): a list with elements est, lwr.ci, upr.ci, conf.level, and ci.method.

  • diagnostics: metadata used for inference, including the effective complete-case sample size and number of conditioned variables.

  • method: the estimator used ("oas", "ridge", "sample", or "glasso").

  • lambda: ridge or graphical-lasso penalty (or NA_real_).

  • rho: OAS shrinkage weight in [0,1] (or NA_real_).

  • jitter: diagonal jitter added (if any) to ensure positive definiteness.

The print methods invisibly return x. The summary method returns a compact object of class summary.partial_corr; the plot method returns a ggplot object.

References

Chen, Y., Wiesel, A., & Hero, A. O. III (2011). Robust Shrinkage Estimation of High-dimensional Covariance Matrices. IEEE Transactions on Signal Processing.

Friedman, J., Hastie, T., & Tibshirani, R. (2007). Sparse inverse covariance estimation with the graphical lasso. Biostatistics.

Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365-411.

Schafer, J., & Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4(1), Article 32.

Examples

## Structured MVN with known partial correlations
set.seed(42)
p <- 12; n <- 1000

## Build a tri-diagonal precision (Omega) so the true partial correlations
## are sparse
phi <- 0.35
Omega <- diag(p)
for (j in 1:(p - 1)) {
  Omega[j, j + 1] <- Omega[j + 1, j] <- -phi
}
## Strict diagonal dominance
diag(Omega) <- 1 + 2 * abs(phi) + 0.05
Sigma <- solve(Omega)

## chol() returns the upper-triangular factor U with Sigma = t(U) %*% U,
## so Z %*% U has covariance Sigma
U <- chol(Sigma)
Z <- matrix(rnorm(n * p), n, p)
X <- Z %*% U
colnames(X) <- sprintf("V%02d", seq_len(p))

pc <- pcorr(X)
summary(pc)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(pc)
}

## True partial correlation from Omega
pcor_true <- -Omega / sqrt(diag(Omega) %o% diag(Omega))
diag(pcor_true) <- 1

## Quick visual check (first 5x5 block)
round(pc$pcor[1:5, 1:5], 2)
round(pcor_true[1:5, 1:5], 2)

## Plot method
plot(pc)

## Graphical-lasso example
set.seed(100)
p <- 20; n <- 250
Theta_g <- diag(p)
Theta_g[cbind(1:5, 2:6)] <- -0.25
Theta_g[cbind(2:6, 1:5)] <- -0.25
Theta_g[cbind(8:11, 9:12)] <- -0.20
Theta_g[cbind(9:12, 8:11)] <- -0.20
diag(Theta_g) <- rowSums(abs(Theta_g)) + 0.2

Sigma_g <- solve(Theta_g)
X_g <- matrix(rnorm(n * p), n, p) %*% chol(Sigma_g)
colnames(X_g) <- paste0("Node", seq_len(p))

gfit_1 <- pcorr(X_g, method = "glasso", lambda = 0.02,
                return_cov_precision = TRUE)
gfit_2 <- pcorr(X_g, method = "glasso", lambda = 0.08,
                return_cov_precision = TRUE)

## Larger lambda gives a sparser conditional-dependence graph
edge_count <- function(M, tol = 1e-8) {
  sum(abs(M[upper.tri(M, diag = FALSE)]) > tol)
}

c(edges_lambda_002 = edge_count(gfit_1$precision),
  edges_lambda_008 = edge_count(gfit_2$precision))

## Inspect strongest estimated conditional associations
pcor_g <- gfit_1$pcor
idx <- which(upper.tri(pcor_g), arr.ind = TRUE)
ord <- order(abs(pcor_g[idx]), decreasing = TRUE)
head(data.frame(
  i = rownames(pcor_g)[idx[ord, 1]],
  j = colnames(pcor_g)[idx[ord, 2]],
  pcor = round(pcor_g[idx][ord], 2)
))

## High-dimensional case p >> n
set.seed(7)
n <- 60; p <- 120

ar_block <- function(m, rho = 0.6) rho^abs(outer(seq_len(m), seq_len(m), "-"))

## Two AR(1) blocks on the diagonal
if (requireNamespace("Matrix", quietly = TRUE)) {
  Sigma_hd <- as.matrix(Matrix::bdiag(ar_block(60, 0.6), ar_block(60, 0.6)))
} else {
  Sigma_hd <- rbind(
    cbind(ar_block(60, 0.6), matrix(0, 60, 60)),
    cbind(matrix(0, 60, 60), ar_block(60, 0.6))
  )
}

U <- chol(Sigma_hd)
X_hd <- matrix(rnorm(n * p), n, p) %*% U
colnames(X_hd) <- paste0("G", seq_len(p))

pc_oas    <- pcorr(X_hd, method = "oas", return_cov_precision = TRUE)
pc_ridge  <- pcorr(X_hd, method = "ridge", lambda = 1e-2,
                   return_cov_precision = TRUE)
pc_samp   <- pcorr(X_hd, method = "sample", return_cov_precision = TRUE)
pc_glasso <- pcorr(X_hd, method = "glasso", lambda = 5e-3,
                   return_cov_precision = TRUE)

## Show how much diagonal regularisation was used
c(oas_jitter = pc_oas$jitter,
  ridge_lambda = pc_ridge$lambda,
  sample_jitter = pc_samp$jitter,
  glasso_lambda = pc_glasso$lambda)

## Compare conditioning of the estimated covariance matrices
c(kappa_oas = kappa(pc_oas$cov),
  kappa_ridge = kappa(pc_ridge$cov),
  kappa_sample = kappa(pc_samp$cov))

## Simple conditional-dependence graph from partial correlations
pcor <- pc_oas$pcor
vals <- abs(pcor[upper.tri(pcor, diag = FALSE)])
thresh <- quantile(vals, 0.98)  # top 2%
edges  <- which(abs(pcor) > thresh & upper.tri(pcor), arr.ind = TRUE)
head(data.frame(i = colnames(pcor)[edges[,1]],
                j = colnames(pcor)[edges[,2]],
                pcor = round(pcor[edges], 2)))


matrixCorr documentation built on April 18, 2026, 5:06 p.m.