pcorr: R Documentation
Computes Gaussian partial correlations for the numeric columns of a matrix or data frame using a high-performance 'C++' backend. Covariance estimation is available via the classical sample estimator, ridge regularisation, OAS shrinkage, or graphical lasso. Optional p-values and Fisher-z confidence intervals are available for the classical sample estimator in the ordinary low-dimensional setting.
pcorr(
data,
method = c("sample", "oas", "ridge", "glasso"),
ci = FALSE,
conf_level = 0.95,
return_cov_precision = FALSE,
return_p_value = FALSE,
lambda = 0.001,
output = c("matrix", "sparse", "edge_list"),
threshold = 0,
diag = TRUE
)
## S3 method for class 'partial_corr'
print(
x,
digits = 3,
show_method = TRUE,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
## S3 method for class 'partial_corr'
summary(
object,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
## S3 method for class 'summary.partial_corr'
print(
x,
digits = 4,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
## S3 method for class 'partial_corr'
plot(
x,
title = NULL,
low_color = "indianred1",
high_color = "steelblue1",
mid_color = "white",
value_text_size = 4,
show_value = TRUE,
mask_diag = TRUE,
reorder = FALSE,
...
)
data: A numeric matrix or data frame with at least two numeric columns. Non-numeric columns are ignored.
method: Character; one of "sample", "oas", "ridge", or "glasso", selecting the covariance estimator. Defaults to "sample".
ci: Logical (default FALSE); if TRUE, compute Fisher-z confidence intervals for the sample partial correlations.
conf_level: Confidence level used when ci = TRUE (default 0.95).
return_cov_precision: Logical; if TRUE, also return the covariance and precision matrices used.
return_p_value: Logical; if TRUE, also return two-sided p-values for the sample partial correlations.
lambda: Numeric penalty (default 0.001): the diagonal inflation for method = "ridge" and the L1 penalty for method = "glasso"; ignored by the other estimators.
output: Output representation for the computed estimates; one of "matrix" (default), "sparse", or "edge_list".
threshold: Non-negative absolute-value filter for non-matrix outputs: keep entries whose absolute value exceeds threshold.
diag: Logical; whether to include diagonal entries in the "sparse" and "edge_list" outputs.
x: An object of class "partial_corr" (or "summary.partial_corr" for its print method).
digits: Integer; number of decimal places for display (default 3 for print, 4 when printing summaries).
show_method: Logical; print a one-line header naming the estimator (default TRUE).
n: Optional row threshold for compact preview output.
topn: Optional number of leading/trailing rows to show when truncated.
max_vars: Optional maximum number of visible columns.
width: Optional display width; defaults to the current console width.
show_ci: Optional; controls whether confidence intervals (when present) are displayed. Default NULL.
...: Additional arguments passed on to methods.
object: An object of class "partial_corr".
title: Plot title. By default, constructed from the estimator recorded in the object.
low_color: Colour for low (negative) values. Default "indianred1".
high_color: Colour for high (positive) values. Default "steelblue1".
mid_color: Colour for zero. Default "white".
value_text_size: Font size for cell labels. Default 4.
show_value: Logical; if TRUE, draw the values in the cells (default TRUE).
mask_diag: Logical; if TRUE, mask the diagonal in the plot (default TRUE).
reorder: Logical; if TRUE, reorder variables before plotting (default FALSE).
Statistical overview. Given an n \times p data matrix X
(rows are observations, columns are variables), the routine estimates a
partial correlation matrix via the precision (inverse covariance)
matrix. Let \mu be the vector of column means and
S = (X - \mathbf{1}\mu)^\top (X - \mathbf{1}\mu)
be the centred cross-product matrix computed without forming a centred copy
of X. Two conventional covariance scalings are formed:
\hat\Sigma_{\mathrm{MLE}} = S/n, \qquad
\hat\Sigma_{\mathrm{unb}} = S/(n-1).
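The claim that S is computed "without forming a centred copy of X" rests on the identity S = X'X - n*mu*mu'. A base-R sketch of that identity (the variable names here are illustrative, not part of the package):

```r
## Centred cross-product without materialising a centred copy of X:
## S = (X - 1 mu)'(X - 1 mu) = X'X - n * mu mu'
set.seed(3)
X  <- matrix(rnorm(50 * 4), 50, 4)
n  <- nrow(X)
mu <- colMeans(X)
S        <- crossprod(X) - n * tcrossprod(mu)            # no centred copy
S_direct <- crossprod(scale(X, center = TRUE, scale = FALSE))
all.equal(S, S_direct)  # TRUE up to floating-point error
```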
Sample: \Sigma = \hat\Sigma_{\mathrm{unb}}.
Ridge: \Sigma = \hat\Sigma_{\mathrm{unb}} + \lambda I_p
with user-supplied \lambda \ge 0 (diagonal inflation).
OAS (Oracle Approximating Shrinkage):
shrink \hat\Sigma_{\mathrm{MLE}} towards a scaled identity
target \mu_I I_p, where \mu_I = \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})/p.
The data-driven weight \rho \in [0,1] is
\rho = \min\!\left\{1,\max\!\left(0,\;
\frac{(1-\tfrac{2}{p})\,\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}^2)
+ \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})^2}
{(n + 1 - \tfrac{2}{p})
\left[\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}^2)
- \tfrac{\mathrm{tr}(\hat\Sigma_{\mathrm{MLE}})^2}{p}\right]}
\right)\right\},
and
\Sigma = (1-\rho)\,\hat\Sigma_{\mathrm{MLE}} + \rho\,\mu_I I_p.
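The OAS step above can be sketched in a few lines of base R. oas_cov below is an illustrative helper, not the package's C++ implementation; it assumes the MLE scaling S/n used in the formula:

```r
## OAS: shrink the MLE covariance towards mu_I * I with data-driven rho.
oas_cov <- function(X) {
  n  <- nrow(X); p <- ncol(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)
  S  <- crossprod(Xc) / n                   # MLE covariance
  mu <- sum(diag(S)) / p                    # scaled-identity target level
  tr_S2 <- sum(S * S)                       # tr(S^2) for symmetric S
  tr_S  <- sum(diag(S))
  num <- (1 - 2 / p) * tr_S2 + tr_S^2
  den <- (n + 1 - 2 / p) * (tr_S2 - tr_S^2 / p)
  rho <- min(1, max(0, num / den))          # clip to [0, 1]
  list(sigma = (1 - rho) * S + rho * mu * diag(p), rho = rho)
}

set.seed(1)
fit <- oas_cov(matrix(rnorm(30 * 50), 30, 50))  # p > n: heavy shrinkage
```

Because the shrunken estimate adds rho * mu_I on the diagonal, it is positive definite even when p > n and the raw MLE is singular.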
Graphical lasso: estimate a sparse precision matrix
\Theta by maximising
\log\det(\Theta) - \mathrm{tr}(\hat\Sigma_{\mathrm{MLE}}\Theta)
- \lambda\sum_{i \ne j} |\theta_{ij}|,
with \lambda \ge 0. The returned covariance matrix is
\Sigma = \Theta^{-1}.
The method then ensures positive definiteness of \Sigma (adding a very
small diagonal jitter only if necessary) and computes the precision
matrix \Theta = \Sigma^{-1}. Partial correlations are obtained by
standardising the off-diagonals of \Theta:
\mathrm{pcor}_{ij} \;=\;
-\,\frac{\theta_{ij}}{\sqrt{\theta_{ii}\,\theta_{jj}}}, \qquad
\mathrm{pcor}_{ii}=1.
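The standardisation step has a direct base-R counterpart. pcor_from_cov is an illustrative helper, and the three-variable check compares it against the classical closed form for a single partial correlation:

```r
## Partial correlations from a covariance matrix: negate and standardise
## the off-diagonals of the precision matrix, then set the diagonal to 1.
pcor_from_cov <- function(Sigma) {
  Theta <- solve(Sigma)                              # precision matrix
  P <- -Theta / sqrt(diag(Theta) %o% diag(Theta))
  diag(P) <- 1
  P
}

## For p = 3, pcor_12 has the closed form
## (r12 - r13 * r23) / sqrt((1 - r13^2) * (1 - r23^2)).
R <- matrix(c(1.0, 0.5, 0.3,
              0.5, 1.0, 0.4,
              0.3, 0.4, 1.0), 3, 3)
P <- pcor_from_cov(R)
closed <- (0.5 - 0.3 * 0.4) / sqrt((1 - 0.3^2) * (1 - 0.4^2))
all.equal(P[1, 2], closed)  # TRUE
```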
If return_p_value = TRUE, the function also reports the classical
two-sided test p-values for the sample partial correlations, using
t_{ij} = \mathrm{pcor}_{ij}
\sqrt{\frac{n - p}{1 - \mathrm{pcor}_{ij}^2}}
with n - p degrees of freedom. These p-values are returned only for
method = "sample", where they match the standard full-model partial
correlation test.
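The test statistic above is simple to reproduce in base R. pcor_p_value is an illustrative helper (the package computes this in its C++ backend):

```r
## Classical full-model test for one sample partial correlation:
## t = pcor * sqrt((n - p) / (1 - pcor^2)), two-sided, df = n - p.
pcor_p_value <- function(pcor, n, p) {
  t_stat <- pcor * sqrt((n - p) / (1 - pcor^2))
  2 * pt(abs(t_stat), df = n - p, lower.tail = FALSE)
}

pcor_p_value(0,   n = 100, p = 5)  # exactly 1: no evidence against pcor = 0
pcor_p_value(0.5, n = 100, p = 5)  # very small: strong partial association
```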
When ci = TRUE, the function reports Fisher-z confidence
intervals for the sample partial correlations. For a partial correlation
r_{xy \cdot Z} conditioning on c variables, the transformed
statistic is z = \operatorname{atanh}(r_{xy \cdot Z}) with standard
error
\operatorname{SE}(z) = \frac{1}{\sqrt{n - 3 - c}},
where n is the effective complete-case sample size used for the
estimate. The two-sided normal-theory interval is formed on the transformed
scale using conf_level and then mapped back with tanh(). In
the full matrix path implemented here, each off-diagonal entry conditions on
all remaining variables, so c = p - 2 and the classical CI requires
n > p + 1. This inference is only supported for
method = "sample" without positive-definiteness repair; in
unsupported or numerically singular settings, CI bounds are returned as
NA with an informative cli warning or the request is rejected.
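The interval construction can be sketched directly. pcor_fisher_ci is an illustrative helper; with c = p - 2 conditioned variables, the standard error reduces to 1/sqrt(n - p - 1):

```r
## Fisher-z interval for a partial correlation conditioning on n_cond
## variables, mapped back to the correlation scale with tanh().
pcor_fisher_ci <- function(r, n, n_cond, conf_level = 0.95) {
  z  <- atanh(r)                            # Fisher z transform
  se <- 1 / sqrt(n - 3 - n_cond)            # requires n > n_cond + 3
  q  <- qnorm(1 - (1 - conf_level) / 2)
  tanh(c(lwr = z - q * se, upr = z + q * se))
}

pcor_fisher_ci(0.4, n = 200, n_cond = 10)  # interval containing 0.4
```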
Interpretation. For Gaussian data, \mathrm{pcor}_{ij} equals
the correlation between residuals from regressing variable i and
variable j on all the remaining variables; equivalently, it encodes
conditional dependence in a Gaussian graphical model, where
\mathrm{pcor}_{ij}=0 if variables i and j are
conditionally independent given the others. Partial correlations are
invariant to separate rescalings of each
variable; in particular, multiplying \Sigma by any positive scalar
leaves the partial correlations unchanged.
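The scale-invariance claim is easy to verify numerically. This sketch uses an illustrative base-R helper rather than pcorr() itself:

```r
## Partial correlations are unchanged when each column is rescaled
## separately (here by 1, 10, 0.5, and 100).
set.seed(2)
X <- matrix(rnorm(200 * 4), 200, 4)
pcor_of <- function(M) {
  Th <- solve(cov(M))
  P  <- -Th / sqrt(diag(Th) %o% diag(Th))
  diag(P) <- 1
  P
}
P1 <- pcor_of(X)
P2 <- pcor_of(sweep(X, 2, c(1, 10, 0.5, 100), `*`))
all.equal(P1, P2)  # TRUE up to floating-point error
```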
Why shrinkage/regularisation? When p \ge n, the sample
covariance is singular and inversion is ill-posed. Ridge and OAS both yield
well-conditioned \Sigma. Ridge adds a fixed \lambda on the
diagonal, whereas OAS shrinks adaptively towards \mu_I I_p with a
weight chosen to minimise (approximately) the Frobenius risk under a
Gaussian model, often improving mean-square accuracy in high dimension.
Why glasso? Glasso is useful when the goal is not merely to stabilise a covariance estimate but to recover a sparse network of direct relationships. In Gaussian models, zeros in the precision matrix correspond to conditional independences, so glasso suppresses indirect associations that are explained by the other variables and returns a smaller, more interpretable conditional-dependence graph. This is especially practical in high-dimensional settings, where the sample covariance may be unstable or singular. Glasso yields a positive-definite precision estimate and supports edge selection, graph recovery, and downstream network analysis.
Computational notes. The implementation forms S using 'BLAS'
syrk when available and constructs partial correlations by traversing
only the upper triangle with 'OpenMP' parallelism. Positive definiteness is
verified via a Cholesky factorisation; if it fails, a tiny diagonal jitter is
increased geometrically up to a small cap, at which point the routine
signals an error.
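A minimal sketch of that repair loop. The constants (start, factor, cap) are illustrative, since the exact values used by the C++ backend are not documented here:

```r
## Try a Cholesky factorisation; on failure, grow a diagonal jitter
## geometrically up to a cap before giving up with an error.
ensure_pd <- function(Sigma, start = 1e-10, factor = 10, cap = 1e-4) {
  jitter <- 0
  repeat {
    ok <- tryCatch({
      chol(Sigma + jitter * diag(nrow(Sigma)))
      TRUE
    }, error = function(e) FALSE)
    if (ok) {
      return(list(sigma = Sigma + jitter * diag(nrow(Sigma)),
                  jitter = jitter))
    }
    jitter <- if (jitter == 0) start else jitter * factor
    if (jitter > cap) stop("could not make matrix positive definite")
  }
}

res  <- ensure_pd(matrix(1, 2, 2))  # singular input: small jitter added
res0 <- ensure_pd(diag(2))          # already PD: no jitter needed
```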
An object of class "partial_corr" (a list) with elements:
pcor: p \times p partial correlation matrix.
cov (if requested): covariance matrix used.
precision (if requested): precision matrix \Theta.
p_value (if requested): matrix of two-sided p-values for
the sample partial correlations.
ci (if requested): a list with elements est,
lwr.ci, upr.ci, conf.level, and ci.method.
diagnostics: metadata used for inference, including the
effective complete-case sample size and number of conditioned variables.
method: the estimator used ("oas", "ridge",
"sample", or "glasso").
lambda: ridge or graphical-lasso penalty
(or NA_real_).
rho: OAS shrinkage weight in [0,1] (or NA_real_).
jitter: diagonal jitter added (if any) to ensure positive
definiteness.
The print methods invisibly return x. summary() returns a compact object of class summary.partial_corr. plot() returns a ggplot object.
Chen, Y., Wiesel, A., & Hero, A. O. III (2011). Robust Shrinkage Estimation of High-dimensional Covariance Matrices. IEEE Transactions on Signal Processing.
Friedman, J., Hastie, T., & Tibshirani, R. (2007). Sparse inverse covariance estimation with the graphical lasso. Biostatistics.
Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365-411.
Schafer, J., & Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4(1), Article 32.
## Structured MVN with known partial correlations
set.seed(42)
p <- 12; n <- 1000
## Build a tri-diagonal precision (Omega) so the true partial correlations
## are sparse
phi <- 0.35
Omega <- diag(p)
for (j in 1:(p - 1)) {
Omega[j, j + 1] <- Omega[j + 1, j] <- -phi
}
## Strict diagonal dominance
diag(Omega) <- 1 + 2 * abs(phi) + 0.05
Sigma <- solve(Omega)
## Upper Cholesky
L <- chol(Sigma)
Z <- matrix(rnorm(n * p), n, p)
X <- Z %*% L
colnames(X) <- sprintf("V%02d", seq_len(p))
pc <- pcorr(X)
summary(pc)
# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
view_corr_shiny(pc)
}
## True partial correlation from Omega
pcor_true <- -Omega / sqrt(diag(Omega) %o% diag(Omega))
diag(pcor_true) <- 1
## Quick visual check (first 5x5 block)
round(pc$pcor[1:5, 1:5], 2)
round(pcor_true[1:5, 1:5], 2)
## Plot method
plot(pc)
## Graphical-lasso example
set.seed(100)
p <- 20; n <- 250
Theta_g <- diag(p)
Theta_g[cbind(1:5, 2:6)] <- -0.25
Theta_g[cbind(2:6, 1:5)] <- -0.25
Theta_g[cbind(8:11, 9:12)] <- -0.20
Theta_g[cbind(9:12, 8:11)] <- -0.20
diag(Theta_g) <- rowSums(abs(Theta_g)) + 0.2
Sigma_g <- solve(Theta_g)
X_g <- matrix(rnorm(n * p), n, p) %*% chol(Sigma_g)
colnames(X_g) <- paste0("Node", seq_len(p))
gfit_1 <- pcorr(X_g, method = "glasso", lambda = 0.02,
return_cov_precision = TRUE)
gfit_2 <- pcorr(X_g, method = "glasso", lambda = 0.08,
return_cov_precision = TRUE)
## Larger lambda gives a sparser conditional-dependence graph
edge_count <- function(M, tol = 1e-8) {
sum(abs(M[upper.tri(M, diag = FALSE)]) > tol)
}
c(edges_lambda_002 = edge_count(gfit_1$precision),
edges_lambda_008 = edge_count(gfit_2$precision))
## Inspect strongest estimated conditional associations
pcor_g <- gfit_1$pcor
idx <- which(upper.tri(pcor_g), arr.ind = TRUE)
ord <- order(abs(pcor_g[idx]), decreasing = TRUE)
head(data.frame(
i = rownames(pcor_g)[idx[ord, 1]],
j = colnames(pcor_g)[idx[ord, 2]],
pcor = round(pcor_g[idx][ord], 2)
))
## High-dimensional case p >> n
set.seed(7)
n <- 60; p <- 120
ar_block <- function(m, rho = 0.6) rho^abs(outer(seq_len(m), seq_len(m), "-"))
## Two AR(1) blocks on the diagonal
if (requireNamespace("Matrix", quietly = TRUE)) {
Sigma_hd <- as.matrix(Matrix::bdiag(ar_block(60, 0.6), ar_block(60, 0.6)))
} else {
Sigma_hd <- rbind(
cbind(ar_block(60, 0.6), matrix(0, 60, 60)),
cbind(matrix(0, 60, 60), ar_block(60, 0.6))
)
}
L <- chol(Sigma_hd)
X_hd <- matrix(rnorm(n * p), n, p) %*% L
colnames(X_hd) <- paste0("G", seq_len(p))
pc_oas <-
pcorr(X_hd, method = "oas", return_cov_precision = TRUE)
pc_ridge <-
pcorr(X_hd, method = "ridge", lambda = 1e-2,
return_cov_precision = TRUE)
pc_samp <-
pcorr(X_hd, method = "sample", return_cov_precision = TRUE)
pc_glasso <-
pcorr(X_hd, method = "glasso", lambda = 5e-3,
return_cov_precision = TRUE)
## Show how much diagonal regularisation was used
c(oas_jitter = pc_oas$jitter,
ridge_lambda = pc_ridge$lambda,
sample_jitter = pc_samp$jitter,
glasso_lambda = pc_glasso$lambda)
## Compare conditioning of the estimated covariance matrices
c(kappa_oas = kappa(pc_oas$cov),
kappa_ridge = kappa(pc_ridge$cov),
kappa_sample = kappa(pc_samp$cov))
## Simple conditional-dependence graph from partial correlations
pcor <- pc_oas$pcor
vals <- abs(pcor[upper.tri(pcor, diag = FALSE)])
thresh <- quantile(vals, 0.98) # top 2%
edges <- which(abs(pcor) > thresh & upper.tri(pcor), arr.ind = TRUE)
head(data.frame(i = colnames(pcor)[edges[,1]],
j = colnames(pcor)[edges[,2]],
pcor = round(pcor[edges], 2)))