3. Robust and High-Dimensional Correlation

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)

Scope

This vignette covers the estimators used when ordinary wide-data correlation is not enough. The package provides two broad extensions: robust estimators that limit the influence of outliers, and estimators for conditional or high-dimensional structure.

The relevant functions are:

- Robust correlation: bicor(), pbcor(), wincor(), and skipped_corr()
- Partial correlation: pcorr()
- High-dimensional shrinkage correlation: shrinkage_corr()

Robust correlation

Outliers can distort ordinary correlation severely, especially in moderate sample sizes. The robust estimators in matrixCorr are designed to limit that effect, but they do so in different ways.

library(matrixCorr)

set.seed(20)
Y <- data.frame(
  x1 = rnorm(60),
  x2 = rnorm(60),
  x3 = rnorm(60),
  x4 = rnorm(60)
)

idx <- sample.int(nrow(Y), 5)
Y$x1[idx] <- Y$x1[idx] + 8
Y$x2[idx] <- Y$x2[idx] - 6

R_pear <- pearson_corr(Y)
R_bicor <- bicor(Y)
R_pb <- pbcor(Y)
R_win <- wincor(Y)
R_skip <- skipped_corr(Y)

summary(R_pear)
summary(R_bicor)

In practical terms:

- bicor() computes the biweight midcorrelation, which downweights extreme observations smoothly rather than discarding them.
- pbcor() computes the percentage-bend correlation, bounding the influence of a fixed fraction of extreme values.
- wincor() Winsorizes each margin before correlating, so outliers are pulled toward the bulk of the data.
- skipped_corr() screens each pair of variables for outliers and removes the flagged points before computing the correlation.

That last property is useful, but it also makes skipped_corr() the most computationally expensive estimator in this group.

summary(R_skip)
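The screen-then-correlate idea behind skipped_corr() can be illustrated with a minimal base-R sketch. This is not the package's algorithm: the cutoff below is an ad hoc distance from the coordinate-wise medians in MAD units, used purely to show how removing flagged points restores the correlation of the clean bulk.

```r
set.seed(1)
x <- rnorm(60)
y <- 0.6 * x + rnorm(60, sd = 0.8)
x[1:5] <- x[1:5] + 8   # five gross outliers
y[1:5] <- y[1:5] - 6

r_all <- cor(x, y)     # ordinary Pearson, distorted by the outliers

# crude pairwise screen: distance from the medians in MAD units
d <- sqrt(((x - median(x)) / mad(x))^2 + ((y - median(y)) / mad(y))^2)
keep <- d < 5          # illustrative cutoff only
r_clean <- cor(x[keep], y[keep])

round(c(pearson = r_all, screened = r_clean), 2)
```

The package's screening rule is more principled than this cutoff, but the shape of the computation, screen first and correlate afterwards, is the same, which is also why it costs more than the other robust estimators.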

Inference for robust estimators

Inference is not uniform across the robust estimators, so it is worth stating the current scope explicitly.

fit_bicor_ci <- bicor(Y, ci = TRUE)
summary(fit_bicor_ci)

fit_pb_inf <- pbcor(Y, ci = TRUE, p_value = TRUE, n_boot = 200, seed = 1)
summary(fit_pb_inf)

fit_win_inf <- wincor(Y, ci = TRUE, p_value = TRUE, n_boot = 200, seed = 1)
summary(fit_win_inf)

fit_skip_inf <- skipped_corr(Y, ci = TRUE, p_value = TRUE, n_boot = 200, seed = 1)
summary(fit_skip_inf)

For pbcor(), wincor(), and skipped_corr(), the inferential layer is more computationally demanding than the estimate-only path because it is built from bootstrap resampling. The skipped-correlation route remains the most expensive of the three because each bootstrap sample also repeats the pairwise outlier screening step. The vignette uses n_boot = 200 only to keep runtime modest; substantive analyses usually need more resamples.
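The percentile-bootstrap idea behind these intervals can be sketched in a few lines of base R. This is a simplified stand-in, using Spearman correlation in place of the package's robust estimators: resample rows with replacement, recompute the statistic, and take percentile quantiles as the interval.

```r
set.seed(2)
x <- rnorm(80)
y <- 0.5 * x + rnorm(80, sd = 0.9)

n_boot <- 200
boot_r <- replicate(n_boot, {
  i <- sample.int(length(x), replace = TRUE)  # resample rows
  cor(x[i], y[i], method = "spearman")
})

# percentile 95% interval
quantile(boot_r, c(0.025, 0.975))
```

With only 200 resamples the interval endpoints are themselves noisy, which is exactly why larger n_boot values are recommended for substantive work.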

Partial correlation

Partial correlation addresses a different target. Instead of summarising raw association, it estimates conditional association after accounting for the remaining variables.

set.seed(21)
n <- 300
x2 <- rnorm(n)
x1 <- 0.8 * x2 + rnorm(n, sd = 0.4)
x3 <- 0.8 * x2 + rnorm(n, sd = 0.4)
x4 <- 0.7 * x3 + rnorm(n, sd = 0.5)
x5 <- rnorm(n)
x6 <- rnorm(n)
X <- data.frame(x1, x2, x3, x4, x5, x6)

fit_pcor_sample <- pcorr(X, method = "sample")
fit_pcor_oas <- pcorr(X, method = "oas")
R_raw <- pearson_corr(X)

round(c(
  raw_x1_x3 = R_raw["x1", "x3"],
  partial_x1_x3 = fit_pcor_sample$pcor["x1", "x3"]
), 2)

print(fit_pcor_sample, digits = 2)
summary(fit_pcor_oas)

Here x1 and x3 are strongly associated in the raw correlation matrix because both depend on x2. Partial correlation is useful because it can separate that indirect association from the remaining direct structure.
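For intuition, the same quantity can be computed by hand from the precision matrix: if P is the inverse of the covariance matrix, the partial correlation between variables i and j given all others is -P[i, j] / sqrt(P[i, i] * P[j, j]). A minimal sketch with an illustrative common-cause structure (not the package's implementation):

```r
set.seed(3)
z <- rnorm(200)
a <- 0.8 * z + rnorm(200, sd = 0.4)
b <- 0.8 * z + rnorm(200, sd = 0.4)
M <- cbind(a = a, z = z, b = b)

P  <- solve(cov(M))                          # precision matrix
pc <- -P["a", "b"] / sqrt(P["a", "a"] * P["b", "b"])

round(c(raw = cor(a, b), partial_given_z = pc), 2)
```

The raw correlation between a and b is large because both load on z; the partial correlation given z is close to zero, mirroring the x1-x3 pattern above.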

The choice of method is important:

- method = "sample" inverts the ordinary sample covariance, which is appropriate when n is comfortably larger than p.
- method = "oas" applies Oracle Approximating Shrinkage to the covariance before inversion, which stabilises the estimate when p is large relative to n.

When the classical sample estimator is appropriate, partial correlation also supports p-values and Fisher-z confidence intervals.

Unlike the other correlation wrappers, pcorr() uses return_p_value = TRUE rather than p_value = TRUE. That is intentional: the p-value matrix is an optional returned component of the fitted list object.

fit_pcor_inf <- pcorr(
  X,
  method = "sample",
  return_p_value = TRUE,
  ci = TRUE
)

summary(fit_pcor_inf)

That inference layer is available only for the ordinary sample estimator in the low-dimensional setting. The regularised estimators are primarily estimation tools rather than direct inferential procedures in the current interface.
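For reference, the Fisher-z construction behind those intervals is simple to write down. The helper below is hypothetical, not a package function; it assumes the usual standard error 1 / sqrt(n - k - 3) for a partial correlation with k conditioning variables.

```r
# CI for a partial correlation r, sample size n, k conditioning variables
fisher_ci <- function(r, n, k, level = 0.95) {
  z  <- atanh(r)                      # Fisher z-transform
  se <- 1 / sqrt(n - k - 3)
  q  <- qnorm(1 - (1 - level) / 2)
  tanh(z + c(-1, 1) * q * se)         # back-transform the endpoints
}

round(fisher_ci(r = 0.4, n = 300, k = 4), 3)
```

Note the interval is asymmetric around r after the back-transform, which is the expected behaviour near the boundaries of [-1, 1].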

High-dimensional shrinkage correlation

When the number of variables is large relative to the sample size, direct sample correlation can become unstable. shrinkage_corr() provides a shrinkage correlation route designed for exactly this setting.

set.seed(22)
n <- 40
p_block <- 30

make_block <- function(f) {
  sapply(seq_len(p_block), function(j) 0.8 * f + rnorm(n, sd = 0.8))
}

f1 <- rnorm(n)
f2 <- rnorm(n)
f3 <- rnorm(n)

Xd <- cbind(make_block(f1), make_block(f2), make_block(f3))
p <- ncol(Xd)
colnames(Xd) <- paste0("G", seq_len(p))

R_raw <- stats::cor(Xd)
fit_shr <- shrinkage_corr(Xd)
block <- rep(1:3, each = p_block)
within <- outer(block, block, "==") & upper.tri(R_raw)
between <- outer(block, block, "!=") & upper.tri(R_raw)

round(c(
  raw_within = mean(abs(R_raw[within])),
  raw_between = mean(abs(R_raw[between])),
  shrink_within = mean(abs(fit_shr[within])),
  shrink_between = mean(abs(fit_shr[between]))
), 3)

print(fit_shr, digits = 2, max_rows = 6, max_vars = 6)
summary(fit_shr)

The shrinkage estimator is not intended to reproduce the raw sample matrix. Its purpose is to provide a better-conditioned and more stable estimate in settings where the sample matrix is noisy or singular. In this block-factor simulation, the raw sample matrix retains the main within-block pattern but also carries substantial between-block noise; shrinkage pulls that background noise down without erasing the dominant structure.
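Mechanically, linear shrinkage replaces the sample matrix R with the convex combination (1 - lambda) * R + lambda * T for a target T, often the identity. The sketch below uses a fixed, arbitrary lambda purely to show the conditioning effect; shrinkage_corr() chooses its intensity from the data rather than by hand.

```r
set.seed(4)
n <- 40
p <- 90
Xs <- matrix(rnorm(n * p), n, p)   # p > n: sample correlation is singular

R      <- cor(Xs)
lambda <- 0.3                      # illustrative value only
Rs     <- (1 - lambda) * R + lambda * diag(p)

round(c(
  min_eig_raw    = min(eigen(R,  symmetric = TRUE, only.values = TRUE)$values),
  min_eig_shrunk = min(eigen(Rs, symmetric = TRUE, only.values = TRUE)$values)
), 3)
```

The raw matrix has (numerically) zero eigenvalues because its rank is at most n - 1, while the shrunk matrix is bounded away from singularity by lambda.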

Choosing among these methods

The choice is usually governed by the main source of difficulty in the data:

- Outliers or heavy tails: use a robust estimator such as bicor(), pbcor(), wincor(), or skipped_corr().
- Association confounded by other measured variables: use partial correlation via pcorr().
- Many variables relative to the sample size: use shrinkage_corr().

These are not interchangeable estimators. They solve different problems, even though they share the same matrix-oriented calling style.



matrixCorr documentation built on April 18, 2026, 5:06 p.m.