```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
```
This vignette covers the estimators used when ordinary wide-data correlation is not enough. The package provides two broad extensions:
- robust correlation estimators that limit the influence of outliers, and
- shrinkage and partial-correlation routes for p >= n settings.

The relevant functions are:

- bicor()
- pbcor()
- wincor()
- skipped_corr()
- shrinkage_corr()
- pcorr()

Outliers can distort ordinary correlation severely, especially at moderate
sample sizes. The robust estimators in matrixCorr are designed to limit that
effect, but they do so in different ways.
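As a concrete illustration of one such route, a Winsorized correlation in the spirit of wincor() can be sketched in base R. This is a simplified conceptual sketch, not the package's implementation, and the 20% Winsorizing level is an assumption:

```r
# Conceptual sketch of a Winsorized correlation (the idea behind wincor()).
# Not the package's implementation; the 20% level is an assumed default.
winsorize <- function(x, tr = 0.2) {
  q <- quantile(x, probs = c(tr, 1 - tr), names = FALSE)
  pmin(pmax(x, q[1]), q[2])  # clip extreme values at the tail quantiles
}

set.seed(1)
x <- rnorm(60)
y <- 0.6 * x + rnorm(60, sd = 0.5)
x[1:3] <- x[1:3] + 10  # inject gross outliers into x

c(pearson = cor(x, y), winsorized = cor(winsorize(x), winsorize(y)))
```

Clipping the tails before correlating limits how far any single point can pull the estimate, which is the shared principle behind this family of estimators.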
```r
library(matrixCorr)

set.seed(20)
Y <- data.frame(
  x1 = rnorm(60),
  x2 = rnorm(60),
  x3 = rnorm(60),
  x4 = rnorm(60)
)
idx <- sample.int(nrow(Y), 5)
Y$x1[idx] <- Y$x1[idx] + 8
Y$x2[idx] <- Y$x2[idx] - 6

R_pear <- pearson_corr(Y)
R_bicor <- bicor(Y)
R_pb <- pbcor(Y)
R_win <- wincor(Y)
R_skip <- skipped_corr(Y)

summary(R_pear)
summary(R_bicor)
```
In practical terms:
- bicor() is often a good first robust alternative for wide data.
- pbcor() and wincor() follow classical robustification routes and are easy to compare against ordinary correlation.
- skipped_corr() is more aggressive because it first removes bivariate outliers pair by pair.

That last property is useful, but it also makes skipped_corr() the most
computationally expensive estimator in this group.
```r
summary(R_skip)
```
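The pair-by-pair screening that drives this cost can be illustrated with a simplified base-R sketch: flag bivariate outliers, drop them, and correlate the remainder. The classical Mahalanobis distance with a chi-squared cutoff used here is an assumption for illustration, not matrixCorr's actual screening rule:

```r
# Simplified sketch of the skipped-correlation idea: remove bivariate
# outliers for one pair of variables, then correlate the surviving points.
# The distance rule (classical Mahalanobis, chi-squared 0.975 cutoff) is
# an illustrative assumption, not the package's screening step.
skipped_pair <- function(x, y, cutoff = qchisq(0.975, df = 2)) {
  z <- cbind(x, y)
  d2 <- mahalanobis(z, colMeans(z), cov(z))  # squared distances
  keep <- d2 <= cutoff                        # drop flagged points
  cor(x[keep], y[keep])
}

set.seed(2)
x <- rnorm(60)
y <- 0.6 * x + rnorm(60, sd = 0.5)
x[1:3] <- x[1:3] + 10  # bivariate outliers

c(pearson = cor(x, y), skipped = skipped_pair(x, y))
```

Because this screening is repeated for every pair of columns, the work grows quadratically in the number of variables, which is why the skipped route is the slowest of the group.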
Inference is not uniform across the robust estimators, so it is worth stating the current scope explicitly.
- bicor() supports large-sample confidence intervals.
- pbcor() supports percentile-bootstrap confidence intervals and method-specific p-values.
- wincor() supports percentile-bootstrap confidence intervals and method-specific p-values.
- skipped_corr() supports bootstrap confidence intervals and bootstrap p-values.

```r
fit_bicor_ci <- bicor(Y, ci = TRUE)
summary(fit_bicor_ci)

fit_pb_inf <- pbcor(Y, ci = TRUE, p_value = TRUE, n_boot = 200, seed = 1)
summary(fit_pb_inf)

fit_win_inf <- wincor(Y, ci = TRUE, p_value = TRUE, n_boot = 200, seed = 1)
summary(fit_win_inf)

fit_skip_inf <- skipped_corr(Y, ci = TRUE, p_value = TRUE, n_boot = 200, seed = 1)
summary(fit_skip_inf)
```
For pbcor(), wincor(), and skipped_corr(), the inferential layer is more
computationally demanding than the estimate-only path because it is built from
bootstrap resampling. The skipped-correlation route remains the most expensive
of the three because each bootstrap sample also repeats the pairwise outlier
screening step. The vignette uses n_boot = 200 only to keep runtime modest;
substantive analyses usually need more resamples.
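The percentile-bootstrap mechanics behind these intervals can be sketched in base R for a single pair of variables; the cost scales linearly in n_boot because every resample repeats the full estimation step (and, on the skipped route, the outlier screening as well):

```r
# Minimal percentile-bootstrap confidence interval for one correlation.
# Each resample re-runs the estimator, so bootstrap inference costs
# roughly n_boot times the estimate-only path.
boot_ci <- function(x, y, n_boot = 200, level = 0.95, seed = 1) {
  set.seed(seed)
  stats <- replicate(n_boot, {
    idx <- sample.int(length(x), replace = TRUE)  # resample rows
    cor(x[idx], y[idx])                           # re-estimate
  })
  quantile(stats, probs = c((1 - level) / 2, 1 - (1 - level) / 2))
}

set.seed(3)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100, sd = 0.8)
boot_ci(x, y, n_boot = 200)
```

Swapping cor() for a robust estimator inside the replicate() call gives the same resampling scheme the package's inference layer is built on.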
Partial correlation addresses a different target. Instead of summarising raw association, it estimates conditional association after accounting for the remaining variables.
```r
set.seed(21)
n <- 300
x2 <- rnorm(n)
x1 <- 0.8 * x2 + rnorm(n, sd = 0.4)
x3 <- 0.8 * x2 + rnorm(n, sd = 0.4)
x4 <- 0.7 * x3 + rnorm(n, sd = 0.5)
x5 <- rnorm(n)
x6 <- rnorm(n)
X <- data.frame(x1, x2, x3, x4, x5, x6)

fit_pcor_sample <- pcorr(X, method = "sample")
fit_pcor_oas <- pcorr(X, method = "oas")

R_raw <- pearson_corr(X)
round(c(
  raw_x1_x3 = R_raw["x1", "x3"],
  partial_x1_x3 = fit_pcor_sample$pcor["x1", "x3"]
), 2)

print(fit_pcor_sample, digits = 2)
summary(fit_pcor_oas)
```
Here x1 and x3 are strongly associated in the raw correlation matrix
because both depend on x2. Partial correlation is useful because it can
separate that indirect association from the remaining direct structure.
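This can be cross-checked against the standard precision-matrix identity, independently of pcorr(): with Omega = Sigma^-1, the partial correlation between variables i and j is -Omega_ij / sqrt(Omega_ii * Omega_jj). A base-R sketch on data generated the same way (fresh variable names to avoid clobbering the objects above):

```r
# Partial correlations from the inverse covariance (precision) matrix:
# pcor_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj), with Omega = solve(cov(Z)).
set.seed(21)
z2 <- rnorm(300)
z1 <- 0.8 * z2 + rnorm(300, sd = 0.4)
z3 <- 0.8 * z2 + rnorm(300, sd = 0.4)
Z <- cbind(z1, z2, z3)

Omega <- solve(cov(Z))   # precision matrix
P <- -cov2cor(Omega)     # off-diagonals are the partial correlations
diag(P) <- 1

# raw z1-z3 correlation is large; the partial correlation given z2 is near 0
round(c(raw = cor(Z)["z1", "z3"], partial = P["z1", "z3"]), 2)
```

Conditioning on z2 removes the association that z1 and z3 share only through their common driver, which is exactly the behaviour illustrated with x1 and x3 above.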
The choice of method is important:
- "sample" is the ordinary full-rank route.
- "oas" and "ridge" stabilise estimation in higher-dimensional settings.
- "glasso" is appropriate when a sparse precision structure is the target.

When the classical sample estimator is appropriate, partial correlation also supports p-values and Fisher-z confidence intervals.
Unlike the other correlation wrappers, pcorr() uses
return_p_value = TRUE rather than p_value = TRUE. That is intentional:
the p-value matrix is an optional returned component of the fitted list object.
```r
fit_pcor_inf <- pcorr(
  X,
  method = "sample",
  return_p_value = TRUE,
  ci = TRUE
)
summary(fit_pcor_inf)
```
That inference layer is available only for the ordinary sample estimator in the low-dimensional setting. The regularised estimators are primarily estimation tools rather than direct inferential procedures in the current interface.
When the number of variables is large relative to the sample size, direct
sample correlation can become unstable. shrinkage_corr() provides a shrinkage
correlation route designed for exactly this setting.
```r
set.seed(22)
n <- 40
p_block <- 30

make_block <- function(f) {
  sapply(seq_len(p_block), function(j) 0.8 * f + rnorm(n, sd = 0.8))
}

f1 <- rnorm(n)
f2 <- rnorm(n)
f3 <- rnorm(n)
Xd <- cbind(make_block(f1), make_block(f2), make_block(f3))
p <- ncol(Xd)
colnames(Xd) <- paste0("G", seq_len(p))

R_raw <- stats::cor(Xd)
fit_shr <- shrinkage_corr(Xd)

block <- rep(1:3, each = p_block)
within <- outer(block, block, "==") & upper.tri(R_raw)
between <- outer(block, block, "!=") & upper.tri(R_raw)

round(c(
  raw_within = mean(abs(R_raw[within])),
  raw_between = mean(abs(R_raw[between])),
  shrink_within = mean(abs(fit_shr[within])),
  shrink_between = mean(abs(fit_shr[between]))
), 3)

print(fit_shr, digits = 2, max_rows = 6, max_vars = 6)
summary(fit_shr)
```
The shrinkage estimator is not intended to reproduce the raw sample matrix. Its purpose is to provide a better-conditioned and more stable estimate in settings where the sample matrix is noisy or singular. In this block-factor simulation, the raw sample matrix retains the main within-block pattern but also carries substantial between-block noise; shrinkage pulls that background noise down without erasing the dominant structure.
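The conditioning problem itself is easy to reproduce in base R: with p = 90 variables and n = 40 observations, the sample correlation matrix is rank-deficient, while even a simple convex shrink toward the identity restores positive definiteness. The fixed weight 0.2 below is purely illustrative; data-driven shrinkage estimators choose the intensity from the data rather than fixing it:

```r
# With p > n the sample correlation matrix is singular (rank <= n - 1).
# Shrinking toward the identity restores positive definiteness; the fixed
# lambda = 0.2 is an illustrative assumption, not a data-driven intensity.
set.seed(22)
X0 <- matrix(rnorm(40 * 90), nrow = 40, ncol = 90)
R0 <- cor(X0)
lambda <- 0.2
R0_shr <- (1 - lambda) * R0 + lambda * diag(90)

c(
  min_eig_raw    = min(eigen(R0, symmetric = TRUE, only.values = TRUE)$values),
  min_eig_shrunk = min(eigen(R0_shr, symmetric = TRUE, only.values = TRUE)$values)
)
```

The raw matrix has a smallest eigenvalue at (or numerically below) zero, so it cannot be inverted; the shrunk matrix has all eigenvalues bounded away from zero, which is what makes it usable downstream.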
The choice is usually governed by the main source of difficulty in the data.
- For outlier-prone data, start with bicor() and compare with pbcor(), wincor(), or skipped_corr() as needed.
- For conditional-association questions, use pcorr().
- For p >= n or unstable covariance estimation, use shrinkage_corr() or a regularised pcorr() method.

These are not interchangeable estimators. They solve different problems, even though they share the same matrix-oriented calling style.