skipped_corr: Pairwise skipped correlation
In matrixCorr: Collection of Correlation and Association Estimators

Description

Computes all pairwise skipped correlation coefficients for the numeric columns of a matrix or data frame using a high-performance 'C++' backend.

Skipped correlation detects bivariate outliers using a projection rule and then computes Pearson or Spearman correlation on the retained observations. It is designed for situations where marginally robust methods can still be distorted by unusual points in the joint data cloud.

Usage

skipped_corr(
  data,
  method = c("pearson", "spearman"),
  na_method = c("error", "pairwise"),
  ci = FALSE,
  p_value = FALSE,
  conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L),
  return_masks = FALSE,
  stand = TRUE,
  outlier_rule = c("idealf", "mad"),
  cutoff = sqrt(stats::qchisq(0.975, df = 2)),
  n_boot = 2000L,
  p_adjust = c("none", "hochberg", "ecp"),
  fwe_level = 0.05,
  n_mc = 1000L,
  seed = NULL,
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0,
  diag = TRUE
)

skipped_corr_masks(x, var1 = NULL, var2 = NULL)

## S3 method for class 'skipped_corr'
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  ci_digits = 4,
  show_ci = NULL,
  show_p = c("auto", "yes", "no"),
  ...
)

## S3 method for class 'skipped_corr'
plot(
  x,
  title = "Skipped correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  ci_text_size = 3,
  show_value = TRUE,
  ...
)

## S3 method for class 'skipped_corr'
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.skipped_corr'
print(
  x,
  digits = NULL,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

Arguments

`data`	A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns will be excluded.
`method`	Correlation computed after removing projected outliers. One of `"pearson"` (default) or `"spearman"`.
`na_method`	One of `"error"` (default) or `"pairwise"`. With `"error"`, the function requires all retained numeric columns to be free of missing or non-finite values and aborts otherwise. This is the recommended setting when you want a single common sample size across all pairs, reproducible skipped-row diagnostics on the same rows, or bootstrap inference via `ci = TRUE` / `p_value = TRUE`. With `"pairwise"`, each variable pair is computed on its own overlap of finite rows. This is more permissive for incomplete data, but different pairs can be based on different effective samples and different skipped-row sets, so the resulting matrix is less directly comparable across entries.
`ci`	Logical; if `TRUE`, attach percentile-bootstrap confidence intervals for each skipped correlation using the Wilcox (2015) B2 resampling scheme. Default `FALSE`.
`p_value`	Logical; if `TRUE`, attach bootstrap p-values for testing whether each skipped correlation is zero. Default `FALSE`.
`conf_level`	Confidence level used when `ci = TRUE`. Default `0.95`.
`n_threads`	Integer `>= 1`. Number of OpenMP threads. Defaults to `getOption("matrixCorr.threads", 1L)`.
`return_masks`	Logical; if `TRUE`, attach compact pairwise skipped-row indices as an attribute. Default `FALSE`.
`stand`	Logical; if `TRUE` (default), each variable in the pair is centred by its median and divided by a robust scale estimate before the projection outlier search. The scale estimate is the MAD when positive, with fallback to `IQR / 1.34898` and then the usual sample standard deviation if needed. This standardisation affects only outlier detection, not the final correlation computed on the retained observations.
`outlier_rule`	One of `"idealf"` (default) or `"mad"`. The default uses the ideal-fourths interquartile width of projected distances; `"mad"` uses the median absolute deviation of projected distances.
`cutoff`	Positive numeric constant multiplying the projected spread in the outlier rule `\mathrm{med}(d_{i\cdot}) + \mathrm{cutoff} \times s(d_{i\cdot})`. Larger values flag fewer observations as outliers; smaller values flag more. Default `sqrt(qchisq(0.975, df = 2))`, approximately 2.716.
`n_boot`	Integer `>= 2`. Number of bootstrap resamples used when `ci = TRUE` or `p_value = TRUE`. Default `2000`.
`p_adjust`	One of `"none"` (default), `"hochberg"`, or `"ecp"`. Optional familywise-error procedure applied to the matrix of bootstrap p-values. `"hochberg"` corresponds to method H in Wilcox, Rousselet, and Pernet (2018); `"ecp"` corresponds to their simulated critical-p-value method ECP.
`fwe_level`	Familywise-error level used when `p_adjust = "hochberg"` or `"ecp"`. Default `0.05`.
`n_mc`	Integer `>= 10`. Number of null Monte Carlo data sets used when `p_adjust = "ecp"` to estimate the critical p-value. Default `1000`.
`seed`	Optional positive integer used to seed the bootstrap resampling when `ci = TRUE` or `p_value = TRUE`. If `NULL`, a fresh internal seed is generated.
`output`	Output representation for the computed estimates. `"matrix"` (default): full dense matrix; best when you need matrix algebra, dense heatmaps, or full compatibility with existing code. `"sparse"`: sparse matrix from Matrix containing only retained entries; best when many values are dropped by thresholding. `"edge_list"`: long-form data frame with columns `row`, `col`, `value`; convenient for filtering, joins, and network-style workflows.
`threshold`	Non-negative absolute-value filter for non-matrix outputs: keep entries with `abs(value) >= threshold`. Use `threshold > 0` when you want only stronger associations (typically with `output = "sparse"` or `"edge_list"`). Keep `threshold = 0` to retain all values. Must be `0` when `output = "matrix"`.
`diag`	Logical; whether to include diagonal entries in `"sparse"` and `"edge_list"` outputs.
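`x`	An object of class `skipped_corr` (for `skipped_corr_masks()` and the `print()` and `plot()` methods) or of class `summary.skipped_corr` (for the summary print method).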
`var1`, `var2`	Optional column names or 1-based column indices used by `skipped_corr_masks()` to extract the skipped-row indices for one pair.
`digits`	Integer; number of digits to print.
`n`	Optional row threshold for compact preview output.
`topn`	Optional number of leading/trailing rows to show when truncated.
`max_vars`	Optional maximum number of visible columns; `NULL` derives this from console width.
`width`	Optional display width; defaults to `getOption("width")`.
`ci_digits`	Integer; digits for skipped-correlation confidence limits.
`show_ci`	One of `"yes"` or `"no"`. Defaults to `NULL` in the method signatures above.
`show_p`	One of `"auto"`, `"yes"`, `"no"`. For `print()`, `"auto"` keeps the compact matrix-only display; use `"yes"` to also print pairwise p-values.
`...`	Additional arguments passed to the underlying print or plot helper.
`title`	Character; plot title.
`low_color`, `high_color`, `mid_color`	Colors used in the heatmap.
`value_text_size`	Numeric text size for overlaid cell values.
`ci_text_size`	Text size for confidence intervals in the heatmap.
`show_value`	Logical; if `TRUE` (default), overlay numeric values on the heatmap tiles.
`object`	An object of class `skipped_corr`.

Details

Let X \in \mathbb{R}^{n \times p} be a numeric matrix with rows as observations and columns as variables. For a given pair of columns (x, y), write the observed bivariate points as u_i = (x_i, y_i)^\top, i=1,\ldots,n. If stand = TRUE, each margin is first centred by its median and divided by a robust scale estimate before outlier detection; otherwise the original pair is used. The robust scale is the MAD when positive, with fallback to \mathrm{IQR}/1.34898 and then the usual sample standard deviation if needed. Let \tilde u_i denote the resulting points and let c be the componentwise median center of the detection cloud.
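The standardisation step with its fallback chain can be sketched in base R. This is an illustrative sketch rather than the package's 'C++' code: the helper name robust_standardise() is hypothetical, and the use of R's default scaled stats::mad() (consistency constant 1.4826) is an assumption, since the text above does not say whether the MAD is rescaled.

```r
# Hypothetical helper: median-centre a vector and divide by a robust scale,
# falling back from MAD to IQR / 1.34898 and then to the sample SD.
# Assumption: stats::mad() with its default consistency constant (1.4826).
robust_standardise <- function(v) {
  s <- stats::mad(v)
  if (!is.finite(s) || s <= 0) s <- stats::IQR(v) / 1.34898
  if (!is.finite(s) || s <= 0) s <- stats::sd(v)
  (v - stats::median(v)) / s
}

z_regular <- robust_standardise(c(1, 2, 3, 4, 5))                # MAD is positive
z_ties    <- robust_standardise(c(0, 0, 0, 0, 0, 1, 2, 3, 10))   # MAD is 0, IQR is used
```

In the second call more than half of the values are tied at the median, so the MAD is zero and the scale comes from the interquartile range instead.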

For each observation i, define the direction vector b_i = \tilde u_i - c. When \|b_i\| > 0, all observations are projected onto the line through c in direction b_i. The projected distances are

d_{ij} \;=\; \frac{|(\tilde u_j - c)^\top b_i|}{\|b_i\|}, \qquad j=1,\ldots,n.

For each direction i, observation j is flagged as an outlier if

d_{ij} \;>\; \mathrm{med}(d_{i\cdot}) + g\, s(d_{i\cdot}), \qquad g = \mathtt{cutoff},

where s(\cdot) is either the ideal-fourths interquartile width (outlier_rule = "idealf") or the median absolute deviation (outlier_rule = "mad"). An observation is removed if it is flagged for at least one projection direction. The skipped correlation is then the ordinary Pearson or Spearman correlation computed from the retained observations:

r_{\mathrm{skip}}(x,y) \;=\; \mathrm{cor}\!\left(x_{\mathcal{K}}, y_{\mathcal{K}}\right),

where \mathcal{K} is the index set of observations not flagged as outliers.
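The projection rule and the final correlation on the retained set can be written out for a single pair in base R. The sketch below is for illustration only and is not the package's implementation: the names ideal_fourths() and skipped_pair() are hypothetical, the standardisation step is omitted (as with stand = FALSE), the ideal-fourths quartiles follow the interpolation convention used in Wilcox's software, and the "mad" branch assumes R's default scaled stats::mad().

```r
# Ideal-fourths estimates of the lower and upper quartiles.
# Intended for samples with at least a handful of observations.
ideal_fourths <- function(d) {
  n <- length(d)
  d <- sort(d)
  j <- floor(n / 4 + 5 / 12)
  h <- n / 4 + 5 / 12 - j
  k <- n - j + 1
  c(lower = (1 - h) * d[j] + h * d[j + 1],
    upper = (1 - h) * d[k] + h * d[k - 1])
}

# Skipped correlation for one pair (hypothetical helper, no standardisation).
skipped_pair <- function(x, y, method = c("pearson", "spearman"),
                         rule = c("idealf", "mad"),
                         cutoff = sqrt(stats::qchisq(0.975, df = 2))) {
  method <- match.arg(method)
  rule <- match.arg(rule)
  b <- cbind(x - stats::median(x), y - stats::median(y))  # rows are b_i = u_i - c
  norms <- sqrt(rowSums(b^2))
  flagged <- logical(length(x))
  for (i in which(norms > 0)) {
    d <- abs(drop(b %*% b[i, ])) / norms[i]               # d_ij for j = 1, ..., n
    spread <- if (rule == "idealf") {
      q <- ideal_fourths(d)
      q[["upper"]] - q[["lower"]]
    } else {
      stats::mad(d)
    }
    # An observation is dropped if it is flagged in at least one direction.
    flagged <- flagged | (d > stats::median(d) + cutoff * spread)
  }
  keep <- !flagged
  list(estimate = stats::cor(x[keep], y[keep], method = method),
       skipped = which(flagged))
}

set.seed(12)
x <- rnorm(160)
y <- rnorm(160)
x[1] <- 9
y[1] <- -8                      # planted bivariate outlier in row 1
res <- skipped_pair(x, y)
res$skipped                     # row 1 is among the flagged rows
```

The loop is quadratic in the number of observations for each pair, which is consistent with the remark below that the projection search dominates the cost.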

Unlike marginally robust methods such as pbcor(), wincor(), or bicor(), skipped correlation is explicitly pairwise because outlier detection depends on the joint geometry of each variable pair. As a result, the reported matrix need not be positive semidefinite, even with complete data.
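Whether a pairwise-assembled matrix is positive semidefinite can be checked from its eigenvalues. The matrix below holds illustrative values chosen to make the point; it is not output from skipped_corr().

```r
# Illustrative pairwise values only (not produced by skipped_corr()).
R_pairwise <- matrix(c( 1.0,  0.9, -0.9,
                        0.9,  1.0,  0.9,
                       -0.9,  0.9,  1.0), nrow = 3, byrow = TRUE)

# A symmetric matrix is positive semidefinite when its smallest eigenvalue
# is non-negative (up to numerical tolerance).
min_eig <- min(eigen(R_pairwise, symmetric = TRUE, only.values = TRUE)$values)
is_psd <- min_eig >= -sqrt(.Machine$double.eps)
```

Each off-diagonal entry here is a valid correlation on its own, yet the three values are jointly inconsistent, so the smallest eigenvalue is negative.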

Computational notes. In the complete-data path, each column pair requires a full bivariate projection search, so the dominant cost is higher than for marginal robust methods. The implementation evaluates pairs in 'C++'; where available, pairs are processed with 'OpenMP' parallelism. With na_method = "pairwise", each pair is recomputed on its overlap of non-missing rows.

Bootstrap inference. When ci = TRUE or p_value = TRUE, the implementation uses the percentile-bootstrap strategy studied by Wilcox (2015). Each bootstrap replicate resamples whole observation pairs with replacement, reruns the skipped-correlation outlier detection on the resampled data, and recomputes the skipped correlation on the retained observations. This corresponds to Wilcox's B2 method and avoids the statistically unsatisfactory shortcut of removing outliers only once before bootstrapping. Bootstrap inference currently requires complete data (na_method = "error").

Multiple comparisons. When p_adjust = "hochberg", the bootstrap p-values are processed with Hochberg's step-up procedure (method H in Wilcox, Rousselet, and Pernet, 2018). When p_adjust = "ecp", the package follows their ECP method: it simulates n_mc null data sets from a p-variate normal distribution with identity covariance, recomputes the pairwise bootstrap p-values for each null data set, stores the minimum p-value from each run, and estimates the fwe_level quantile of that null distribution using the Harrell-Davis estimator. Hypotheses are then rejected when their observed bootstrap p-values are less than or equal to the estimated critical p-value. The calibrated H1 procedure from Wilcox, Rousselet, and Pernet (2018) is not currently implemented.
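The resampling scheme described above, with outlier detection rerun inside every replicate, can be sketched in base R for one pair. Everything here is an assumption made for illustration: the helper names skip_cor() and boot_skip_cor() are hypothetical, the inner estimator is a compact MAD-rule version without standardisation, the interval uses stats::quantile() rather than whatever order-statistic rule the package applies, and the p-value follows a common percentile-bootstrap convention that the text above does not specify.

```r
# Compact skipped-correlation estimator: projection outlier search with the
# MAD rule, then Pearson correlation on the retained rows.
# Assumptions: no standardisation step, R's default scaled stats::mad().
skip_cor <- function(x, y, cutoff = sqrt(stats::qchisq(0.975, df = 2))) {
  b <- cbind(x - stats::median(x), y - stats::median(y))
  norms <- sqrt(rowSums(b^2))
  flagged <- logical(length(x))
  for (i in which(norms > 0)) {
    d <- abs(drop(b %*% b[i, ])) / norms[i]
    flagged <- flagged | (d > stats::median(d) + cutoff * stats::mad(d))
  }
  stats::cor(x[!flagged], y[!flagged])
}

# Percentile bootstrap in the spirit of B2: the outlier search is repeated
# on each resample instead of being done once on the original data.
boot_skip_cor <- function(x, y, n_boot = 200L, conf_level = 0.95, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- length(x)
  reps <- vapply(seq_len(n_boot), function(k) {
    idx <- sample.int(n, n, replace = TRUE)   # resample whole (x, y) pairs
    skip_cor(x[idx], y[idx])
  }, numeric(1))
  alpha <- 1 - conf_level
  ci <- stats::quantile(reps, c(alpha / 2, 1 - alpha / 2),
                        names = FALSE, na.rm = TRUE)
  p_low <- mean(reps < 0, na.rm = TRUE)       # share of replicates below zero
  list(ci = ci, p_value = 2 * min(p_low, 1 - p_low))
}

set.seed(1)
x <- rnorm(60)
y <- 0.6 * x + rnorm(60, sd = 0.8)
out <- boot_skip_cor(x, y, n_boot = 200L, seed = 42)
out$ci
```

Because the quadratic outlier search is repeated n_boot times per pair, the cost of inference grows quickly with both the sample size and the number of variables.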

Value

A symmetric correlation matrix with class skipped_corr and attributes method = "skipped_correlation", description, and package = "matrixCorr". When return_masks = TRUE, the matrix also carries a skipped_masks attribute containing compact pairwise skipped-row indices. The diagnostics attribute stores per-pair complete-case counts and skipped-row counts/proportions. When ci = TRUE or p_value = TRUE, bootstrap inference matrices are attached via attributes.

Author(s)

Thiago de Paula Oliveira

References

Wilcox, R. R. (2004). Inferences based on a skipped correlation coefficient. Journal of Applied Statistics, 31(2), 131-143. doi:10.1080/0266476032000148821

Wilcox, R. R. (2015). Inferences about the skipped correlation coefficient: Dealing with heteroscedasticity and non-normality. Journal of Modern Applied Statistical Methods, 14(1), 172-188. doi:10.22237/jmasm/1430453580

Wilcox, R. R., Rousselet, G. A., & Pernet, C. R. (2018). Improved methods for making inferences about multiple skipped correlations. Journal of Statistical Computation and Simulation, 88(16), 3116-3131. doi:10.1080/00949655.2018.1501051

Examples

set.seed(12)
X <- matrix(rnorm(160 * 4), ncol = 4)
X[1, 1] <- 9
X[1, 2] <- -8

R <- skipped_corr(X, method = "pearson")
print(R, digits = 2)
summary(R)
plot(R)

Rm <- skipped_corr(X, method = "pearson", return_masks = TRUE)
skipped_corr_masks(Rm, 1, 2)

# Example 1: Spearman-based skipped correlation on four mtcars variables
Xm <- as.matrix(datasets::mtcars[, c("mpg", "disp", "hp", "wt")])
Rm2 <- skipped_corr(Xm, method = "spearman")
print(Rm2, digits = 2)

# Example 2: percentile-bootstrap confidence intervals
# (n_boot = 40 is far below the default of 2000)
Ri <- skipped_corr(Xm, method = "pearson", ci = TRUE, n_boot = 40, seed = 1)
Ri$ci

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(R)
}

matrixCorr documentation built on April 18, 2026, 5:06 p.m.