wincor: Pairwise Winsorized correlation
In matrixCorr: Collection of Correlation and Association Estimators

Description

Computes all pairwise Winsorized correlation coefficients for the numeric columns of a matrix or data frame using a high-performance 'C++' backend.

This function Winsorizes each margin at proportion tr and then computes ordinary Pearson correlation on the Winsorized values. It is a simple robust alternative to Pearson correlation when the main concern is unusually large or small observations in the marginal distributions.

Usage

wincor(
  data,
  na_method = c("error", "pairwise"),
  ci = FALSE,
  p_value = FALSE,
  conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L),
  tr = 0.2,
  n_boot = 500L,
  seed = NULL,
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0,
  diag = TRUE
)

## S3 method for class 'wincor'
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'wincor'
plot(
  x,
  title = "Winsorized correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  show_value = TRUE,
  ...
)

## S3 method for class 'wincor'
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  ci_digits = 3,
  p_digits = 4,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.wincor'
print(
  x,
  digits = NULL,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

Arguments

`data`	A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns will be excluded.
`na_method`	One of `"error"` (default) or `"pairwise"`.
`ci`	Logical (default `FALSE`). If `TRUE`, attach percentile bootstrap confidence intervals for each pairwise estimate.
`p_value`	Logical (default `FALSE`). If `TRUE`, attach the method-specific large-sample test statistic and two-sided p-value for each pairwise estimate.
`conf_level`	Confidence level used when `ci = TRUE`. Default `0.95`.
`n_threads`	Integer `\geq 1`. Number of OpenMP threads. Defaults to `getOption("matrixCorr.threads", 1L)`.
`tr`	Winsorization proportion in `[0, 0.5)`. For a sample of size `n`, let `g = \lfloor tr \cdot n \rfloor`; the `g` smallest observations are set to the `(g+1)`-st order statistic and the `g` largest observations are set to the `(n-g)`-th order statistic. Default `0.2`.
`n_boot`	Integer `\geq 1`. Number of bootstrap resamples used when `ci = TRUE`. Default `500`.
`seed`	Optional positive integer used to seed the bootstrap resampling when `ci = TRUE`. If `NULL`, the current random-number stream is used.
`output`	Output representation for the computed estimates. `"matrix"` (default): full dense matrix; best when you need matrix algebra, dense heatmaps, or full compatibility with existing code. `"sparse"`: sparse matrix from Matrix containing only retained entries; best when many values are dropped by thresholding. `"edge_list"`: long-form data frame with columns `row`, `col`, `value`; convenient for filtering, joins, and network-style workflows.
`threshold`	Non-negative absolute-value filter for non-matrix outputs: keep entries with `abs(value) >= threshold`. Use `threshold > 0` when you want only stronger associations (typically with `output = "sparse"` or `"edge_list"`). Keep `threshold = 0` to retain all values. Must be `0` when `output = "matrix"`.
`diag`	Logical; whether to include diagonal entries in `"sparse"` and `"edge_list"` outputs.
`x`	An object of class `wincor` (for the `print` and `plot` methods) or of class `summary.wincor` (for the `print` method of a summary).
`digits`	Integer; number of digits to print.
`n`	Optional row threshold for compact preview output.
`topn`	Optional number of leading/trailing rows to show when truncated.
`max_vars`	Optional maximum number of visible columns; `NULL` derives this from console width.
`width`	Optional display width; defaults to `getOption("width")`.
`show_ci`	One of `"yes"` or `"no"`. Defaults to `NULL`, as shown in the usage above.
`...`	Additional arguments passed to the underlying print or plot helper.
`title`	Character; plot title.
`low_color`, `high_color`, `mid_color`	Colors used in the heatmap.
`value_text_size`	Numeric text size for overlaid cell values.
`show_value`	Logical; if `TRUE` (default), overlay numeric values on the heatmap tiles.
`object`	An object of class `wincor`.
`ci_digits`	Integer; digits used for confidence limits in pairwise summaries.
`p_digits`	Integer; digits used for p-values in pairwise summaries.
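
As a rough illustration of how `threshold` and `diag` shape a long-form result, the following base R sketch filters a small dense matrix into `row`, `col`, `value` records. The helper `to_edge_list` is hypothetical and not part of matrixCorr. It assumes each unordered pair is listed once (upper triangle); the argument descriptions above do not say how the package orders or deduplicates entries.

```r
# Hypothetical helper (not part of matrixCorr): turn a dense correlation
# matrix into a long-form data frame, keeping abs(value) >= threshold.
to_edge_list <- function(R, threshold = 0, diag = TRUE) {
  # Assumption: list each unordered pair once by using the upper triangle.
  keep <- upper.tri(R, diag = diag) & !is.na(R) & abs(R) >= threshold
  idx <- which(keep, arr.ind = TRUE)
  labs <- if (is.null(colnames(R))) as.character(seq_len(ncol(R))) else colnames(R)
  data.frame(
    row = labs[idx[, "row"]],
    col = labs[idx[, "col"]],
    value = R[keep],
    stringsAsFactors = FALSE
  )
}

R_demo <- matrix(
  c(1.0,  0.8,  0.1,
    0.8,  1.0, -0.5,
    0.1, -0.5,  1.0),
  nrow = 3,
  dimnames = list(c("a", "b", "c"), c("a", "b", "c"))
)

# Keeps the pairs (a, b) and (b, c); the weak pair (a, c) is dropped.
edges <- to_edge_list(R_demo, threshold = 0.3, diag = FALSE)
print(edges)
```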

Details

Let X \in \mathbb{R}^{n \times p} be a numeric matrix with rows as observations and columns as variables. For a column x = (x_i)_{i=1}^n, write the order statistics as x_{(1)} \le \cdots \le x_{(n)} and let g = \lfloor tr \cdot n \rfloor. The Winsorized values can be written as

x_i^{(w)} \;=\; \max\!\bigl\{x_{(g+1)},\, \min(x_i, x_{(n-g)})\bigr\}.
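
The clamping rule above can be sketched in a few lines of base R. The function name `winsorize` is an assumption made for illustration; it is not an exported matrixCorr function, and the package performs this step in its 'C++' backend.

```r
# Hypothetical helper (not part of matrixCorr): Winsorize one numeric vector.
winsorize <- function(x, tr = 0.2) {
  stopifnot(is.numeric(x), !anyNA(x), tr >= 0, tr < 0.5)
  n <- length(x)
  g <- floor(tr * n)       # number of observations pulled in on each side
  xs <- sort(x)
  lo <- xs[g + 1]          # (g+1)-st order statistic
  hi <- xs[n - g]          # (n-g)-th order statistic
  pmax(lo, pmin(x, hi))    # clamp every value into [lo, hi]
}

x_demo <- c(-50, 1, 2, 3, 4, 5, 6, 7, 8, 100)
w_demo <- winsorize(x_demo, tr = 0.2)   # n = 10, so g = 2
print(w_demo)
# 2 2 2 3 4 5 6 7 7 7
```

With `tr = 0` we get `g = 0`, so the vector is returned unchanged and the Winsorized correlation reduces to ordinary Pearson correlation.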

For two columns x and y, the Winsorized correlation is the ordinary Pearson correlation computed from x^{(w)} and y^{(w)}:

r_w(x,y) \;=\; \frac{\sum_{i=1}^n (x_i^{(w)}-\bar x^{(w)})(y_i^{(w)}-\bar y^{(w)})} {\sqrt{\sum_{i=1}^n (x_i^{(w)}-\bar x^{(w)})^2}\; \sqrt{\sum_{i=1}^n (y_i^{(w)}-\bar y^{(w)})^2}}.
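
A minimal base R sketch of this definition for a single pair follows. Both `winsorize` and `wincor_pair` are hypothetical names introduced here for illustration; the sketch assumes complete data and evaluates the formula term by term so that it can be compared with `stats::cor()` applied to the Winsorized vectors.

```r
# Hypothetical helper (not part of matrixCorr): Winsorize one numeric vector.
winsorize <- function(x, tr = 0.2) {
  n <- length(x)
  g <- floor(tr * n)
  xs <- sort(x)
  pmax(xs[g + 1], pmin(x, xs[n - g]))
}

# Hypothetical helper: Winsorized correlation of two complete vectors,
# written out as in the displayed formula.
wincor_pair <- function(x, y, tr = 0.2) {
  xw <- winsorize(x, tr)
  yw <- winsorize(y, tr)
  dx <- xw - mean(xw)
  dy <- yw - mean(yw)
  sum(dx * dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))
}

set.seed(1)
x_demo <- rnorm(50)
y_demo <- x_demo + rnorm(50, sd = 0.5)
y_demo[1] <- 40   # one gross outlier in the margin of y

r_pearson <- cor(x_demo, y_demo)
r_wins <- wincor_pair(x_demo, y_demo, tr = 0.2)
print(c(pearson = r_pearson, winsorized = r_wins))
```

The single contaminated value inflates the variance of `y_demo` and can distort the Pearson estimate, while the Winsorized estimate only sees that value after it has been pulled in to the upper clamp.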

In matrix form, let X^{(w)} contain the Winsorized columns and define the centred, unit-norm columns

z_{\cdot j} = \frac{x_{\cdot j}^{(w)} - \bar x_j^{(w)} \mathbf{1}} {\sqrt{\sum_{i=1}^n (x_{ij}^{(w)}-\bar x_j^{(w)})^2}}, \qquad j=1,\ldots,p.

If Z = [z_{\cdot 1}, \ldots, z_{\cdot p}], then the Winsorized correlation matrix is

R_w \;=\; Z^\top Z.
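
The matrix form can be checked numerically with a short base R sketch. The names `winsorize` and `wincor_matrix` are assumptions for illustration only; the package itself carries out these steps in 'C++', and this sketch covers only the complete-data case.

```r
# Hypothetical helper (not part of matrixCorr): Winsorize one numeric vector.
winsorize <- function(x, tr = 0.2) {
  n <- length(x)
  g <- floor(tr * n)
  xs <- sort(x)
  pmax(xs[g + 1], pmin(x, xs[n - g]))
}

# Hypothetical helper: complete-data Winsorized correlation matrix via Z'Z.
wincor_matrix <- function(X, tr = 0.2) {
  Xw <- apply(X, 2, winsorize, tr = tr)            # Winsorize each column
  Xc <- sweep(Xw, 2, colMeans(Xw))                 # centre each column
  Z <- sweep(Xc, 2, sqrt(colSums(Xc^2)), "/")      # scale to unit norm
  crossprod(Z)                                     # Z^T Z
}

set.seed(2)
X_demo <- matrix(rnorm(100 * 3), ncol = 3)
R_demo <- wincor_matrix(X_demo, tr = 0.2)

# Should agree with Pearson correlation of the Winsorized columns.
R_ref <- cor(apply(X_demo, 2, winsorize, tr = 0.2))
print(all.equal(R_demo, R_ref, check.attributes = FALSE))
```

Because the result is a Gram matrix of unit-norm columns, its diagonal is 1 and it is positive semidefinite up to floating-point error.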

Winsorization acts on each margin separately, so it guards against marginal outliers and heavy tails but does not target unusual points in the joint cloud. This implementation Winsorizes each column in 'C++', centres and normalises it, and forms the complete-data matrix from cross-products. With na_method = "pairwise", each pair is recomputed on its overlap of non-missing rows. As with Pearson correlation, the complete-data path yields a symmetric positive semidefinite matrix, whereas pairwise deletion can break positive semidefiniteness. If the Winsorized variance of a column is zero, correlations involving that column are returned as NA.
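
The pairwise path and the zero-variance rule described above might look roughly like the following base R sketch. `winsorize` and `wincor_pairwise` are hypothetical names, and leaving the diagonal at 1 is an assumption of this sketch, not documented package behaviour.

```r
# Hypothetical helper (not part of matrixCorr): Winsorize one numeric vector.
winsorize <- function(x, tr = 0.2) {
  n <- length(x)
  g <- floor(tr * n)
  xs <- sort(x)
  pmax(xs[g + 1], pmin(x, xs[n - g]))
}

# Hypothetical helper: recompute every pair on its own non-missing overlap.
wincor_pairwise <- function(X, tr = 0.2) {
  p <- ncol(X)
  R <- diag(1, p)   # assumption: diagonal left at 1
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      ok <- !is.na(X[, i]) & !is.na(X[, j])   # rows observed in both columns
      r <- NA_real_
      if (sum(ok) >= 2) {
        xw <- winsorize(X[ok, i], tr)
        yw <- winsorize(X[ok, j], tr)
        # Zero Winsorized variance leaves the correlation undefined (NA).
        if (var(xw) > 0 && var(yw) > 0) r <- cor(xw, yw)
      }
      R[i, j] <- R[j, i] <- r
    }
  }
  R
}

set.seed(3)
X_demo <- cbind(rnorm(30), rnorm(30), c(rep(5, 28), 0, 100))
X_demo[1:3, 1] <- NA
X_demo[4:5, 2] <- NA

R_pw <- wincor_pairwise(X_demo, tr = 0.2)
print(round(R_pw, 3))
```

The third column is constant once its two extreme values are clamped, so every correlation involving it comes back as `NA`, while the first two columns are compared on rows 6 to 30 only.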

When p_value = TRUE, inference follows the method-specific test based on

T_{ij} = r_{w,ij}\sqrt{\frac{n_{ij} - 2}{1 - r_{w,ij}^2}},

evaluated against a t-distribution with n_{ij} - 2g_{ij} - 2 degrees of freedom, where g_{ij} = \lfloor tr \cdot n_{ij} \rfloor and n_{ij} is the pairwise complete-case sample size for the corresponding column pair. The p-value is reported only when the pair is not identical and the resulting degrees of freedom are positive.

When ci = TRUE, the interval is a percentile bootstrap interval based on n_{\mathrm{boot}} resamples drawn from the pairwise complete cases. If \tilde r_{w,(1)} \le \cdots \le \tilde r_{w,(B)} denotes the sorted bootstrap sample of finite estimates with B retained resamples, the reported limits are

\tilde r_{w,(\ell)} \quad \text{and} \quad \tilde r_{w,(u)},

where \ell = \lfloor (\alpha/2) B + 0.5 \rfloor and u = \lfloor (1-\alpha/2) B + 0.5 \rfloor for \alpha = 1 - \mathrm{conf\_level}. Resamples that yield undefined estimates are discarded before the percentile limits are formed.
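
Both inferential quantities can be sketched for a single complete pair in base R. The function names below are hypothetical, and clamping the indices \ell and u into 1..B is an assumption added so the sketch stays well defined for very small B. The sketch does not reproduce the package's handling of identical pairs, and it relies on the caller to set the random seed.

```r
# Hypothetical helper (not part of matrixCorr): Winsorize one numeric vector.
winsorize <- function(x, tr = 0.2) {
  n <- length(x)
  g <- floor(tr * n)
  xs <- sort(x)
  pmax(xs[g + 1], pmin(x, xs[n - g]))
}

# Hypothetical helper: test statistic T and two-sided p-value for one pair.
wincor_test <- function(x, y, tr = 0.2) {
  n <- length(x)
  g <- floor(tr * n)
  r <- cor(winsorize(x, tr), winsorize(y, tr))
  df <- n - 2 * g - 2
  stat <- r * sqrt((n - 2) / (1 - r^2))
  p <- if (is.finite(stat) && df > 0) 2 * pt(-abs(stat), df) else NA_real_
  list(estimate = r, statistic = stat, parameter = df, p_value = p)
}

# Hypothetical helper: percentile bootstrap interval for one pair.
wincor_boot_ci <- function(x, y, tr = 0.2, n_boot = 500L, conf_level = 0.95) {
  n <- length(x)
  boot <- replicate(n_boot, {
    idx <- sample.int(n, n, replace = TRUE)   # resample rows jointly
    suppressWarnings(cor(winsorize(x[idx], tr), winsorize(y[idx], tr)))
  })
  boot <- sort(boot[is.finite(boot)])         # discard undefined estimates
  B <- length(boot)
  if (B == 0L) return(c(lwr = NA_real_, upr = NA_real_))
  alpha <- 1 - conf_level
  l <- floor((alpha / 2) * B + 0.5)
  u <- floor((1 - alpha / 2) * B + 0.5)
  # Assumption: keep both indices inside 1..B.
  c(lwr = boot[max(l, 1L)], upr = boot[min(u, B)])
}

set.seed(4)
n_demo <- 40
x_demo <- rnorm(n_demo)
y_demo <- 0.6 * x_demo + rnorm(n_demo)

res <- wincor_test(x_demo, y_demo, tr = 0.2)
ci <- wincor_boot_ci(x_demo, y_demo, tr = 0.2, n_boot = 200L)
print(res)
print(ci)
```

With n = 40 and tr = 0.2 we have g = 8, so the reference distribution has 40 - 16 - 2 = 22 degrees of freedom.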

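Computational complexity. In the complete-data path, Winsorizing the columns requires sorting within each column (O(n \log n) per column with a comparison sort, hence O(p n \log n) overall), and forming the cross-product matrix costs O(n p^2) with O(p^2) output storage. When ci = TRUE, the bootstrap cost is incurred separately for each column pair, so it scales with both n_{\mathrm{boot}} and the number of pairs, p(p-1)/2.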

Value

A symmetric correlation matrix with class wincor and attributes method = "winsorized_correlation", description, and package = "matrixCorr". When ci = TRUE, the returned object also carries a ci attribute with elements est, lwr.ci, upr.ci, conf.level, and ci.method, plus attr(x, "conf.level"). When p_value = TRUE, it also carries an inference attribute with elements estimate, statistic, parameter, p_value, n_obs, and alternative. When either inferential option is requested, the object also carries diagnostics$n_complete.

Author(s)

Thiago de Paula Oliveira

References

Wilcox, R. R. (1993). Some results on a Winsorized correlation coefficient. British Journal of Mathematical and Statistical Psychology, 46(2), 339-349. doi:10.1111/j.2044-8317.1993.tb01020.x

Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing (3rd ed.). Academic Press.

Examples

set.seed(11)
X <- matrix(rnorm(180 * 4), ncol = 4)
# Contaminate six randomly chosen cells by shifting them down by 12
idx <- sample(length(X), 6)
X[idx] <- X[idx] - 12

R <- wincor(X, tr = 0.2)
print(R, digits = 2)
summary(R)
plot(R)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(R)
}

matrixCorr documentation built on April 18, 2026, 5:06 p.m.