| wincor | R Documentation |
Computes all pairwise Winsorized correlation coefficients for the numeric columns of a matrix or data frame using a high-performance 'C++' backend.
This function Winsorizes each column (margin) at proportion tr and then
computes the ordinary Pearson correlation on the Winsorized values. It is a
simple robust alternative to Pearson correlation when the main concern is
unusually large or small observations in the marginal distributions.
wincor(
data,
na_method = c("error", "pairwise"),
ci = FALSE,
p_value = FALSE,
conf_level = 0.95,
n_threads = getOption("matrixCorr.threads", 1L),
tr = 0.2,
n_boot = 500L,
seed = NULL,
output = c("matrix", "sparse", "edge_list"),
threshold = 0,
diag = TRUE
)
## S3 method for class 'wincor'
print(
x,
digits = 4,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
## S3 method for class 'wincor'
plot(
x,
title = "Winsorized correlation heatmap",
low_color = "indianred1",
high_color = "steelblue1",
mid_color = "white",
value_text_size = 4,
show_value = TRUE,
...
)
## S3 method for class 'wincor'
summary(
object,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
ci_digits = 3,
p_digits = 4,
show_ci = NULL,
...
)
## S3 method for class 'summary.wincor'
print(
x,
digits = NULL,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
data |
A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns will be excluded. |
na_method |
One of |
ci |
Logical (default |
p_value |
Logical (default |
conf_level |
Confidence level used when |
n_threads |
Integer |
tr |
Winsorization proportion in |
n_boot |
Integer |
seed |
Optional positive integer used to seed the bootstrap resampling
when |
output |
Output representation for the computed estimates.
|
threshold |
Non-negative absolute-value filter for non-matrix outputs:
keep entries with |
diag |
Logical; whether to include diagonal entries in
|
x |
An object of class |
digits |
Integer; number of digits to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
... |
Additional arguments passed to the underlying print or plot helper. |
title |
Character; plot title. |
low_color, high_color, mid_color |
Colors used in the heatmap. |
value_text_size |
Numeric text size for overlaid cell values. |
show_value |
Logical; if |
object |
An object of class |
ci_digits |
Integer; digits used for confidence limits in pairwise summaries. |
p_digits |
Integer; digits used for p-values in pairwise summaries. |
Let X \in \mathbb{R}^{n \times p} be a numeric matrix with rows as
observations and columns as variables. For a column
x = (x_i)_{i=1}^n, write the order statistics as
x_{(1)} \le \cdots \le x_{(n)} and let
g = \lfloor tr \cdot n \rfloor. The Winsorized values can be written as
x_i^{(w)} \;=\; \max\!\bigl\{x_{(g+1)},\, \min(x_i, x_{(n-g)})\bigr\}.
For two columns x and y, the Winsorized correlation is the
ordinary Pearson correlation computed from x^{(w)} and y^{(w)}:
r_w(x,y) \;=\;
\frac{\sum_{i=1}^n (x_i^{(w)}-\bar x^{(w)})(y_i^{(w)}-\bar y^{(w)})}
{\sqrt{\sum_{i=1}^n (x_i^{(w)}-\bar x^{(w)})^2}\;
\sqrt{\sum_{i=1}^n (y_i^{(w)}-\bar y^{(w)})^2}}.
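The two definitions above can be sketched in plain R. This is a minimal illustration, not the package's 'C++' code; the helper names winsorize() and wincor_pair() are hypothetical and are not exported by matrixCorr.

```r
# Hypothetical helper: Winsorize a numeric vector at proportion tr.
# Values below x_(g+1) are raised to it; values above x_(n-g) are lowered to it.
winsorize <- function(x, tr = 0.2) {
  n  <- length(x)
  g  <- floor(tr * n)
  xs <- sort(x)
  pmax(xs[g + 1], pmin(x, xs[n - g]))
}

# Hypothetical helper: Pearson correlation of the Winsorized margins.
wincor_pair <- function(x, y, tr = 0.2) {
  cor(winsorize(x, tr), winsorize(y, tr))
}

set.seed(1)
x <- rnorm(50)
y <- 0.6 * x + rnorm(50, sd = 0.8)
y[1] <- 25  # one gross marginal outlier

# Compare the ordinary and the Winsorized estimate on the contaminated data.
c(pearson = cor(x, y), winsorized = wincor_pair(x, y))
```

With n = 5 and tr = 0.2, g = 1, so winsorize(c(1, 2, 3, 4, 100)) pulls the two extreme values in to their nearest neighbours and returns 2 2 3 4 4.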
In matrix form, let X^{(w)} contain the Winsorized columns and define
the centred, unit-norm columns
z_{\cdot j} =
\frac{x_{\cdot j}^{(w)} - \bar x_j^{(w)} \mathbf{1}}
{\sqrt{\sum_{i=1}^n (x_{ij}^{(w)}-\bar x_j^{(w)})^2}},
\qquad j=1,\ldots,p.
If Z = [z_{\cdot 1}, \ldots, z_{\cdot p}], then the Winsorized
correlation matrix is
R_w \;=\; Z^\top Z.
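The matrix form can be checked numerically with base R. The sketch below assumes complete data and uses hypothetical helpers (winsorize(), wincor_matrix()); it mirrors the formulas above rather than the package's internal implementation.

```r
# Hypothetical helper: Winsorize a numeric vector at proportion tr.
winsorize <- function(x, tr = 0.2) {
  n  <- length(x)
  g  <- floor(tr * n)
  xs <- sort(x)
  pmax(xs[g + 1], pmin(x, xs[n - g]))
}

# Hypothetical helper: R_w = Z'Z from centred, unit-norm Winsorized columns.
wincor_matrix <- function(X, tr = 0.2) {
  Xw <- apply(X, 2, winsorize, tr = tr)          # Winsorize each column
  Xc <- sweep(Xw, 2, colMeans(Xw))               # centre each column
  Z  <- sweep(Xc, 2, sqrt(colSums(Xc^2)), "/")   # scale each column to unit norm
  crossprod(Z)                                   # Z'Z
}

set.seed(2)
X <- matrix(rnorm(100 * 3), ncol = 3)
R_sketch <- wincor_matrix(X)

# The cross-product form should agree with cor() applied to the Winsorized columns.
all.equal(R_sketch, cor(apply(X, 2, winsorize)), check.attributes = FALSE)
```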
Winsorization acts on each margin separately, so it guards against marginal
outliers and heavy tails but does not target unusual points in the joint
cloud. This implementation Winsorizes each column in 'C++', centres and
normalises it, and forms the complete-data matrix from cross-products. With
na_method = "pairwise", each pair is recomputed on its overlap of
non-missing rows. As with Pearson correlation, the complete-data path yields
a symmetric positive semidefinite matrix, whereas pairwise deletion can
break positive semidefiniteness. If the Winsorized variance of a column is
zero, correlations involving that column are returned as NA.
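Two of the edge cases above can be illustrated with a small base R sketch. It assumes that, under pairwise deletion, Winsorization is applied after restricting each pair to its overlap; the helpers winsorize() and wincor_pairwise() are hypothetical and only approximate the documented behaviour.

```r
# Hypothetical helper: Winsorize a numeric vector at proportion tr.
winsorize <- function(x, tr = 0.2) {
  n  <- length(x)
  g  <- floor(tr * n)
  xs <- sort(x)
  pmax(xs[g + 1], pmin(x, xs[n - g]))
}

# Hypothetical helper: one pair, computed on the rows where both values are present.
wincor_pairwise <- function(x, y, tr = 0.2) {
  ok <- !(is.na(x) | is.na(y))
  if (sum(ok) < 2L) return(NA_real_)             # too few rows to correlate
  xw <- winsorize(x[ok], tr)
  yw <- winsorize(y[ok], tr)
  if (var(xw) == 0 || var(yw) == 0) return(NA_real_)  # zero Winsorized variance
  cor(xw, yw)
}

# Case 1: the raw column varies, but with n = 10 and tr = 0.2 (g = 2)
# both Winsorizing bounds equal 0, so the Winsorized column is constant.
x <- c(rep(0, 8), 1, 2)
y <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)
var(x) > 0
var(winsorize(x))
r_const <- wincor_pairwise(x, y)   # NA

# Case 2: missing values; only the 8 jointly observed rows are used.
a <- c(1, 2, 3, 4, 5, NA, 7, 8, 9, 10)
b <- c(2, 1, 4, 3, 6, 5, NA, 7, 10, 9)
r_overlap <- wincor_pairwise(a, b)
```

Because each pair may use a different subset of rows, a matrix assembled from such pairwise estimates is not guaranteed to be positive semidefinite.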
When p_value = TRUE, each pair of columns is tested with the statistic
T_{ij} = r_{w,ij}\sqrt{\frac{n_{ij} - 2}{1 - r_{w,ij}^2}},
evaluated against a t-distribution with
n_{ij} - 2g_{ij} - 2 degrees of freedom, where
g_{ij} = \lfloor tr \cdot n_{ij} \rfloor and n_{ij} is the
pairwise complete-case sample size for the corresponding column pair. The
p-value is reported only when the pair is not identical and the resulting
degrees of freedom are positive. When ci = TRUE, the interval is a
percentile bootstrap interval based on n_{\mathrm{boot}} resamples
drawn from the pairwise complete cases. If
\tilde r_{w,(1)} \le \cdots \le \tilde r_{w,(B)} denotes the sorted
bootstrap sample of finite estimates with B retained resamples, the
reported limits are
\tilde r_{w,(\ell)} \quad \text{and} \quad \tilde r_{w,(u)},
where \ell = \lfloor (\alpha/2) B + 0.5 \rfloor and
u = \lfloor (1-\alpha/2) B + 0.5 \rfloor for
\alpha = 1 - \mathrm{conf\_level}. Resamples that yield undefined
estimates are discarded before the percentile limits are formed.
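The test and the interval for a single complete pair can be sketched as follows. Several details are assumptions made for illustration only: the p-value is taken to be two-sided, the "not identical" condition is approximated by |r| < 1, the percentile indices are clamped to the range 1..B, and the helper names (winsorize(), wincor_test(), wincor_boot_ci()) are hypothetical.

```r
# Hypothetical helper: Winsorize a numeric vector at proportion tr.
winsorize <- function(x, tr = 0.2) {
  n  <- length(x)
  g  <- floor(tr * n)
  xs <- sort(x)
  pmax(xs[g + 1], pmin(x, xs[n - g]))
}

# Hypothetical helper: t-type test with n - 2g - 2 degrees of freedom.
wincor_test <- function(x, y, tr = 0.2) {
  n    <- length(x)
  g    <- floor(tr * n)
  r    <- cor(winsorize(x, tr), winsorize(y, tr))
  df   <- n - 2 * g - 2
  stat <- r * sqrt((n - 2) / (1 - r^2))
  # Assumption: two-sided p-value, reported only for df > 0 and |r| < 1.
  p <- if (df > 0 && abs(r) < 1) 2 * pt(-abs(stat), df) else NA_real_
  list(estimate = r, statistic = stat, parameter = df, p_value = p)
}

# Hypothetical helper: percentile bootstrap interval over resampled rows.
wincor_boot_ci <- function(x, y, tr = 0.2, n_boot = 500L, conf_level = 0.95) {
  n <- length(x)
  est <- replicate(n_boot, {
    idx <- sample.int(n, n, replace = TRUE)      # resample rows jointly
    suppressWarnings(cor(winsorize(x[idx], tr), winsorize(y[idx], tr)))
  })
  est   <- sort(est[is.finite(est)])             # drop undefined estimates
  B     <- length(est)
  alpha <- 1 - conf_level
  lo <- max(1L, floor(alpha / 2 * B + 0.5))      # clamping is an assumption
  hi <- min(B, floor((1 - alpha / 2) * B + 0.5))
  c(lwr = est[lo], upr = est[hi])
}

set.seed(3)
x <- rnorm(60)
y <- 0.5 * x + rnorm(60)
res <- wincor_test(x, y)   # n = 60, g = 12, so df = 60 - 24 - 2 = 34
ci  <- wincor_boot_ci(x, y)
```

With B = 500 retained resamples and conf_level = 0.95, the index formulas give the 13th and 488th sorted estimates as the limits.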
Computational complexity. In the complete-data path, Winsorizing the
columns requires sorting within each column, and forming the cross-product
matrix costs O(n p^2) with O(p^2) output storage. When
ci = TRUE, the bootstrap cost is incurred separately for each column
pair.
A symmetric correlation matrix with class wincor and
attributes method = "winsorized_correlation", description,
and package = "matrixCorr". When ci = TRUE, the returned
object also carries a ci attribute with elements est,
lwr.ci, upr.ci, conf.level, and ci.method,
plus attr(x, "conf.level"). When p_value = TRUE, it also
carries an inference attribute with elements estimate,
statistic, parameter, p_value, n_obs, and
alternative. When either inferential option is requested, the
object also carries diagnostics$n_complete.
Thiago de Paula Oliveira
Wilcox, R. R. (1993). Some results on a Winsorized correlation coefficient. British Journal of Mathematical and Statistical Psychology, 46(2), 339-349. doi:10.1111/j.2044-8317.1993.tb01020.x
Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing (3rd ed.). Academic Press.
pbcor(), skipped_corr(), bicor()
set.seed(11)
X <- matrix(rnorm(180 * 4), ncol = 4)
idx <- sample(length(X), 6)  # six randomly chosen cells
X[idx] <- X[idx] - 12        # shift those same cells to create marginal outliers
R <- wincor(X, tr = 0.2)
print(R, digits = 2)
summary(R)
plot(R)
# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
view_corr_shiny(R)
}