| pbcor | R Documentation |
Computes all pairwise percentage bend correlations for the numeric columns of
a matrix or data frame. Percentage bend correlation limits the influence of
extreme marginal observations by bending standardised deviations into the
interval [-1, 1], yielding a Pearson-like measure that is robust to
outliers and heavy tails.
pbcor(
data,
na_method = c("error", "pairwise"),
ci = FALSE,
p_value = FALSE,
conf_level = 0.95,
n_threads = getOption("matrixCorr.threads", 1L),
beta = 0.2,
n_boot = 500L,
seed = NULL,
output = c("matrix", "sparse", "edge_list"),
threshold = 0,
diag = TRUE
)
## S3 method for class 'pbcor'
print(
x,
digits = 4,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
## S3 method for class 'pbcor'
plot(
x,
title = "Percentage bend correlation heatmap",
low_color = "indianred1",
high_color = "steelblue1",
mid_color = "white",
value_text_size = 4,
show_value = TRUE,
...
)
## S3 method for class 'pbcor'
summary(
object,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
ci_digits = 3,
p_digits = 4,
show_ci = NULL,
...
)
## S3 method for class 'summary.pbcor'
print(
x,
digits = NULL,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
data |
A numeric matrix or data frame containing numeric columns. |
na_method |
One of |
ci |
Logical (default |
p_value |
Logical (default |
conf_level |
Confidence level used when |
n_threads |
Integer |
beta |
Bending constant in |
n_boot |
Integer |
seed |
Optional positive integer used to seed the bootstrap resampling
when |
output |
Output representation for the computed estimates.
|
threshold |
Non-negative absolute-value filter for non-matrix outputs:
keep entries with |
diag |
Logical; whether to include diagonal entries in
|
x |
An object of class |
digits |
Integer; number of digits to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
... |
Additional arguments passed to the underlying print or plot helper. |
title |
Character; plot title. |
low_color, high_color, mid_color |
Colors used in the heatmap. |
value_text_size |
Numeric text size for overlaid cell values. |
show_value |
Logical; if |
object |
An object of class |
ci_digits |
Integer; digits used for confidence limits in pairwise summaries. |
p_digits |
Integer; digits used for p-values in pairwise summaries. |
Let X \in \mathbb{R}^{n \times p} be a numeric matrix with rows as
observations and columns as variables. For a column
x = (x_i)_{i=1}^n, let m_x = \mathrm{med}(x) and define
\omega_\beta(x) as the \lfloor (1-\beta)n \rfloor-th order
statistic of |x_i - m_x|. Larger values of beta reduce
\omega_\beta(x), so more observations are bent to the bounds
-1 and 1.
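As a minimal sketch of the definition above, the scale \omega_\beta(x) can be computed in base R as follows. The helper name pb_omega is hypothetical and is not part of the package; the code follows the formula as written here and may differ from the compiled implementation in edge cases such as ties or very small n.

```r
# Hypothetical helper (not exported by the package): omega_beta(x) as the
# floor((1 - beta) * n)-th order statistic of |x_i - med(x)|.
pb_omega <- function(x, beta = 0.2) {
  n <- length(x)
  dev <- sort(abs(x - median(x)))   # ordered absolute deviations from the median
  dev[floor((1 - beta) * n)]
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Larger beta selects a lower order statistic, so omega does not increase.
sapply(c(0.1, 0.2, 0.4), function(b) pb_omega(x, beta = b))
```

Because the order statistic index shrinks as beta grows, a larger beta gives a smaller scale and therefore bends more observations.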
The one-step percentage-bend location is
\hat\theta_{pb}(x) =
\frac{\sum_{i: |\psi_i| \le 1} x_i + \omega_\beta(x)(i_2 - i_1)}
{n - i_1 - i_2},
\qquad
\psi_i = \frac{x_i - m_x}{\omega_\beta(x)},
where i_1 = \sum_{i=1}^n \mathbf{1}(\psi_i < -1) and
i_2 = \sum_{i=1}^n \mathbf{1}(\psi_i > 1). The bent scores are
a_i = \max\!\left\{-1,\; \min\!\left(1,\frac{x_i - \hat\theta_{pb}(x)}
{\omega_\beta(x)}\right)\right\},
and likewise b_i for a second column y. The percentage bend
correlation for the pair (x,y) is
r_{pb}(x,y) =
\frac{\sum_{i=1}^n a_i b_i}
{\sqrt{\sum_{i=1}^n a_i^2}\sqrt{\sum_{i=1}^n b_i^2}}.
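The formulas above can be sketched for a single pair in base R. The helpers pb_bent and pb_pair are hypothetical names introduced for illustration only; they transcribe the definitions given here and are not the package's internal code, which may handle ties and degenerate columns differently.

```r
# Hypothetical helper: bent scores a_i for one column.
pb_bent <- function(x, beta = 0.2) {
  n <- length(x)
  m <- median(x)
  omega <- sort(abs(x - m))[floor((1 - beta) * n)]
  psi <- (x - m) / omega
  i1 <- sum(psi < -1)                       # observations bent to -1
  i2 <- sum(psi > 1)                        # observations bent to +1
  # One-step percentage-bend location
  theta <- (sum(x[abs(psi) <= 1]) + omega * (i2 - i1)) / (n - i1 - i2)
  pmax(-1, pmin(1, (x - theta) / omega))    # clamp to [-1, 1]
}

# Hypothetical helper: percentage bend correlation for one pair.
pb_pair <- function(x, y, beta = 0.2) {
  a <- pb_bent(x, beta)
  b <- pb_bent(y, beta)
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

set.seed(1)
x <- rnorm(100)
y <- 0.6 * x + rnorm(100, sd = 0.8)
y[1:3] <- y[1:3] + 25                       # a few gross outliers in y

# Compare with the ordinary Pearson estimate on the contaminated data.
c(pearson = cor(x, y), pb = pb_pair(x, y))
```

Since every bent score lies in [-1, 1], the three shifted observations contribute no more to the sums than any other bent point, which is what limits their influence on the estimate.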
In the complete-data path, the bent score vectors are computed once per
column and collected into a matrix A = [a_{\cdot 1}, \ldots, a_{\cdot p}],
after which the correlation matrix is formed from their cross-products:
R_{pb} = D_A^{-1/2} A^\top A D_A^{-1/2},
\qquad
D_A = \mathrm{diag}(a_{\cdot 1}^\top a_{\cdot 1}, \ldots,
a_{\cdot p}^\top a_{\cdot p}).
If a column yields an undefined bent-score denominator, the corresponding row
and column are returned as NA. With na_method = "pairwise",
each pair is recomputed on its complete-case overlap. As with pairwise
Pearson correlation, this pairwise path can break positive semidefiniteness.
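The complete-data cross-product form can be illustrated with a short base R sketch. The helper pb_bent is a hypothetical transcription of the bent-score formulas above, not the package's compiled routine, and the sketch assumes a matrix with no missing values.

```r
# Hypothetical helper: bent scores for one column (see the formulas above).
pb_bent <- function(x, beta = 0.2) {
  n <- length(x)
  m <- median(x)
  omega <- sort(abs(x - m))[floor((1 - beta) * n)]
  psi <- (x - m) / omega
  i1 <- sum(psi < -1)
  i2 <- sum(psi > 1)
  theta <- (sum(x[abs(psi) <= 1]) + omega * (i2 - i1)) / (n - i1 - i2)
  pmax(-1, pmin(1, (x - theta) / omega))
}

set.seed(2)
X <- matrix(rnorm(60 * 3), ncol = 3)

A <- apply(X, 2, pb_bent)     # n x p matrix of bent scores, one pass per column
G <- crossprod(A)             # A^T A
d <- 1 / sqrt(diag(G))        # diagonal of D_A^{-1/2}
R_pb <- G * outer(d, d)       # D_A^{-1/2} A^T A D_A^{-1/2}

R_pb
# Smallest eigenvalue; non-negative up to rounding in this complete-data path.
min(eigen(R_pb, symmetric = TRUE)$values)
```

Because R_pb is a rescaled Gram matrix of a single score matrix A, it is positive semidefinite by construction. That guarantee is lost when each pair uses its own complete-case subset, since the scores then differ from pair to pair.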
When p_value = TRUE, the method-specific test statistic for a pairwise
estimate r_{pb,ij} based on n_{ij} complete observations is
T_{ij} = r_{pb,ij}\sqrt{\frac{n_{ij} - 2}{1 - r_{pb,ij}^2}},
and the reported p-value is the two-sided Student-t tail probability
with n_{ij}-2 degrees of freedom. When ci = TRUE, the interval
is a percentile bootstrap interval based on n_{\mathrm{boot}}
resamples drawn from the pairwise complete cases. If
\tilde r_{pb,(1)} \le \cdots \le \tilde r_{pb,(B)} denotes the sorted
bootstrap sample of finite estimates with B retained resamples, the
reported limits are
\tilde r_{pb,(\ell)} \quad \text{and} \quad \tilde r_{pb,(u)},
where \ell = \lfloor (\alpha/2) B + 0.5 \rfloor and
u = \lfloor (1-\alpha/2) B + 0.5 \rfloor for
\alpha = 1 - \mathrm{conf\_level}. Resamples that yield undefined
estimates are discarded before the percentile limits are formed.
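The test statistic and the percentile limits described above can be sketched in base R for one pair. All helper names (pb_bent, pb_pair, pb_pvalue, pb_boot_ci) are hypothetical, and clamping the indices \ell and u to the range 1..B is an assumption of this sketch rather than documented package behaviour. Results will not match the package's seeded output, since the resampling scheme and random number stream may differ.

```r
# Hypothetical helpers transcribing the bent-score and r_pb formulas.
pb_bent <- function(x, beta = 0.2) {
  n <- length(x)
  m <- median(x)
  omega <- sort(abs(x - m))[floor((1 - beta) * n)]
  psi <- (x - m) / omega
  i1 <- sum(psi < -1)
  i2 <- sum(psi > 1)
  theta <- (sum(x[abs(psi) <= 1]) + omega * (i2 - i1)) / (n - i1 - i2)
  pmax(-1, pmin(1, (x - theta) / omega))
}
pb_pair <- function(x, y, beta = 0.2) {
  a <- pb_bent(x, beta)
  b <- pb_bent(y, beta)
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Two-sided Student-t p-value with n - 2 degrees of freedom.
pb_pvalue <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t_stat), df = n - 2)
}

# Percentile bootstrap interval over complete cases of one pair.
pb_boot_ci <- function(x, y, beta = 0.2, n_boot = 500L, conf_level = 0.95) {
  n <- length(x)
  est <- replicate(n_boot, {
    idx <- sample.int(n, n, replace = TRUE)   # resample rows jointly
    pb_pair(x[idx], y[idx], beta)
  })
  est <- sort(est[is.finite(est)])            # discard undefined resamples
  B <- length(est)
  alpha <- 1 - conf_level
  lo <- max(1L, floor(alpha / 2 * B + 0.5))        # clamp: assumption of this sketch
  hi <- min(B, floor((1 - alpha / 2) * B + 0.5))   # clamp: assumption of this sketch
  c(lwr = est[lo], upr = est[hi])
}

set.seed(3)
x <- rnorm(80)
y <- 0.5 * x + rnorm(80)
r <- pb_pair(x, y)
p <- pb_pvalue(r, length(x))
ci <- pb_boot_ci(x, y, n_boot = 300L)
list(estimate = r, p_value = p, ci = ci)
```

Note that the p-value comes from the t approximation while the interval comes from resampling, so the two need not agree exactly near the significance boundary.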
Computational complexity. In the complete-data path, forming the
bent scores requires sorting within each column, which is
O(n \log n) per column with a comparison sort, and the cross-product step
costs O(n p^2) with O(p^2) output storage. When
ci = TRUE, the bootstrap cost is incurred separately for each of the
p(p-1)/2 column pairs, with n_{\mathrm{boot}} re-estimations per pair.
A symmetric correlation matrix with class pbcor and
attributes method = "percentage_bend_correlation",
description, and package = "matrixCorr". When
ci = TRUE, the returned object also carries a ci attribute
with elements est, lwr.ci, upr.ci,
conf.level, and ci.method, plus
attr(x, "conf.level"). When p_value = TRUE, it also carries
an inference attribute with elements estimate,
statistic, parameter, p_value, n_obs, and
alternative. When either inferential option is requested, the
object also carries diagnostics$n_complete.
Thiago de Paula Oliveira
Wilcox, R. R. (1994). The percentage bend correlation coefficient. Psychometrika, 59(4), 601-616. doi:10.1007/BF02294395
wincor(), skipped_corr(), bicor()
set.seed(10)
X <- matrix(rnorm(150 * 4), ncol = 4)
# Shift 8 randomly chosen cells by +10 to create outliers
idx <- sample(length(X), 8)
X[idx] <- X[idx] + 10
R <- pbcor(X)
print(R, digits = 2)
summary(R)
plot(R)
# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
view_corr_shiny(R)
}