pbcor: Percentage bend correlation

pbcor    R Documentation

Percentage bend correlation

Description

Computes all pairwise percentage bend correlations for the numeric columns of a matrix or data frame. Percentage bend correlation limits the influence of extreme marginal observations by bending standardised deviations into the interval [-1, 1], yielding a Pearson-like measure that is robust to outliers and heavy tails.

Usage

pbcor(
  data,
  na_method = c("error", "pairwise"),
  ci = FALSE,
  p_value = FALSE,
  conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L),
  beta = 0.2,
  n_boot = 500L,
  seed = NULL,
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0,
  diag = TRUE
)

## S3 method for class 'pbcor'
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'pbcor'
plot(
  x,
  title = "Percentage bend correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  show_value = TRUE,
  ...
)

## S3 method for class 'pbcor'
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  ci_digits = 3,
  p_digits = 4,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.pbcor'
print(
  x,
  digits = NULL,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

Arguments

data

A numeric matrix or data frame containing numeric columns.

na_method

One of "error" (default) or "pairwise". With "error", missing values in data cause an error. With "pairwise", each correlation is computed on the overlapping complete rows for the column pair.

ci

Logical (default FALSE). If TRUE, attach percentile bootstrap confidence intervals for each pairwise estimate.

p_value

Logical (default FALSE). If TRUE, attach the method-specific large-sample test statistic and two-sided p-value for each pairwise estimate.

conf_level

Confidence level used when ci = TRUE. Default 0.95.

n_threads

Integer >= 1. Number of OpenMP threads used for the point-estimate matrix computation. Defaults to getOption("matrixCorr.threads", 1L).

beta

Bending constant in [0, 0.5) that sets the cutoff used to bend standardised deviations toward the interval [-1, 1]. Larger values cause more observations to be bent and increase resistance to marginal outliers. Default 0.2. See Details.

n_boot

Integer >= 1. Number of bootstrap resamples used when ci = TRUE. Default 500.

seed

Optional positive integer used to seed the bootstrap resampling when ci = TRUE. If NULL, the current random-number stream is used.

output

Output representation for the computed estimates.

  • "matrix" (default): full dense matrix; best when you need matrix algebra, dense heatmaps, or full compatibility with existing code.

  • "sparse": sparse matrix from Matrix containing only retained entries; best when many values are dropped by thresholding.

  • "edge_list": long-form data frame with columns row, col, value; convenient for filtering, joins, and network-style workflows.

threshold

Non-negative absolute-value filter for non-matrix outputs: keep entries with abs(value) >= threshold. Use threshold > 0 when you want only stronger associations (typically with output = "sparse" or "edge_list"). Keep threshold = 0 to retain all values. Must be 0 when output = "matrix".

diag

Logical; whether to include diagonal entries in "sparse" and "edge_list" outputs.

x

An object of class pbcor (for the print and plot methods) or summary.pbcor (for the print method of summaries).

digits

Integer; number of digits to print.

n

Optional row threshold for compact preview output.

topn

Optional number of leading/trailing rows to show when truncated.

max_vars

Optional maximum number of visible columns; NULL derives this from console width.

width

Optional display width; defaults to getOption("width").

show_ci

One of "yes" or "no"; whether confidence intervals are shown in the printed output when they are available.

...

Additional arguments passed to the underlying print or plot helper.

title

Character; plot title.

low_color, high_color, mid_color

Colors used in the heatmap.

value_text_size

Numeric text size for overlaid cell values.

show_value

Logical; if TRUE (default), overlay numeric values on the heatmap tiles.

object

An object of class pbcor.

ci_digits

Integer; digits used for confidence limits in pairwise summaries.

p_digits

Integer; digits used for p-values in pairwise summaries.

Details

Let X \in \mathbb{R}^{n \times p} be a numeric matrix with rows as observations and columns as variables. For a column x = (x_i)_{i=1}^n, let m_x = \mathrm{med}(x) and define \omega_\beta(x) as the \lfloor (1-\beta)n \rfloor-th order statistic of |x_i - m_x|. Larger values of beta reduce \omega_\beta(x), so more observations are bent to the bounds -1 and 1.

The one-step percentage-bend location is

\hat\theta_{pb}(x) = \frac{\sum_{i: |\psi_i| \le 1} x_i + \omega_\beta(x)(i_2 - i_1)} {n - i_1 - i_2}, \qquad \psi_i = \frac{x_i - m_x}{\omega_\beta(x)},

where i_1 = \sum_{i=1}^n \mathbf{1}(\psi_i < -1) and i_2 = \sum_{i=1}^n \mathbf{1}(\psi_i > 1). The bent scores are

a_i = \max\!\left\{-1,\; \min\!\left(1,\frac{x_i - \hat\theta_{pb}(x)} {\omega_\beta(x)}\right)\right\},

and likewise b_i for a second column y. The percentage bend correlation for the pair (x,y) is

r_{pb}(x,y) = \frac{\sum_{i=1}^n a_i b_i} {\sqrt{\sum_{i=1}^n a_i^2}\sqrt{\sum_{i=1}^n b_i^2}}.
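
The formulas above can be followed step by step in base R. The sketch below is only an illustration of the definitions, not the package's internal (compiled) implementation; the helper names pb_bent_scores() and pb_cor_pair() are hypothetical and are not exported by matrixCorr. It assumes complete data and a non-zero \omega_\beta(x).

```r
# Hypothetical helper: bent scores a_i for one numeric vector.
pb_bent_scores <- function(x, beta = 0.2) {
  n <- length(x)
  m <- median(x)
  # omega: the floor((1 - beta) * n)-th order statistic of |x_i - m_x|
  omega <- sort(abs(x - m))[floor((1 - beta) * n)]
  psi <- (x - m) / omega
  i1 <- sum(psi < -1)  # observations bent to the lower bound
  i2 <- sum(psi > 1)   # observations bent to the upper bound
  # One-step percentage bend location
  theta <- (sum(x[abs(psi) <= 1]) + omega * (i2 - i1)) / (n - i1 - i2)
  # Clamp the rescaled deviations into [-1, 1]
  pmax(-1, pmin(1, (x - theta) / omega))
}

# Hypothetical helper: percentage bend correlation for one pair.
pb_cor_pair <- function(x, y, beta = 0.2) {
  a <- pb_bent_scores(x, beta)
  b <- pb_bent_scores(y, beta)
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

set.seed(1)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)
y[1] <- 100  # a single gross outlier in y

r_pb <- pb_cor_pair(x, y)  # bent scores cap the outlier's contribution
r_pearson <- cor(x, y)     # Pearson uses the raw value of y[1]
```

Comparing r_pb with r_pearson on contaminated data of this kind gives a rough feel for how much a single extreme value can move each estimate.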

In the complete-data path, the bent score vectors are computed once per column and collected into a matrix A = [a_{\cdot 1}, \ldots, a_{\cdot p}], after which the correlation matrix is formed from their cross-products:

R_{pb} = D_A^{-1/2} A^\top A D_A^{-1/2}, \qquad D_A = \mathrm{diag}(a_{\cdot 1}^\top a_{\cdot 1}, \ldots, a_{\cdot p}^\top a_{\cdot p}).
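
As a rough illustration of the matrix form, the sketch below builds A column by column and rescales the cross-product. The helper pb_scores() is hypothetical and simply restates the bent-score definition; the package's own complete-data path may be organised differently.

```r
# Hypothetical helper: bent scores for one column (complete data assumed).
pb_scores <- function(x, beta = 0.2) {
  m <- median(x)
  omega <- sort(abs(x - m))[floor((1 - beta) * length(x))]
  psi <- (x - m) / omega
  i1 <- sum(psi < -1)
  i2 <- sum(psi > 1)
  theta <- (sum(x[abs(psi) <= 1]) + omega * (i2 - i1)) /
    (length(x) - i1 - i2)
  pmax(-1, pmin(1, (x - theta) / omega))
}

set.seed(2)
X <- matrix(rnorm(100 * 3), ncol = 3)
X[, 2] <- X[, 1] + rnorm(100)

A <- apply(X, 2, pb_scores)             # n x p matrix of bent scores
d <- colSums(A^2)                       # diagonal of D_A
R <- crossprod(A) / sqrt(outer(d, d))   # D_A^{-1/2} A'A D_A^{-1/2}
```

Because R is a rescaled Gram matrix of A, it is symmetric with unit diagonal and positive semidefinite up to rounding, which is the property the pairwise missing-data path can lose.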

If a column yields an undefined bent-score denominator, the corresponding row and column are returned as NA. With na_method = "pairwise", each pair is recomputed on its complete-case overlap. As with pairwise Pearson correlation, this pairwise path can break positive semidefiniteness.

When p_value = TRUE, the method-specific test statistic for a pairwise estimate r_{pb} based on n_{ij} complete observations is

T_{ij} = r_{pb,ij}\sqrt{\frac{n_{ij} - 2}{1 - r_{pb,ij}^2}},

and the reported p-value is the two-sided Student-t tail probability with n_{ij}-2 degrees of freedom. When ci = TRUE, the interval is a percentile bootstrap interval based on n_{\mathrm{boot}} resamples drawn from the pairwise complete cases. If \tilde r_{pb,(1)} \le \cdots \le \tilde r_{pb,(B)} denotes the sorted bootstrap sample of finite estimates with B retained resamples, the reported limits are

\tilde r_{pb,(\ell)} \quad \text{and} \quad \tilde r_{pb,(u)},

where \ell = \lfloor (\alpha/2) B + 0.5 \rfloor and u = \lfloor (1-\alpha/2) B + 0.5 \rfloor for \alpha = 1 - \mathrm{conf\_level}. Resamples that yield undefined estimates are discarded before the percentile limits are formed.
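
The two inferential quantities above can be sketched in base R as follows. Both helper names, pb_t_test() and percentile_limits(), are hypothetical and only mirror the formulas in this section. The clamping of the indices to the range 1..B is an added guard for very small B and is an assumption, not documented package behaviour.

```r
# Hypothetical helper: t statistic and two-sided p-value for an
# estimate r based on n complete observations.
pb_t_test <- function(r, n) {
  df <- n - 2
  t_stat <- r * sqrt(df / (1 - r^2))
  list(statistic = t_stat,
       parameter = df,
       p_value = 2 * pt(-abs(t_stat), df))
}

# Hypothetical helper: percentile limits from a vector of bootstrap
# estimates, discarding non-finite (undefined) resamples first.
percentile_limits <- function(boot_est, conf_level = 0.95) {
  est <- sort(boot_est[is.finite(boot_est)])
  B <- length(est)
  alpha <- 1 - conf_level
  lo <- floor((alpha / 2) * B + 0.5)
  hi <- floor((1 - alpha / 2) * B + 0.5)
  # Assumed guard: keep the indices inside 1..B
  c(lwr = est[max(lo, 1)], upr = est[min(hi, B)])
}

res <- pb_t_test(r = 0.5, n = 27)

# Stand-in for bootstrap estimates: 500 finite values plus two
# undefined resamples that are dropped before ranking.
boot_est <- c((1:500) / 1000, NA, NaN)
lims <- percentile_limits(boot_est, conf_level = 0.95)
```

The statistic has the same functional form as the usual Pearson t test, applied to r_{pb} in place of the Pearson coefficient.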

Computational complexity. In the complete-data path, forming the bent scores requires sorting within each column, which costs O(n \log n) per column and O(p n \log n) overall; the cross-product step costs O(n p^2) with O(p^2) output storage. When ci = TRUE, the bootstrap cost is incurred separately for each column pair, so it grows with both n_boot and the number of pairs, p(p-1)/2.

Value

A symmetric correlation matrix with class pbcor and attributes method = "percentage_bend_correlation", description, and package = "matrixCorr". When ci = TRUE, the returned object also carries a ci attribute with elements est, lwr.ci, upr.ci, conf.level, and ci.method, plus attr(x, "conf.level"). When p_value = TRUE, it also carries an inference attribute with elements estimate, statistic, parameter, p_value, n_obs, and alternative. When either inferential option is requested, the object also carries diagnostics$n_complete.

Author(s)

Thiago de Paula Oliveira

References

Wilcox, R. R. (1994). The percentage bend correlation coefficient. Psychometrika, 59(4), 601-616. doi:10.1007/BF02294395

See Also

wincor(), skipped_corr(), bicor()

Examples

set.seed(10)
X <- matrix(rnorm(150 * 4), ncol = 4)
# Contaminate the same 8 randomly chosen cells with gross outliers
idx <- sample(length(X), 8)
X[idx] <- X[idx] + 10

R <- pbcor(X)
print(R, digits = 2)
summary(R)
plot(R)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(R)
}


matrixCorr documentation built on April 18, 2026, 5:06 p.m.