pbcor: Percentage bend correlation
In matrixCorr: Collection of Correlation and Association Estimators

pbcor

R Documentation

Percentage bend correlation

Description

Computes all pairwise percentage bend correlations for the numeric columns of a matrix or data frame. Percentage bend correlation limits the influence of extreme marginal observations by bending standardised deviations into the interval [-1, 1], yielding a Pearson-like measure that is robust to outliers and heavy tails.

Usage

pbcor(
  data,
  na_method = c("error", "pairwise"),
  ci = FALSE,
  p_value = FALSE,
  conf_level = 0.95,
  n_threads = getOption("matrixCorr.threads", 1L),
  beta = 0.2,
  n_boot = 500L,
  seed = NULL,
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0,
  diag = TRUE
)

## S3 method for class 'pbcor'
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'pbcor'
plot(
  x,
  title = "Percentage bend correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  show_value = TRUE,
  ...
)

## S3 method for class 'pbcor'
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  ci_digits = 3,
  p_digits = 4,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.pbcor'
print(
  x,
  digits = NULL,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

Arguments

`data`	A numeric matrix or data frame containing numeric columns.
`na_method`	One of `"error"` (default) or `"pairwise"`. With `"pairwise"`, each correlation is computed on the overlapping complete rows for the column pair.
`ci`	Logical (default `FALSE`). If `TRUE`, attach percentile bootstrap confidence intervals for each pairwise estimate.
`p_value`	Logical (default `FALSE`). If `TRUE`, attach the method-specific large-sample test statistic and two-sided p-value for each pairwise estimate.
`conf_level`	Confidence level used when `ci = TRUE`. Default `0.95`.
`n_threads`	Integer `>= 1`. Number of OpenMP threads used for the point-estimate matrix computation. Defaults to `getOption("matrixCorr.threads", 1L)`.
`beta`	Bending constant in `[0, 0.5)` that sets the cutoff used to bend standardised deviations toward the interval `[-1, 1]`. Larger values cause more observations to be bent and increase resistance to marginal outliers. Default `0.2`. See Details.
`n_boot`	Integer `>= 1`. Number of bootstrap resamples used when `ci = TRUE`. Default `500`.
`seed`	Optional positive integer used to seed the bootstrap resampling when `ci = TRUE`. If `NULL`, the current random-number stream is used.
`output`	Output representation for the computed estimates. `"matrix"` (default): full dense matrix; best when you need matrix algebra, dense heatmaps, or full compatibility with existing code. `"sparse"`: sparse matrix (from the Matrix package) containing only the retained entries; best when thresholding drops many values. `"edge_list"`: long-form data frame with columns `row`, `col`, `value`; convenient for filtering, joins, and network-style workflows.
`threshold`	Non-negative absolute-value filter for non-matrix outputs: keep entries with `abs(value) >= threshold`. Use `threshold > 0` when you want only stronger associations (typically with `output = "sparse"` or `"edge_list"`). Keep `threshold = 0` to retain all values. Must be `0` when `output = "matrix"`.
`diag`	Logical; whether to include diagonal entries in `"sparse"` and `"edge_list"` outputs.
`x`	An object of class `pbcor` (for the `print` and `plot` methods) or `summary.pbcor` (for its `print` method).
`digits`	Integer; number of digits to print.
`n`	Optional row threshold for compact preview output.
`topn`	Optional number of leading/trailing rows to show when truncated.
`max_vars`	Optional maximum number of visible columns; `NULL` derives this from console width.
`width`	Optional display width; defaults to `getOption("width")`.
`show_ci`	One of `"yes"` or `"no"`.
`...`	Additional arguments passed to the underlying print or plot helper.
`title`	Character; plot title.
`low_color`, `high_color`, `mid_color`	Colors used in the heatmap.
`value_text_size`	Numeric text size for overlaid cell values.
`show_value`	Logical; if `TRUE` (default), overlay numeric values on the heatmap tiles.
`object`	An object of class `pbcor`.
`ci_digits`	Integer; digits used for confidence limits in pairwise summaries.
`p_digits`	Integer; digits used for p-values in pairwise summaries.
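The `output`, `threshold`, and `diag` rules above can be mimicked in base R to see what an edge list keeps. This is a minimal sketch, not the package's code: `to_edge_list()` is a hypothetical helper, and listing only the upper triangle of the symmetric matrix (plus the diagonal when `diag = TRUE`) is an assumption about how duplicate pairs are avoided.

```r
# Hypothetical helper (not part of matrixCorr): convert a symmetric correlation
# matrix into a data frame with columns row, col, value, keeping entries with
# abs(value) >= threshold. Assumption: only the upper triangle is listed,
# plus the diagonal when diag = TRUE.
to_edge_list <- function(R, threshold = 0, diag = TRUE) {
  stopifnot(is.matrix(R), is.numeric(threshold), threshold >= 0)
  keep <- upper.tri(R, diag = diag) & !is.na(R) & abs(R) >= threshold
  idx <- which(keep, arr.ind = TRUE)             # two-column (row, col) index matrix
  labs_r <- if (is.null(rownames(R))) seq_len(nrow(R)) else rownames(R)
  labs_c <- if (is.null(colnames(R))) seq_len(ncol(R)) else colnames(R)
  data.frame(row = labs_r[idx[, "row"]],
             col = labs_c[idx[, "col"]],
             value = R[idx])
}

v <- c("a", "b", "c")
R <- matrix(c(1.0,  0.8,  0.1,
              0.8,  1.0, -0.5,
              0.1, -0.5,  1.0), nrow = 3, dimnames = list(v, v))
to_edge_list(R, threshold = 0.3, diag = FALSE)   # keeps a-b (0.8) and b-c (-0.5)
```

With `threshold = 0` and `diag = TRUE` every upper-triangular entry, including the diagonal, is retained.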

Details

Let X \in \mathbb{R}^{n \times p} be a numeric matrix with rows as observations and columns as variables. For a column x = (x_i)_{i=1}^n, let m_x = \mathrm{med}(x) and define \omega_\beta(x) as the \lfloor (1-\beta)n \rfloor-th order statistic of |x_i - m_x|. Because this index decreases as \beta grows, larger values of \beta give a smaller (or equal) \omega_\beta(x), so more observations are bent to the bounds -1 and 1.
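A minimal base-R sketch of \omega_\beta(x) as defined above, useful for seeing how \beta controls the scale. `omega_beta()` is a hypothetical helper rather than an exported matrixCorr function, and the guard that keeps the index at least 1 for very small n is an addition of this sketch.

```r
# Bending scale omega_beta(x): the floor((1 - beta) * n)-th order statistic
# of |x_i - median(x)|. Hypothetical helper, not a matrixCorr export.
omega_beta <- function(x, beta = 0.2) {
  stopifnot(is.numeric(x), beta >= 0, beta < 0.5)
  n <- length(x)
  w <- sort(abs(x - median(x)))        # absolute deviations, ascending
  w[max(1L, floor((1 - beta) * n))]    # guard: index at least 1 for tiny n
}

x <- c(1:9, 100)                       # one gross outlier
omega_beta(x, beta = 0)                # 94.5: the outlier sets the scale
omega_beta(x, beta = 0.15)             # 3.5
omega_beta(x, beta = 0.35)             # 2.5: larger beta, smaller scale
```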

The one-step percentage-bend location is

\hat\theta_{pb}(x) = \frac{\sum_{i: |\psi_i| \le 1} x_i + \omega_\beta(x)(i_2 - i_1)} {n - i_1 - i_2}, \qquad \psi_i = \frac{x_i - m_x}{\omega_\beta(x)},

where i_1 = \sum_{i=1}^n \mathbf{1}(\psi_i < -1) and i_2 = \sum_{i=1}^n \mathbf{1}(\psi_i > 1). The bent scores are

a_i = \max\!\left\{-1,\; \min\!\left(1,\frac{x_i - \hat\theta_{pb}(x)} {\omega_\beta(x)}\right)\right\},

and likewise b_i for a second column y. The percentage bend correlation for the pair (x,y) is

r_{pb}(x,y) = \frac{\sum_{i=1}^n a_i b_i} {\sqrt{\sum_{i=1}^n a_i^2}\sqrt{\sum_{i=1}^n b_i^2}}.
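The formulas above translate directly into base R. This sketch handles a single complete-data pair and favours readability over speed; `pb_scores()` and `pb_cor()` are hypothetical helpers, not the package's internal code. If \omega_\beta(x) = 0, the sketch returns NA.

```r
# Bent scores a_i for one numeric vector (hypothetical helper, complete data only).
pb_scores <- function(x, beta = 0.2) {
  n <- length(x)
  m <- median(x)
  w <- sort(abs(x - m))
  omega <- w[max(1L, floor((1 - beta) * n))]   # bending scale omega_beta(x)
  psi <- (x - m) / omega
  i1 <- sum(psi < -1)                           # count below the lower bend
  i2 <- sum(psi > 1)                            # count above the upper bend
  theta <- (sum(x[abs(psi) <= 1]) + omega * (i2 - i1)) / (n - i1 - i2)
  pmax(-1, pmin(1, (x - theta) / omega))        # scores bent into [-1, 1]
}

# Percentage bend correlation r_pb(x, y) from the two bent-score vectors.
pb_cor <- function(x, y, beta = 0.2) {
  a <- pb_scores(x, beta)
  b <- pb_scores(y, beta)
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

set.seed(1)
x <- rnorm(100)
y <- x + rnorm(100)
y_out <- y
y_out[1:3] <- y_out[1:3] + 20                   # three gross outliers in y
c(pearson = cor(x, y_out), pbcor = pb_cor(x, y_out))
```

Because each bent score is capped at +/-1, the three shifted points contribute at most a bounded amount to the numerator, whereas they enter Pearson's r through their full squared deviations.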

In the complete-data path, the bent score vectors are computed once per column and collected into a matrix A = [a_{\cdot 1}, \ldots, a_{\cdot p}], after which the correlation matrix is formed from their cross-products:

R_{pb} = D_A^{-1/2} A^\top A D_A^{-1/2}, \qquad D_A = \mathrm{diag}(a_{\cdot 1}^\top a_{\cdot 1}, \ldots, a_{\cdot p}^\top a_{\cdot p}).
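The cross-product form can be checked numerically. The sketch below (base R, complete data only) builds the bent-score matrix A column by column and forms R_{pb} with a single `crossprod()`. `pb_scores()` is repeated so the block runs on its own, and `pb_matrix()` is a hypothetical illustration, not the package's OpenMP code.

```r
# Bent scores for one column (hypothetical helper, complete data only).
pb_scores <- function(x, beta = 0.2) {
  n <- length(x)
  m <- median(x)
  omega <- sort(abs(x - m))[max(1L, floor((1 - beta) * n))]
  psi <- (x - m) / omega
  i1 <- sum(psi < -1)
  i2 <- sum(psi > 1)
  theta <- (sum(x[abs(psi) <= 1]) + omega * (i2 - i1)) / (n - i1 - i2)
  pmax(-1, pmin(1, (x - theta) / omega))
}

# R_pb = D_A^{-1/2} A^T A D_A^{-1/2}, computed from the n x p bent-score matrix A.
pb_matrix <- function(X, beta = 0.2) {
  A <- apply(X, 2, pb_scores, beta = beta)   # column j holds a_{.j}
  S <- crossprod(A)                          # A^T A
  d <- 1 / sqrt(diag(S))                     # diagonal of D_A^{-1/2}
  R <- S * outer(d, d)                       # scale rows and columns
  dimnames(R) <- list(colnames(X), colnames(X))
  R
}

set.seed(2)
X <- matrix(rnorm(200 * 3), ncol = 3, dimnames = list(NULL, c("u", "v", "w")))
R <- pb_matrix(X)
# The (u, v) entry matches the pairwise formula applied to the same columns.
a <- pb_scores(X[, "u"])
b <- pb_scores(X[, "v"])
all.equal(R["u", "v"], sum(a * b) / sqrt(sum(a^2) * sum(b^2)))
```

Computing A once and taking one cross-product avoids recomputing each column's bent scores for every pair it appears in.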

If a column yields an undefined bent-score denominator (for example, \omega_\beta(x) = 0 because at least \lfloor (1-\beta)n \rfloor values equal the median), the corresponding row and column are returned as NA. With na_method = "pairwise", each pair is recomputed on its complete-case overlap. As with pairwise Pearson correlation, the resulting matrix is then not guaranteed to be positive semidefinite.
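The pairwise path can be sketched as a loop over column pairs, each using only the rows observed in both columns. The smallest eigenvalue of the result then shows whether a given pairwise matrix is still positive semidefinite. `pb_cor_pairwise()` is a hypothetical helper, and setting the diagonal to 1 is an assumption of the sketch.

```r
# Bent scores for one column (hypothetical helper, complete data only).
pb_scores <- function(x, beta = 0.2) {
  n <- length(x)
  m <- median(x)
  omega <- sort(abs(x - m))[max(1L, floor((1 - beta) * n))]
  psi <- (x - m) / omega
  i1 <- sum(psi < -1)
  i2 <- sum(psi > 1)
  theta <- (sum(x[abs(psi) <= 1]) + omega * (i2 - i1)) / (n - i1 - i2)
  pmax(-1, pmin(1, (x - theta) / omega))
}

# Each off-diagonal entry is computed on that pair's complete-case overlap.
pb_cor_pairwise <- function(X, beta = 0.2) {
  p <- ncol(X)
  R <- diag(1, p)
  dimnames(R) <- list(colnames(X), colnames(X))
  for (j in seq_len(p - 1L)) {
    for (k in (j + 1L):p) {
      ok <- complete.cases(X[, j], X[, k])       # rows observed in both columns
      a <- pb_scores(X[ok, j], beta)
      b <- pb_scores(X[ok, k], beta)
      R[j, k] <- R[k, j] <- sum(a * b) / sqrt(sum(a^2) * sum(b^2))
    }
  }
  R
}

set.seed(3)
X <- matrix(rnorm(60 * 4), ncol = 4)
X[sample(length(X), 30)] <- NA                   # scatter missing values
R <- pb_cor_pairwise(X)
min(eigen(R, symmetric = TRUE, only.values = TRUE)$values)  # < 0 means not PSD
```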

When p_value = TRUE, the method-specific test statistic for a pairwise estimate r_{pb} based on n_{ij} complete observations is

T_{ij} = r_{pb,ij}\sqrt{\frac{n_{ij} - 2}{1 - r_{pb,ij}^2}},

and the reported p-value is the two-sided Student-t tail probability with n_{ij}-2 degrees of freedom. When ci = TRUE, the interval is a percentile bootstrap interval based on n_{\mathrm{boot}} resamples drawn from the pairwise complete cases. If \tilde r_{pb,(1)} \le \cdots \le \tilde r_{pb,(B)} denotes the sorted bootstrap sample of finite estimates with B retained resamples, the reported limits are

\tilde r_{pb,(\ell)} \quad \text{and} \quad \tilde r_{pb,(u)},

where \ell = \lfloor (\alpha/2) B + 0.5 \rfloor and u = \lfloor (1-\alpha/2) B + 0.5 \rfloor for \alpha = 1 - \mathrm{conf\_level}. Resamples that yield undefined estimates are discarded before the percentile limits are formed.
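Both inferential pieces can be sketched in base R. `pb_test()` implements the Student-t statistic and two-sided p-value above. `pb_boot_ci()` resamples rows jointly from the complete cases, discards non-finite estimates, and returns the order statistics at \ell and u. Both helpers are hypothetical. Clamping \ell and u to [1, B] is an addition of the sketch, and `stats::cor` stands in for the estimator so the block runs alone; any function of two numeric vectors that returns r_{pb} could be passed instead.

```r
# Two-sided Student-t test for a pairwise estimate r based on n complete rows.
pb_test <- function(r, n) {
  stat <- r * sqrt((n - 2) / (1 - r^2))
  c(statistic = stat, df = n - 2,
    p_value = 2 * pt(abs(stat), df = n - 2, lower.tail = FALSE))
}

# Percentile bootstrap interval; stat_fun(x, y) returns the pairwise estimate.
pb_boot_ci <- function(x, y, stat_fun, n_boot = 500L, conf_level = 0.95,
                       seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  ok <- complete.cases(x, y)                   # pairwise complete cases
  x <- x[ok]
  y <- y[ok]
  n <- length(x)
  est <- vapply(seq_len(n_boot), function(b) {
    idx <- sample.int(n, n, replace = TRUE)    # resample rows jointly
    stat_fun(x[idx], y[idx])
  }, numeric(1))
  est <- sort(est[is.finite(est)])             # discard undefined resamples
  B <- length(est)
  if (B == 0L) return(c(lwr = NA_real_, upr = NA_real_))
  alpha <- 1 - conf_level
  l <- floor((alpha / 2) * B + 0.5)
  u <- floor((1 - alpha / 2) * B + 0.5)
  c(lwr = est[min(max(l, 1), B)], upr = est[min(max(u, 1), B)])
}

set.seed(4)
x <- rnorm(80)
y <- 0.5 * x + rnorm(80)
pb_test(cor(x, y), n = 80)
pb_boot_ci(x, y, stat_fun = cor, n_boot = 500L, seed = 1L)
```

The statistic has the same form as the classical t test for Pearson's r, so with `cor` plugged in, `pb_test()` reproduces the statistic and p-value of `cor.test()`.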

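Computational complexity. In the complete-data path, forming the bent scores requires sorting within each column, O(p n \log n) in total, and the cross-product step costs O(n p^2) time with O(p^2) output storage. When ci = TRUE, the bootstrap is run separately for each of the p(p-1)/2 column pairs, and each resample recomputes the bent scores of both columns, so the cost grows with the number of pairs multiplied by n_boot.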

Value

A symmetric correlation matrix with class pbcor and attributes method = "percentage_bend_correlation", description, and package = "matrixCorr". When ci = TRUE, the returned object also carries a ci attribute with elements est, lwr.ci, upr.ci, conf.level, and ci.method, plus attr(x, "conf.level"). When p_value = TRUE, it also carries an inference attribute with elements estimate, statistic, parameter, p_value, n_obs, and alternative. When either inferential option is requested, the object also carries diagnostics$n_complete.
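The attributes listed above can be inspected directly. This usage sketch assumes matrixCorr is installed (it is skipped otherwise) and relies only on the argument and attribute names documented on this page; the exact shape of each element is not assumed.

```r
# Requires matrixCorr; the block is skipped when the package is not installed.
set.seed(5)
X <- matrix(rnorm(100 * 3), ncol = 3, dimnames = list(NULL, c("a", "b", "c")))
has_pkg <- requireNamespace("matrixCorr", quietly = TRUE)
if (has_pkg) {
  R <- matrixCorr::pbcor(X, ci = TRUE, p_value = TRUE,
                         n_boot = 200L, seed = 1L)
  ci  <- attr(R, "ci")          # est, lwr.ci, upr.ci, conf.level, ci.method
  inf <- attr(R, "inference")   # estimate, statistic, parameter, p_value, n_obs, alternative
  print(ci$lwr.ci)
  print(inf$p_value)
}
```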

Author(s)

Thiago de Paula Oliveira

References

Wilcox, R. R. (1994). The percentage bend correlation coefficient. Psychometrika, 59(4), 601-616. doi:10.1007/BF02294395

Examples

set.seed(10)
X <- matrix(rnorm(150 * 4), ncol = 4)
idx <- sample(length(X), 8)  # positions of 8 injected outliers
X[idx] <- X[idx] + 10

R <- pbcor(X)
print(R, digits = 2)
summary(R)
plot(R)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(R)
}

matrixCorr documentation built on April 18, 2026, 5:06 p.m.