kendall_tau: Pairwise (or Two-Vector) Kendall's Tau Rank Correlation

View source: R/kendall_corr.R

kendall_tauR Documentation

Pairwise (or Two-Vector) Kendall's Tau Rank Correlation

Description

Computes pairwise Kendall's tau correlations for numeric data using a high-performance 'C++' backend. Optional confidence intervals are available for matrix and data-frame input.

Usage

kendall_tau(
  data,
  y = NULL,
  na_method = c("error", "pairwise"),
  ci = FALSE,
  conf_level = 0.95,
  ci_method = c("fieller", "if_el", "brown_benedetti"),
  n_threads = getOption("matrixCorr.threads", 1L),
  output = c("matrix", "sparse", "edge_list"),
  threshold = 0,
  diag = TRUE,
  ...
)

## S3 method for class 'kendall_matrix'
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  ci_digits = 3,
  show_ci = NULL,
  ...
)

## S3 method for class 'kendall_matrix'
plot(
  x,
  title = "Kendall's Tau correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  ci_text_size = 3,
  show_value = TRUE,
  ...
)

## S3 method for class 'kendall_matrix'
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  ci_digits = 3,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.kendall_matrix'
print(
  x,
  digits = NULL,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

Arguments

data

For matrix/data frame mode, a numeric matrix or a data frame with at least two numeric columns. All non-numeric columns are excluded. For two-vector mode, a numeric vector x.

y

Optional numeric vector y of the same length as data when data is a vector. If supplied, the function computes the Kendall correlation between data and y using a low-overhead scalar path and returns a single number.

na_method

Character scalar controlling missing-data handling. "error" rejects missing, NaN, and infinite values. "pairwise" recomputes each correlation on its own pairwise complete-case overlap.

ci

Logical (default FALSE). If TRUE, attach pairwise confidence intervals for the off-diagonal Kendall correlations in matrix/data-frame mode.

conf_level

Confidence level used when ci = TRUE. Default is 0.95.

ci_method

Confidence-interval engine used when ci = TRUE. Supported Kendall methods are "fieller" (default), "brown_benedetti", and "if_el".

n_threads

Integer \geq 1. Number of OpenMP threads. Defaults to getOption("matrixCorr.threads", 1L).

output

Output representation for the computed estimates.

  • "matrix" (default): full dense matrix; best when you need matrix algebra, dense heatmaps, or full compatibility with existing code.

  • "sparse": sparse matrix from Matrix containing only retained entries; best when many values are dropped by thresholding.

  • "edge_list": long-form data frame with columns row, col, value; convenient for filtering, joins, and network-style workflows.

threshold

Non-negative absolute-value filter for non-matrix outputs: keep entries with abs(value) >= threshold. Use threshold > 0 when you want only stronger associations (typically with output = "sparse" or "edge_list"). Keep threshold = 0 to retain all values. Must be 0 when output = "matrix".

diag

Logical; whether to include diagonal entries in "sparse" and "edge_list" outputs.

...

Additional arguments passed to ggplot2::theme() or other ggplot2 layers.

x

An object of class summary.kendall_matrix.

digits

Integer; number of decimal places to print.

n

Optional row threshold for compact preview output.

topn

Optional number of leading/trailing rows to show when truncated.

max_vars

Optional maximum number of visible columns; NULL derives this from console width.

width

Optional display width; defaults to getOption("width").

ci_digits

Integer; digits for Kendall confidence limits in the pairwise summary.

show_ci

One of "yes" or "no".

title

Plot title. Default is "Kendall's Tau correlation heatmap".

low_color

Color for the minimum tau value. Default is "indianred1".

high_color

Color for the maximum tau value. Default is "steelblue1".

mid_color

Color for zero correlation. Default is "white".

value_text_size

Font size for displaying correlation values. Default is 4.

ci_text_size

Text size for confidence intervals in the heatmap.

show_value

Logical; if TRUE (default), overlay numeric values on the heatmap tiles.

object

An object of class kendall_matrix.

Details

Kendall's tau is a rank-based measure of association between two variables. For a dataset with n observations on variables X and Y, let n_0 = n(n - 1)/2 be the number of unordered pairs, C the number of concordant pairs, and D the number of discordant pairs. Let T_x = \sum_g t_g (t_g - 1)/2 and T_y = \sum_h u_h (u_h - 1)/2 be the numbers of tied pairs within X and within Y, respectively, where t_g and u_h are tie-group sizes in X and Y.

The tie-robust Kendall's tau-b is:

\tau_b = \frac{C - D}{\sqrt{(n_0 - T_x)\,(n_0 - T_y)}}.

When there are no ties (T_x = T_y = 0), this reduces to tau-a:

\tau_a = \frac{C - D}{n(n-1)/2}.

The function automatically handles ties. In degenerate cases where a variable is constant (n_0 = T_x or n_0 = T_y), the tau-b denominator is zero and the correlation is undefined (returned as NA off the diagonal).

When na_method = "pairwise", each (i,j) estimate is recomputed on the pairwise complete-case overlap of columns i and j. Confidence intervals use the observed pairwise-complete Kendall estimate and the same pairwise complete-case overlap.

With ci_method = "fieller", the interval is built on the Fisher-style transformed scale z = \operatorname{atanh}(\hat\tau) using Fieller's asymptotic standard error

\operatorname{SE}(z) = \sqrt{\frac{0.437}{n - 4}},

where n is the pairwise complete-case sample size. The interval is then mapped back with tanh() and clipped to [-1, 1] for numerical safety. This is the default Kendall CI and is intended to be the fast, production-oriented choice.

With ci_method = "brown_benedetti", the interval is computed on the Kendall tau scale using the Brown-Benedetti large-sample variance for Kendall's tau-b. This path is tie-aware, remains on the original Kendall scale, and is intended as a conventional asymptotic alternative when a direct tau-scale interval is preferred.

With ci_method = "if_el", the interval is computed in 'C++' using an influence-function empirical-likelihood construction built from the linearised Kendall estimating equation. The lower and upper limits are found by solving the empirical-likelihood ratio equation against the \chi^2_1-cutoff implied by conf_level. This method is slower than "fieller" and is intended for specialised inference.

Performance:

  • In the two-vector mode (y supplied), the C++ backend uses a raw-double path with minimal overhead.

  • In the matrix/data-frame mode, the no-missing estimate-only path uses the Knight (1966) O(n \log n) algorithm. Pairwise-complete inference paths recompute each pair on its complete-case overlap; the "brown_benedetti" interval adds tie-aware large-sample variance calculations and "if_el" adds extra per-pair likelihood solving.

Value

  • If y is NULL and data is a matrix/data frame: a symmetric numeric matrix where entry (i, j) is the Kendall's tau correlation between the i-th and j-th numeric columns. When ci = TRUE, the object also carries a ci attribute with elements est, lwr.ci, upr.ci, conf.level, and ci.method. Pairwise complete-case sample sizes are stored in attr(x, "diagnostics")$n_complete.

  • If y is provided (two-vector mode): a single numeric scalar, the Kendall's tau correlation between data and y.

Invisibly returns the kendall_matrix object.

A ggplot object representing the heatmap.

Note

Missing values are rejected when na_method = "error". Columns with fewer than two usable observations are excluded. Confidence intervals are not available in the two-vector interface.

Author(s)

Thiago de Paula Oliveira

References

Kendall, M. G. (1938). A New Measure of Rank Correlation. Biometrika, 30(1/2), 81-93.

Knight, W. R. (1966). A Computer Method for Calculating Kendall's Tau with Ungrouped Data. Journal of the American Statistical Association, 61(314), 436-439.

Fieller, E. C., Hartley, H. O., & Pearson, E. S. (1957). Tests for rank correlation coefficients. I. Biometrika, 44(3/4), 470-481.

Brown, M. B., & Benedetti, J. K. (1977). Sampling behavior of tests for correlation in two-way contingency tables. Journal of the American Statistical Association, 72(358), 309-315.

Huang, Z., & Qin, G. (2023). Influence function-based confidence intervals for the Kendall rank correlation coefficient. Computational Statistics, 38(2), 1041-1055.

Croux, C., & Dehon, C. (2010). Influence functions of the Spearman and Kendall correlation measures. Statistical Methods & Applications, 19, 497-515.

See Also

print.kendall_matrix, plot.kendall_matrix

Examples

# Basic usage with a matrix
mat <- cbind(a = rnorm(100), b = rnorm(100), c = rnorm(100))
kt <- kendall_tau(mat)
print(kt)
summary(kt)
plot(kt)

# Confidence intervals
kt_ci <- kendall_tau(mat[, 1:3], ci = TRUE)
print(kt_ci, show_ci = "yes")
summary(kt_ci)

# Two-vector mode (scalar path)
x <- rnorm(1000); y <- 0.5 * x + rnorm(1000)
kendall_tau(x, y)

# Including ties
tied_df <- data.frame(
  v1 = rep(1:5, each = 20),
  v2 = rep(5:1, each = 20),
  v3 = rnorm(100)
)
kt_tied <- kendall_tau(tied_df, ci = TRUE, ci_method = "fieller")
print(kt_tied, show_ci = "yes")

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(kt)
}


matrixCorr documentation built on April 18, 2026, 5:06 p.m.