kendall_tau: Pairwise Kendall's Tau Rank Correlation
In matrixCorr: Collection of Correlation and Association Estimators

View source: R/kendall_corr.R

kendall_tau

R Documentation

Description

Computes all pairwise Kendall's tau rank correlation coefficients for the numeric columns of a matrix or data frame using a high-performance 'C++' implementation.

This function uses a fast and scalable algorithm implemented in 'C++' to compute either Kendall's tau-a (when no ties are present) or tau-b (when ties are detected), making it suitable for large datasets. Internally, it calls an optimized routine that combines merge-sort-based inversion counting with a Fenwick tree (binary indexed tree) for efficient tie handling.
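
As a rough illustration of the inversion-counting idea, the base R sketch below counts discordant pairs in O(n log n) with a Fenwick tree and turns the count into tau-a. It is not the package's 'C++' code: the helper names `count_discordant` and `tau_a_fenwick` are hypothetical, the Fenwick tree is used here for the inversion count itself (a simplification of the combination described above), and tie-free input is assumed.

```r
# Hypothetical helper: count discordant pairs with a Fenwick tree.
# Assumes x and y contain no ties (the tau-a case).
count_discordant <- function(x, y) {
  stopifnot(length(x) == length(y),
            anyDuplicated(x) == 0, anyDuplicated(y) == 0)
  n <- length(x)
  # Ranks of y after sorting the observations by x (values 1..n, no ties)
  r <- as.integer(rank(y[order(x)]))
  tree <- numeric(n)   # Fenwick tree holding counts of ranks seen so far
  discordant <- 0
  for (k in seq_len(n)) {
    # Prefix query: number of earlier points with rank <= r[k]
    i <- r[k]
    seen_le <- 0
    while (i > 0) {
      seen_le <- seen_le + tree[i]
      i <- i - bitwAnd(i, -i)
    }
    # Earlier points with a larger y-rank are inversions (discordant pairs)
    discordant <- discordant + (k - 1 - seen_le)
    # Point update: record rank r[k]
    i <- r[k]
    while (i <= n) {
      tree[i] <- tree[i] + 1
      i <- i + bitwAnd(i, -i)
    }
  }
  discordant
}

# Hypothetical helper: tau-a from the discordant count.
tau_a_fenwick <- function(x, y) {
  n_pairs <- length(x) * (length(x) - 1) / 2
  d <- count_discordant(x, y)
  # Without ties C + D = n(n-1)/2, so C - D = n_pairs - 2 * D
  (n_pairs - 2 * d) / n_pairs
}

set.seed(1)
x <- rnorm(200)
y <- rnorm(200)
# Compare against base R's Kendall correlation on tie-free data
all.equal(tau_a_fenwick(x, y), cor(x, y, method = "kendall"))
```

Sorting by one variable and counting inversions in the other is what reduces the work from the O(n^2) pairwise comparison to O(n log n).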

The `print` method prints a summary of the Kendall's tau correlation matrix, including a description and method metadata.

The `plot` method generates a ggplot2-based heatmap of the Kendall's tau correlation matrix.

Usage

kendall_tau(data)

## S3 method for class 'kendall_matrix'
print(x, digits = 4, max_rows = NULL, max_cols = NULL, ...)

## S3 method for class 'kendall_matrix'
plot(
  x,
  title = "Kendall's Tau correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  ...
)

Arguments

`data`	A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns will be excluded. Each column must have at least two non-missing values and contain no NAs.
`x`	An object of class `kendall_matrix`.
`digits`	Integer; number of decimal places to print.
`max_rows`	Optional integer; maximum number of rows to display. If `NULL`, all rows are shown.
`max_cols`	Optional integer; maximum number of columns to display. If `NULL`, all columns are shown.
`...`	Additional arguments passed to `ggplot2::theme()` or other `ggplot2` layers.
`title`	Plot title. Default is `"Kendall's Tau correlation heatmap"`.
`low_color`	Color for the minimum tau value. Default is `"indianred1"`.
`high_color`	Color for the maximum tau value. Default is `"steelblue1"`.
`mid_color`	Color for zero correlation. Default is `"white"`.
`value_text_size`	Font size for displaying correlation values. Default is `4`.

Details

Kendall's tau is a rank-based measure of association between two variables. For n paired observations of two variables X and Y, the tie-corrected coefficient (tau-b) is defined as:

\tau = \frac{C - D}{\sqrt{(C + D + T_x)(C + D + T_y)}}

where:

C is the number of concordant pairs defined by (x_i - x_j)(y_i - y_j) > 0
D is the number of discordant pairs defined by (x_i - x_j)(y_i - y_j) < 0
T_x and T_y are the numbers of pairs tied only in X and only in Y, respectively
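
The definitions above can be checked with a brute-force O(n^2) sketch in base R. The helper name `tau_b_naive` and the sample vectors are hypothetical, and the code assumes the usual tau-b convention that T_x and T_y count pairs tied in only one of the two variables. Base R's `cor(..., method = "kendall")` is used purely as a reference value.

```r
# Hypothetical reference implementation of the formula above (O(n^2)).
tau_b_naive <- function(x, y) {
  stopifnot(length(x) == length(y))
  idx <- utils::combn(length(x), 2)          # all index pairs i < j
  dx <- sign(x[idx[1, ]] - x[idx[2, ]])
  dy <- sign(y[idx[1, ]] - y[idx[2, ]])
  conc <- sum(dx * dy > 0)                   # C: concordant pairs
  disc <- sum(dx * dy < 0)                   # D: discordant pairs
  tx <- sum(dx == 0 & dy != 0)               # T_x: tied in X only
  ty <- sum(dx != 0 & dy == 0)               # T_y: tied in Y only
  (conc - disc) / sqrt((conc + disc + tx) * (conc + disc + ty))
}

# Small illustrative vectors containing ties in both variables
x <- c(1, 2, 2, 3, 4, 4, 5)
y <- c(2, 1, 3, 3, 5, 4, 4)
tau_b_naive(x, y)
cor(x, y, method = "kendall")   # base R reference for comparison
```

This version is only practical for small n; it is meant as a readable cross-check, not as a substitute for the optimized routine.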

When there are no ties, the function computes the faster tau-a version:

\tau_a = \frac{C - D}{n(n-1)/2}

The function automatically selects tau-a or tau-b depending on the presence of ties. Performance is maximized by computing correlations in 'C++' directly from the matrix columns.
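
To see why the two formulas agree on tie-free data, note that with T_x = T_y = 0 every pair is either concordant or discordant, so the tau-b denominator collapses to the total number of pairs. The worked case uses hypothetical vectors x = (1, 2, 3, 4) and y = (1, 3, 2, 4), chosen only for illustration.

```latex
T_x = T_y = 0 \;\Rightarrow\; C + D = \frac{n(n-1)}{2}, \qquad
\tau = \frac{C - D}{\sqrt{(C + D)(C + D)}} = \frac{C - D}{n(n-1)/2} = \tau_a

% Worked case: x = (1, 2, 3, 4), y = (1, 3, 2, 4), n = 4
% Only the pair (2, 3) is discordant, the other five pairs are concordant.
C = 5, \quad D = 1, \quad
\tau_a = \frac{5 - 1}{4 \cdot 3 / 2} = \frac{4}{6} \approx 0.667
```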

Value

`kendall_tau()`: a symmetric numeric matrix where the (i, j)-th element is the Kendall's tau correlation between the i-th and j-th numeric columns of the input.

`print()`: invisibly returns the `kendall_matrix` object.

`plot()`: a ggplot object representing the heatmap.

Note

Missing values are not allowed. Columns with fewer than two observations are excluded.
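
Because missing values are rejected, input often needs cleaning first. The following is a minimal sketch of one possible pre-processing step in base R; `prepare_for_kendall` and the `raw` data frame are hypothetical and not part of matrixCorr, and listwise deletion is only one of several ways to handle missing data.

```r
# Hypothetical helper: keep numeric columns and drop rows containing NA.
prepare_for_kendall <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]            # numeric columns only
  num <- num[stats::complete.cases(num), , drop = FALSE]   # listwise deletion
  stopifnot(ncol(num) >= 2, nrow(num) >= 2)
  num
}

raw <- data.frame(
  a = c(1.2, NA, 3.1, 4.8, 2.2),
  b = c(5, 4, 3, 2, 1),
  g = c("u", "v", "u", "v", "u")
)
clean <- prepare_for_kendall(raw)
# `clean` now holds columns a and b without missing values and could be
# passed on, e.g. kendall_tau(clean)
```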

Author(s)

Thiago de Paula Oliveira toliveira@abacusbio.com


References

Kendall, M. G. (1938). A New Measure of Rank Correlation. Biometrika, 30(1/2), 81–93.

Examples

library(matrixCorr)
set.seed(123)  # make the simulated data reproducible

# Basic usage with a matrix
mat <- cbind(a = rnorm(100), b = rnorm(100), c = rnorm(100))
kt <- kendall_tau(mat)
print(kt)
plot(kt)

# With a large data frame
df <- data.frame(x = rnorm(1e4), y = rnorm(1e4), z = rnorm(1e4))
kendall_tau(df)

# Including ties
tied_df <- data.frame(
  v1 = rep(1:5, each = 20),
  v2 = rep(5:1, each = 20),
  v3 = rnorm(100)
)
kt <- kendall_tau(tied_df)
print(kt)
plot(kt)

matrixCorr documentation built on Aug. 26, 2025, 5:07 p.m.