kendall_tau: Pairwise Kendall's Tau Rank Correlation

View source: R/kendall_corr.R

kendall_tau    R Documentation

Pairwise Kendall's Tau Rank Correlation

Description

Computes all pairwise Kendall's tau rank correlation coefficients for the numeric columns of a matrix or data frame using a high-performance 'C++' implementation.

This function uses a fast and scalable algorithm implemented in 'C++' to compute Kendall's tau-a (when no ties are present) or tau-b (when ties are detected), making it suitable for large datasets. Internally, it calls an optimized routine that combines merge-sort-based inversion counting with a Fenwick tree (binary indexed tree) for efficient tie handling.

The print method prints a summary of the Kendall's tau correlation matrix, including description and method metadata.

The plot method generates a ggplot2-based heatmap of the Kendall's tau correlation matrix.

Usage

kendall_tau(data)

## S3 method for class 'kendall_matrix'
print(x, digits = 4, max_rows = NULL, max_cols = NULL, ...)

## S3 method for class 'kendall_matrix'
plot(
  x,
  title = "Kendall's Tau correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  ...
)

Arguments

data

A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns will be excluded. Each column must have at least two non-missing values and contain no NAs.

x

An object of class kendall_matrix.

digits

Integer; number of decimal places to print.

max_rows

Optional integer; maximum number of rows to display. If NULL, all rows are shown.

max_cols

Optional integer; maximum number of columns to display. If NULL, all columns are shown.

...

Additional arguments passed to ggplot2::theme() or other ggplot2 layers.

title

Plot title. Default is "Kendall's Tau correlation heatmap".

low_color

Color for the minimum tau value. Default is "indianred1".

high_color

Color for the maximum tau value. Default is "steelblue1".

mid_color

Color for zero correlation. Default is "white".

value_text_size

Font size for displaying correlation values. Default is 4.

Details

Kendall's tau is a rank-based measure of association between two variables. For a dataset with n observations of two variables X and Y, the tie-corrected coefficient (tau-b) is defined as:

\tau = \frac{C - D}{\sqrt{(C + D + T_x)(C + D + T_y)}}

where:

  • C is the number of concordant pairs defined by (x_i - x_j)(y_i - y_j) > 0

  • D is the number of discordant pairs defined by (x_i - x_j)(y_i - y_j) < 0

  • T_x and T_y are the numbers of pairs tied only in X and only in Y, respectively
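The definitions above can be checked by brute force on a small example. The following is a minimal base R sketch, not the package's 'C++' code: tau_b_naive is a hypothetical helper name, and it assumes the usual tau-b convention that T_x counts pairs tied in X only and T_y counts pairs tied in Y only (pairs tied in both variables are left out). It enumerates all pairs, so it is O(n^2) and only meant for illustration.

```r
# Brute-force illustration of the tau-b formula (hypothetical helper, O(n^2)).
tau_b_naive <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  idx <- utils::combn(length(x), 2)   # all index pairs i < j, one per column
  dx <- x[idx[1, ]] - x[idx[2, ]]
  dy <- y[idx[1, ]] - y[idx[2, ]]
  s  <- sign(dx) * sign(dy)
  C  <- sum(s > 0)                    # concordant pairs
  D  <- sum(s < 0)                    # discordant pairs
  Tx <- sum(dx == 0 & dy != 0)        # assumption: tied in x only
  Ty <- sum(dy == 0 & dx != 0)        # assumption: tied in y only
  (C - D) / sqrt((C + D + Tx) * (C + D + Ty))
}

x <- c(1, 2, 2, 3, 4, 4, 5)
y <- c(2, 1, 3, 3, 5, 4, 4)
tau_naive <- tau_b_naive(x, y)
tau_base  <- cor(x, y, method = "kendall")  # base R reference (tie-corrected)
c(naive = tau_naive, base = tau_base)
```

The two values should agree, which gives a quick sanity check of the formula before relying on a faster implementation.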

When there are no ties, the function computes the faster tau-a version:

\tau_a = \frac{C - D}{n(n-1)/2}

The function automatically selects tau-a or tau-b depending on the presence of ties. Performance is maximized by computing correlations in 'C++' directly from the matrix columns.
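To illustrate why inversion counting gives an O(n log n) method, here is a minimal base R sketch for the tie-free (tau-a) case. It is an assumption-laden illustration, not the package's 'C++' routine: count_inversions and tau_a_fast are hypothetical names, and the Fenwick-tree tie handling is not reproduced. The idea is that, after ordering the observations by x, every inversion in the reordered y values corresponds to exactly one discordant pair.

```r
# Merge sort that also counts inversions (pairs i < j with v[i] > v[j]).
count_inversions <- function(v) {
  n <- length(v)
  if (n < 2) return(list(sorted = v, inv = 0))
  mid <- n %/% 2
  left  <- count_inversions(v[1:mid])
  right <- count_inversions(v[(mid + 1):n])
  a <- left$sorted
  b <- right$sorted
  merged <- numeric(n)
  i <- 1; j <- 1; k <- 1
  inv <- left$inv + right$inv
  while (i <= length(a) && j <= length(b)) {
    if (a[i] <= b[j]) {
      merged[k] <- a[i]
      i <- i + 1
    } else {
      # b[j] is smaller than every remaining element of a: count them all
      inv <- inv + (length(a) - i + 1)
      merged[k] <- b[j]
      j <- j + 1
    }
    k <- k + 1
  }
  if (i <= length(a)) merged[k:n] <- a[i:length(a)]
  if (j <= length(b)) merged[k:n] <- b[j:length(b)]
  list(sorted = merged, inv = inv)
}

# tau-a for tie-free data: D = inversions of y after ordering by x.
tau_a_fast <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2,
            !anyDuplicated(x), !anyDuplicated(y))
  n_pairs <- length(x) * (length(x) - 1) / 2
  D <- count_inversions(y[order(x)])$inv   # discordant pairs
  C <- n_pairs - D                         # concordant pairs (no ties)
  (C - D) / n_pairs
}

set.seed(1)
x <- rnorm(50)
y <- rnorm(50)
tau_fast <- tau_a_fast(x, y)
tau_base <- cor(x, y, method = "kendall")  # base R reference value
c(fast = tau_fast, base = tau_base)
```

The sketch refuses input with ties on purpose; handling them requires the tie counts T_x and T_y from the tau-b formula.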

Value

kendall_tau(): a symmetric numeric matrix where the (i, j)-th element is the Kendall's tau correlation between the i-th and j-th numeric columns of the input.

print(): invisibly returns the kendall_matrix object.

plot(): a ggplot object representing the heatmap.

Note

Missing values are not allowed. Columns with fewer than two observations are excluded.
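Because missing values are rejected, incomplete data need to be cleaned before the call. One possible approach, shown here as a base R sketch, is listwise deletion: keep the numeric columns and drop every row that contains an NA. This is only one option and an assumption about the workflow, not something the package does for you; the example data and object names are made up for illustration.

```r
# Hypothetical example data with missing values and a non-numeric column.
df <- data.frame(
  a = c(1.2, NA, 3.1, 4.8, 5.0),
  b = c(2.0, 1.5, NA, 3.3, 4.1),
  g = c("u", "v", "w", "x", "y")
)

# Keep numeric columns only, then drop rows with any missing value.
num   <- df[vapply(df, is.numeric, logical(1))]
clean <- num[stats::complete.cases(num), , drop = FALSE]

# Check the documented requirements before computing correlations.
stopifnot(!anyNA(clean), nrow(clean) >= 2)
clean
# `clean` could then be passed on, e.g. kendall_tau(clean)
```

Listwise deletion discards whole observations, so it is worth checking how many rows remain before interpreting the resulting correlations.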

Author(s)

Thiago de Paula Oliveira <toliveira@abacusbio.com>

References

Kendall, M. G. (1938). A New Measure of Rank Correlation. Biometrika, 30(1/2), 81–93.

See Also

print.kendall_matrix, plot.kendall_matrix

Examples

library(matrixCorr)
set.seed(123)  # make the simulated data reproducible

# Basic usage with a matrix
mat <- cbind(a = rnorm(100), b = rnorm(100), c = rnorm(100))
kt <- kendall_tau(mat)
print(kt)
plot(kt)

# With a large data frame
df <- data.frame(x = rnorm(1e4), y = rnorm(1e4), z = rnorm(1e4))
kendall_tau(df)

# Including ties (v1 and v2 are heavily tied, so tau-b applies)
tied_df <- data.frame(
  v1 = rep(1:5, each = 20),
  v2 = rep(5:1, each = 20),
  v3 = rnorm(100)
)
kt <- kendall_tau(tied_df)
print(kt)
plot(kt)


matrixCorr documentation built on Aug. 26, 2025, 5:07 p.m.