kendall_tau | R Documentation |
Computes all pairwise Kendall's tau rank correlation coefficients for the numeric columns of a matrix or data frame using a high-performance 'C++' implementation.
This function uses a fast, scalable 'C++' implementation to compute Kendall's tau-a (when no ties are present) or tau-b (when ties are detected), making it suitable for large datasets. Internally, it combines merge-sort-based inversion counting with a Fenwick tree (binary indexed tree) for efficient tie handling.
Prints a summary of the Kendall's tau correlation matrix, including description and method metadata.
Generates a ggplot2-based heatmap of the Kendall's tau correlation matrix.
kendall_tau(data)
## S3 method for class 'kendall_matrix'
print(x, digits = 4, max_rows = NULL, max_cols = NULL, ...)
## S3 method for class 'kendall_matrix'
plot(
  x,
  title = "Kendall's Tau correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  ...
)
data |
A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns will be excluded. Each column must have at least two non-missing values and contain no NAs. |
x |
An object of class kendall_matrix. |
digits |
Integer; number of decimal places to print |
max_rows |
Optional integer; maximum number of rows to display.
If |
max_cols |
Optional integer; maximum number of columns to display.
If |
... |
Additional arguments passed to |
title |
Plot title. Default is "Kendall's Tau correlation heatmap". |
low_color |
Color for the minimum tau value. Default is "indianred1". |
high_color |
Color for the maximum tau value. Default is "steelblue1". |
mid_color |
Color for zero correlation. Default is "white". |
value_text_size |
Font size for displaying correlation values. Default is 4. |
Kendall's tau is a rank-based measure of association between two variables. For a dataset with n observations of two variables X and Y, Kendall's tau coefficient is defined as:
\tau = \frac{C - D}{\sqrt{(C + D + T_x)(C + D + T_y)}}
where:
C is the number of concordant pairs, defined by (x_i - x_j)(y_i - y_j) > 0
D is the number of discordant pairs, defined by (x_i - x_j)(y_i - y_j) < 0
T_x, T_y are the numbers of tied pairs in X and Y, respectively
When there are no ties, the function computes the faster tau-a version:
\tau_a = \frac{C - D}{n(n-1)/2}
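As a cross-check on the two formulas above, the definitions can be written out directly in R. This is only a naive O(n^2) sketch for small inputs, not the package's 'C++' routine. The helper name kendall_tau_naive is hypothetical, and the sketch assumes that T_x and T_y count pairs tied in one variable only, which is the usual tau-b convention.

```r
# Naive O(n^2) reference for Kendall's tau, written from the definitions.
# Assumption: Tx / Ty count pairs tied in exactly one of the two variables.
kendall_tau_naive <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  idx <- utils::combn(length(x), 2)        # all index pairs (i, j) with i < j
  dx <- sign(x[idx[1, ]] - x[idx[2, ]])
  dy <- sign(y[idx[1, ]] - y[idx[2, ]])
  s <- dx * dy
  C <- sum(s > 0)                          # concordant pairs
  D <- sum(s < 0)                          # discordant pairs
  Tx <- sum(dx == 0 & dy != 0)             # tied in x only
  Ty <- sum(dx != 0 & dy == 0)             # tied in y only
  (C - D) / sqrt((C + D + Tx) * (C + D + Ty))
}

x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)
kendall_tau_naive(x, y)
```

Without ties, T_x = T_y = 0 and C + D = n(n-1)/2, so the same expression reduces to tau-a. On small vectors the result should agree with stats::cor(x, y, method = "kendall").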
The function automatically selects tau-a or tau-b depending on the presence of ties. Performance is maximized by computing correlations in 'C++' directly from the matrix columns.
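The merge-sort-based inversion counting mentioned in the description can be illustrated in plain R for the tie-free case. This is a minimal sketch under stated assumptions, not the package's internal code: the names count_inversions and tau_a_fast are hypothetical, ties are assumed absent, and the Fenwick tree used for tie handling is not shown. After ordering y by x, each inversion in y corresponds to one discordant pair, so D is the inversion count and tau-a equals 1 - 4D / (n(n-1)).

```r
# Count inversions with a recursive merge sort (O(n log n) comparisons).
# Returns the sorted vector and the number of pairs i < j with v[i] > v[j].
count_inversions <- function(v) {
  n <- length(v)
  if (n < 2) return(list(sorted = v, inv = 0))
  mid <- n %/% 2
  left <- count_inversions(v[1:mid])
  right <- count_inversions(v[(mid + 1):n])
  a <- left$sorted
  b <- right$sorted
  out <- numeric(n)
  i <- 1; j <- 1; k <- 1
  inv <- left$inv + right$inv
  while (i <= length(a) && j <= length(b)) {
    if (a[i] <= b[j]) {
      out[k] <- a[i]
      i <- i + 1
    } else {
      # b[j] precedes every remaining element of a: each one is an inversion.
      out[k] <- b[j]
      j <- j + 1
      inv <- inv + (length(a) - i + 1)
    }
    k <- k + 1
  }
  # Copy whichever half still has elements left.
  if (i <= length(a)) out[k:n] <- a[i:length(a)]
  if (j <= length(b)) out[k:n] <- b[j:length(b)]
  list(sorted = out, inv = inv)
}

# Tau-a for tie-free data: D is the inversion count of y ordered by x.
tau_a_fast <- function(x, y) {
  n <- length(x)
  d <- count_inversions(y[order(x)])$inv
  1 - 4 * d / (n * (n - 1))
}

set.seed(1)
x <- rnorm(50)
y <- rnorm(50)
all.equal(tau_a_fast(x, y), cor(x, y, method = "kendall"))
```

The identity used in tau_a_fast follows from C + D = n(n-1)/2 when there are no ties, so C - D = n(n-1)/2 - 2D.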
A symmetric numeric matrix where the (i, j)-th element is the Kendall's tau correlation between the i-th and j-th numeric columns of the input.
Invisibly returns the kendall_matrix object.
A ggplot object representing the heatmap.
Missing values are not allowed. Columns with fewer than two observations are excluded.
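Because missing values are not allowed, one possible way to prepare data is to drop incomplete rows before the call. The sketch below uses only base R. The data frame raw and its columns are made up for illustration, and the call to kendall_tau() is guarded so that the snippet runs even when the package is not attached.

```r
# Toy data frame with a non-numeric column and some missing values.
raw <- data.frame(
  a = c(1.2, NA, 3.1, 4.8, 5.0),
  b = c(2.0, 1.5, NA, 3.3, 4.1),
  g = c("u", "v", "u", "v", "u")
)

# Keep numeric columns only, then drop every row containing an NA.
num <- raw[, vapply(raw, is.numeric, logical(1)), drop = FALSE]
clean <- num[stats::complete.cases(num), , drop = FALSE]

# Call kendall_tau() only if the package providing it is attached.
if (exists("kendall_tau")) {
  kt <- kendall_tau(clean)
}
```

Dropping rows listwise keeps the same observations for every pair of columns, at the cost of discarding rows that are incomplete in only one column.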
Thiago de Paula Oliveira toliveira@abacusbio.com
Kendall, M. G. (1938). A New Measure of Rank Correlation. Biometrika, 30(1/2), 81–93.
print.kendall_matrix
# Basic usage with a matrix
mat <- cbind(a = rnorm(100), b = rnorm(100), c = rnorm(100))
kt <- kendall_tau(mat)
print(kt)
plot(kt)
# With a large data frame
df <- data.frame(x = rnorm(1e4), y = rnorm(1e4), z = rnorm(1e4))
kendall_tau(df)
# Including ties
tied_df <- data.frame(
  v1 = rep(1:5, each = 20),
  v2 = rep(5:1, each = 20),
  v3 = rnorm(100)
)
kt <- kendall_tau(tied_df)
print(kt)
plot(kt)