kendall_tau (R Documentation)
Description
Computes pairwise Kendall's tau correlations for numeric data using a high-performance 'C++' backend. Optional confidence intervals are available for matrix and data-frame input.
Usage
kendall_tau(
data,
y = NULL,
na_method = c("error", "pairwise"),
ci = FALSE,
conf_level = 0.95,
ci_method = c("fieller", "if_el", "brown_benedetti"),
n_threads = getOption("matrixCorr.threads", 1L),
output = c("matrix", "sparse", "edge_list"),
threshold = 0,
diag = TRUE,
...
)
## S3 method for class 'kendall_matrix'
print(
x,
digits = 4,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
ci_digits = 3,
show_ci = NULL,
...
)
## S3 method for class 'kendall_matrix'
plot(
x,
title = "Kendall's Tau correlation heatmap",
low_color = "indianred1",
high_color = "steelblue1",
mid_color = "white",
value_text_size = 4,
ci_text_size = 3,
show_value = TRUE,
...
)
## S3 method for class 'kendall_matrix'
summary(
object,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
ci_digits = 3,
show_ci = NULL,
...
)
## S3 method for class 'summary.kendall_matrix'
print(
x,
digits = NULL,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
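Arguments
| Argument | Description |
|---|---|
| data | For matrix/data-frame mode, a numeric matrix or a data frame with at least two numeric columns; all non-numeric columns are excluded. For two-vector mode, a numeric vector. |
| y | Optional numeric vector. When supplied, data must also be a numeric vector of the same length, and the function runs in two-vector mode. |
| na_method | Character scalar controlling missing-data handling: "error" (default) rejects missing values; "pairwise" computes each (i, j) estimate on the pairwise complete-case overlap of columns i and j. |
| ci | Logical (default FALSE). If TRUE, confidence intervals are computed; available only for matrix and data-frame input. |
| conf_level | Confidence level used when ci = TRUE (default 0.95). |
| ci_method | Confidence-interval engine used when ci = TRUE: "fieller" (default), "if_el", or "brown_benedetti". See Details. |
| n_threads | Integer number of threads to use (default getOption("matrixCorr.threads", 1L)). |
| output | Output representation for the computed estimates: "matrix" (default), "sparse", or "edge_list". |
| threshold | Non-negative absolute-value filter for non-matrix outputs ("sparse" and "edge_list"): only entries whose absolute value passes threshold are kept (default 0). |
| diag | Logical (default TRUE); whether to include diagonal entries in the output. |
| ... | Additional arguments passed to other methods. |
| x | An object of class kendall_matrix, or of class summary.kendall_matrix for the print method of summaries. |
| digits | Integer; number of decimal places to print. |
| n | Optional row threshold for compact preview output. |
| topn | Optional number of leading/trailing rows to show when truncated. |
| max_vars | Optional maximum number of visible columns. |
| width | Optional display width; a default width is used when NULL. |
| ci_digits | Integer; digits for Kendall confidence limits in the pairwise summary. |
| show_ci | Whether to display confidence intervals when they are available, for example show_ci = "yes". |
| title | Plot title. Default is "Kendall's Tau correlation heatmap". |
| low_color | Color for the minimum tau value. Default is "indianred1". |
| high_color | Color for the maximum tau value. Default is "steelblue1". |
| mid_color | Color for zero correlation. Default is "white". |
| value_text_size | Font size for displaying correlation values. Default is 4. |
| ci_text_size | Text size for confidence intervals in the heatmap. Default is 3. |
| show_value | Logical; if TRUE (default), correlation values are displayed on the heatmap. |
| object | An object of class kendall_matrix. |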
Details
Kendall's tau is a rank-based measure of association between two variables.
For a dataset with n observations on variables X and Y,
let n_0 = n(n - 1)/2 be the number of unordered pairs, C the
number of concordant pairs, and D the number of discordant pairs.
Let T_x = \sum_g t_g (t_g - 1)/2 and T_y = \sum_h u_h (u_h - 1)/2
be the numbers of tied pairs within X and within Y, respectively,
where t_g and u_h are tie-group sizes in X and Y.
The tie-robust Kendall's tau-b is:
\tau_b = \frac{C - D}{\sqrt{(n_0 - T_x)\,(n_0 - T_y)}}.
When there are no ties (T_x = T_y = 0), this reduces to tau-a:
\tau_a = \frac{C - D}{n(n-1)/2}.
Ties are handled automatically through the tau-b denominator. In degenerate
cases where a variable is constant (n_0 = T_x or n_0 = T_y), the tau-b
denominator is zero and the correlation is undefined (returned as NA
off the diagonal).
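To make the definitions concrete, the following base-R sketch counts C, D, T_x and T_y directly over all n_0 pairs. It is an O(n^2) illustration of the formulas above, not the package's C++ backend, and the helper name kendall_tau_b is hypothetical. Base R's cor(method = "kendall") also computes tau-b, which gives a convenient cross-check.

```r
# Hypothetical helper: brute-force O(n^2) Kendall's tau-b from the pair counts
# C, D, T_x and T_y defined above (illustration only, not the C++ backend).
kendall_tau_b <- function(x, y) {
  n <- length(x)
  stopifnot(n == length(y), n >= 2)
  C <- 0; D <- 0; Tx <- 0; Ty <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      sx <- sign(x[i] - x[j])
      sy <- sign(y[i] - y[j])
      if (sx == 0) Tx <- Tx + 1          # pair tied in X
      if (sy == 0) Ty <- Ty + 1          # pair tied in Y
      if (sx * sy > 0) C <- C + 1        # concordant pair
      if (sx * sy < 0) D <- D + 1        # discordant pair
    }
  }
  n0 <- n * (n - 1) / 2
  denom <- sqrt((n0 - Tx) * (n0 - Ty))
  if (denom == 0) return(NA_real_)      # a constant variable: tau-b undefined
  (C - D) / denom
}

kendall_tau_b(1:5, c(1, 2, 3, 5, 4))    # one discordant pair: (9 - 1) / 10 = 0.8
x <- c(1, 1, 2, 3, 3, 4)
y <- c(2, 1, 2, 4, 3, 3)
all.equal(kendall_tau_b(x, y), cor(x, y, method = "kendall"))
kendall_tau_b(rep(2, 5), 1:5)           # NA
```

Without ties the denominator is n_0, so the same function returns tau-a.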
When na_method = "pairwise", each (i,j) estimate is recomputed
on the pairwise complete-case overlap of columns i and j.
Confidence intervals use the observed pairwise-complete Kendall estimate and
the same pairwise complete-case overlap.
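The pairwise rule can be sketched in base R as a loop over column pairs that keeps only rows observed in both columns and records the overlap size. The helper pairwise_kendall and its n_complete element are hypothetical stand-ins for the package's diagnostics attribute; base R's cor(..., use = "pairwise.complete.obs") applies the same per-pair rule and serves as a cross-check.

```r
# Hypothetical helper: Kendall matrix where each (i, j) entry uses only the
# rows observed in both columns, mirroring the pairwise rule described above.
pairwise_kendall <- function(mat) {
  p <- ncol(mat)
  est <- matrix(NA_real_, p, p, dimnames = list(colnames(mat), colnames(mat)))
  n_complete <- matrix(0L, p, p, dimnames = dimnames(est))
  for (i in seq_len(p)) {
    for (j in i:p) {
      ok <- complete.cases(mat[, i], mat[, j])  # pairwise complete-case overlap
      n_complete[i, j] <- n_complete[j, i] <- sum(ok)
      if (sum(ok) >= 2) {
        est[i, j] <- est[j, i] <- cor(mat[ok, i], mat[ok, j], method = "kendall")
      }
    }
  }
  list(est = est, n_complete = n_complete)
}

set.seed(42)
m <- matrix(rnorm(60), nrow = 20, ncol = 3,
            dimnames = list(NULL, c("a", "b", "c")))
m[c(2, 5), "a"] <- NA
m[7, "b"] <- NA
res <- pairwise_kendall(m)
res$n_complete   # the a-b overlap has 17 rows, a-c has 18
```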
With ci_method = "fieller", the interval is built on the Fisher-style
transformed scale z = \operatorname{atanh}(\hat\tau) using Fieller's
asymptotic standard error
\operatorname{SE}(z) = \sqrt{\frac{0.437}{n - 4}},
where n is the pairwise complete-case sample size. The limits on this scale are
z \pm \Phi^{-1}(1 - \alpha/2)\,\operatorname{SE}(z), with \alpha = 1 - conf_level,
and are then mapped back with tanh() and clipped to [-1, 1] for numerical
safety. This is the default Kendall CI and is intended to be the fast,
production-oriented choice.
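A minimal base-R sketch of the Fieller-type interval, assuming the symmetric normal interval on the atanh scale described above. The helper fieller_ci is hypothetical, and the point estimate here comes from base R's cor() rather than from the package.

```r
# Hypothetical helper: Fieller-type interval for Kendall's tau, assuming a
# symmetric normal interval on the atanh scale (sketch, not the C++ engine).
fieller_ci <- function(tau, n, conf_level = 0.95) {
  stopifnot(n > 4, conf_level > 0, conf_level < 1)
  z <- atanh(tau)                         # Fisher-style transform
  se <- sqrt(0.437 / (n - 4))             # Fieller, Hartley & Pearson (1957)
  q <- qnorm(1 - (1 - conf_level) / 2)    # two-sided normal quantile
  lim <- tanh(z + c(-1, 1) * q * se)      # back-transform to the tau scale
  pmin(pmax(lim, -1), 1)                  # clip to [-1, 1]
}

set.seed(1)
x <- rnorm(200)
y <- 0.5 * x + rnorm(200)
tau <- cor(x, y, method = "kendall")      # base-R tau-b; no missing values here
fieller_ci(tau, n = length(x))
```

Because the interval is symmetric on the atanh scale, it becomes asymmetric around the estimate after tanh(), narrowing toward the nearer bound of [-1, 1].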
With ci_method = "brown_benedetti", the interval is computed on the
Kendall tau scale using the Brown-Benedetti large-sample variance for
Kendall's tau-b. This path is tie-aware, remains on the original Kendall
scale, and is intended as a conventional asymptotic alternative when a
direct tau-scale interval is preferred.
With ci_method = "if_el", the interval is computed in 'C++' using an
influence-function empirical-likelihood construction built from the
linearised Kendall estimating equation. The lower and upper limits are found
by solving the empirical-likelihood ratio equation against the
\chi^2_1 quantile at level conf_level. This method is slower
than "fieller" and is intended for specialised inference.
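The exact construction of the package's "if_el" engine lives in C++ and is not shown here. As a sketch of the general idea only, the base-R code below forms linearised pseudo-values 2 h_i - \hat\tau, where h_i averages the concordance signs of observation i with all other observations, and inverts the empirical-likelihood ratio for their mean against the \chi^2_1 cutoff. It targets tau-a (no tie correction), and the helpers kendall_pseudo, el_stat and el_kendall_ci are hypothetical.

```r
# Hypothetical helpers: sketch of an influence-function empirical-likelihood
# interval for Kendall's tau-a (assumes no ties; not the package's engine).
kendall_pseudo <- function(x, y) {
  n <- length(x)
  s <- sign(outer(x, x, "-")) * sign(outer(y, y, "-"))  # +1 concordant, -1 discordant
  h <- rowSums(s) / (n - 1)            # average concordance of each observation
  tau <- mean(h)                       # equals (C - D) / n_0, i.e. tau-a
  list(tau = tau, v = 2 * h - tau)     # linearised pseudo-values with mean tau
}

# -2 log empirical-likelihood ratio for the mean of v at the value mu
el_stat <- function(v, mu) {
  if (mu <= min(v) || mu >= max(v)) return(Inf)
  d <- v - mu
  g <- function(l) sum(d / (1 + l * d))   # decreasing in l; root gives EL weights
  bounds <- c((-1 + 1e-10) / max(d), (-1 + 1e-10) / min(d))  # keep 1 + l * d > 0
  lam <- uniroot(g, bounds, tol = 1e-12)$root
  2 * sum(log1p(lam * d))
}

el_kendall_ci <- function(x, y, conf_level = 0.95) {
  pv <- kendall_pseudo(x, y)
  v <- pv$v
  crit <- qchisq(conf_level, df = 1)       # chi-square(1) cutoff
  f <- function(mu) el_stat(v, mu) - crit
  eps <- 1e-6 * diff(range(v))
  lwr <- uniroot(f, c(min(v) + eps, pv$tau))$root
  upr <- uniroot(f, c(pv$tau, max(v) - eps))$root
  c(estimate = pv$tau, lwr = max(lwr, -1), upr = min(upr, 1))
}

set.seed(3)
x <- rnorm(80)
y <- 0.6 * x + rnorm(80)
el_kendall_ci(x, y)
```

The two outer root searches are why such intervals cost more than "fieller": every candidate limit requires solving an inner equation for the Lagrange multiplier.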
Performance:
In the two-vector mode (y supplied), the C++ backend uses a
raw-double path with minimal overhead.
In the matrix/data-frame mode, the no-missing estimate-only path
uses the Knight (1966) O(n \log n) algorithm. Pairwise-complete
inference paths recompute each pair on its complete-case overlap; the
"brown_benedetti" interval adds tie-aware large-sample variance
calculations and "if_el" adds extra per-pair likelihood solving.
Value
If y is NULL and data is a matrix/data frame: a
symmetric numeric matrix of class kendall_matrix, where entry (i, j) is the Kendall's tau
correlation between the i-th and j-th numeric columns. When
ci = TRUE, the object also carries a ci attribute with
elements est, lwr.ci, upr.ci, conf.level, and
ci.method. Pairwise complete-case sample sizes are stored in
attr(x, "diagnostics")$n_complete.
If y is provided (two-vector mode): a single numeric scalar,
the Kendall's tau correlation between data and y.
print.kendall_matrix: invisibly returns the kendall_matrix object.
plot.kendall_matrix: a ggplot object representing the heatmap.
Note
Missing values are rejected when na_method = "error". Columns
with fewer than two usable observations are excluded. Confidence intervals
are not available in the two-vector interface.
Author(s)
Thiago de Paula Oliveira
References
Kendall, M. G. (1938). A New Measure of Rank Correlation. Biometrika, 30(1/2), 81-93.
Knight, W. R. (1966). A Computer Method for Calculating Kendall's Tau with Ungrouped Data. Journal of the American Statistical Association, 61(314), 436-439.
Fieller, E. C., Hartley, H. O., & Pearson, E. S. (1957). Tests for rank correlation coefficients. I. Biometrika, 44(3/4), 470-481.
Brown, M. B., & Benedetti, J. K. (1977). Sampling behavior of tests for correlation in two-way contingency tables. Journal of the American Statistical Association, 72(358), 309-315.
Huang, Z., & Qin, G. (2023). Influence function-based confidence intervals for the Kendall rank correlation coefficient. Computational Statistics, 38(2), 1041-1055.
Croux, C., & Dehon, C. (2010). Influence functions of the Spearman and Kendall correlation measures. Statistical Methods & Applications, 19, 497-515.
See Also
print.kendall_matrix, plot.kendall_matrix
Examples
# Basic usage with a matrix
set.seed(123)  # make the random examples reproducible
mat <- cbind(a = rnorm(100), b = rnorm(100), c = rnorm(100))
kt <- kendall_tau(mat)
print(kt)
summary(kt)
plot(kt)
# Confidence intervals
kt_ci <- kendall_tau(mat[, 1:3], ci = TRUE)
print(kt_ci, show_ci = "yes")
summary(kt_ci)
# Two-vector mode (scalar path)
x <- rnorm(1000); y <- 0.5 * x + rnorm(1000)
kendall_tau(x, y)
# Including ties
tied_df <- data.frame(
v1 = rep(1:5, each = 20),
v2 = rep(5:1, each = 20),
v3 = rnorm(100)
)
kt_tied <- kendall_tau(tied_df, ci = TRUE, ci_method = "fieller")
print(kt_tied, show_ci = "yes")
# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
view_corr_shiny(kt)
}