```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(kendallknight)
```
For the full details about the implementation, please check Sepulveda MV (2025) Kendallknight: An R package for efficient implementation of Kendall's correlation coefficient computation. PLoS One 20(6): e0326090. https://doi.org/10.1371/journal.pone.0326090
The kendallknight package focuses exclusively on Kendall's correlation coefficient and provides additional functions, not available in other packages, to test the statistical significance of the computed correlation, which is particularly useful in econometric and statistical contexts.
The kendallknight package is available on CRAN and can be installed with either of the following commands:

```r
# CRAN
install.packages("kendallknight")

# GitHub
remotes::install_github("pachadotdev/kendallknight")
```
As an illustrative exercise, we can explore the question: is there a relationship between the number of computer science doctorates awarded in the United States and the total revenue generated by arcades? This question is, of course, a purely numerical exercise and says nothing about causal mechanisms.
The following table, obtained from @vigen2015, can be used to illustrate the usage of the kendallknight package:

| Year | Computer science doctorates awarded in the US | Total revenue generated by arcades |
|-----:|----------------------------------------------:|-----------------------------------:|
| 2000 |  861 | 1.196 |
| 2001 |  830 | 1.176 |
| 2002 |  809 | 1.269 |
| 2003 |  867 | 1.240 |
| 2004 |  948 | 1.307 |
| 2005 | 1129 | 1.435 |
| 2006 | 1453 | 1.601 |
| 2007 | 1656 | 1.654 |
| 2008 | 1787 | 1.803 |
| 2009 | 1611 | 1.734 |
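The table corresponds to the `arcade` dataset shipped with the package. For readers following along without the package installed, a hypothetical stand-in can be built by hand; the column names `doctorates` and `revenue` below are chosen to match those used in the examples:

```r
# Reconstruction of the table above as a plain data frame (a stand-in for
# the packaged `arcade` dataset; column names chosen to match the examples)
arcade <- data.frame(
  year = 2000:2009,
  doctorates = c(861, 830, 809, 867, 948, 1129, 1453, 1656, 1787, 1611),
  revenue = c(1.196, 1.176, 1.269, 1.240, 1.307, 1.435,
              1.601, 1.654, 1.803, 1.734)
)

# base R's Kendall correlation on these values
cor(arcade$doctorates, arcade$revenue, method = "kendall")
#> [1] 0.8222222
```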
The kendall_cor() function can be used to compute Kendall's correlation coefficient:

```r
library(kendallknight)

kendall_cor(arcade$doctorates, arcade$revenue)
#> [1] 0.8222222
```
The kendall_cor_test() function can be used to test the null hypothesis that Kendall's correlation coefficient is zero:

```r
kendall_cor_test(
  arcade$doctorates,
  arcade$revenue,
  conf.level = 0.8,
  alternative = "greater"
)
#> 
#> 	Kendall's rank correlation tau
#> 
#> data:  arcade$doctorates and arcade$revenue
#> tau = 0.82222, p-value = 0.0001788
#> alternative hypothesis: true tau is greater than 0
#> 80 percent confidence interval:
#>  0.5038182 1.0000000
```
One important difference with the base R implementation is that kendall_cor_test() returns confidence intervals at any requested confidence level (e.g., 95%, 90%, 80%).
With the obtained $p$-value and a confidence level of 80% (the default is 95%), the null hypothesis is rejected for the two-tailed test ($H_0: \tau = 0$ versus $H_1: \tau \neq 0$, the default option) and for the greater-than one-tailed test ($H_0: \tau = 0$ versus $H_1: \tau > 0$), but not for the less-than one-tailed test ($H_0: \tau = 0$ versus $H_1: \tau < 0$). This suggests that the correlation is positive (e.g., more doctorates are associated with more revenue generated by arcades).
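The three alternative hypotheses can also be checked directly with base R's cor.test(); the vectors below are copied from the table above as a stand-in for the packaged data:

```r
doctorates <- c(861, 830, 809, 867, 948, 1129, 1453, 1656, 1787, 1611)
revenue <- c(1.196, 1.176, 1.269, 1.240, 1.307, 1.435,
             1.601, 1.654, 1.803, 1.734)

# two-tailed (H1: tau != 0): small p-value, reject H0
p_two <- cor.test(doctorates, revenue, method = "kendall")$p.value

# greater-than one-tailed (H1: tau > 0): small p-value, reject H0
p_greater <- cor.test(doctorates, revenue, method = "kendall",
                      alternative = "greater")$p.value

# less-than one-tailed (H1: tau < 0): p-value close to 1, do not reject H0
p_less <- cor.test(doctorates, revenue, method = "kendall",
                   alternative = "less")$p.value

round(c(two = p_two, greater = p_greater, less = p_less), 4)
```

Only the less-than alternative fails to reject, consistent with a positive correlation.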
With base R or the Kendall package, an equivalent result can be obtained with the following code:

```r
cor.test(arcade$doctorates, arcade$revenue, method = "kendall")
#> 
#> 	Kendall's rank correlation tau
#> 
#> data:  arcade$doctorates and arcade$revenue
#> T = 41, p-value = 0.0003577
#> alternative hypothesis: true tau is not equal to 0
#> sample estimates:
#>       tau 
#> 0.8222222
```

```r
Kendall::Kendall(arcade$doctorates, arcade$revenue)
#> tau = 0.822, 2-sided pvalue =0.0012822
```
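The two p-values above differ because of the approximation used: for a small tie-free sample, cor.test() reports an exact p-value, while Kendall::Kendall() appears to use a continuity-corrected normal approximation for this sample. As a sketch (vectors copied from the table above), asking base R for that same approximation reproduces Kendall's figure:

```r
doctorates <- c(861, 830, 809, 867, 948, 1129, 1453, 1656, 1787, 1611)
revenue <- c(1.196, 1.176, 1.269, 1.240, 1.307, 1.435,
             1.601, 1.654, 1.803, 1.734)

# normal approximation with continuity correction: close to the
# 0.0012822 reported by Kendall::Kendall() above
p_approx <- cor.test(doctorates, revenue, method = "kendall",
                     exact = FALSE, continuity = TRUE)$p.value

# exact p-value, as reported by cor.test() by default for this sample
p_exact <- cor.test(doctorates, revenue, method = "kendall")$p.value
```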
In an econometric context, the current implementation is particularly useful to compute the pseudo-$R^2$ statistic, defined as the squared Kendall correlation, in the context of (quasi-)Poisson regression with fixed effects [@santos;@sepulveda]. A local test shows that the pseudo-$R^2$ computation time drops from roughly fifty percent to under one percent of the model fitting time when using the fepois() function from the fixest package [@gaure] and a dataset containing fifteen thousand rows:
```r
library(tradepolicy)
library(fixest)
library(kendallknight)

data8694 <- subset(agtpa_applications, year %in% seq(1986, 1994, 4))

fit <- fepois(
  trade ~ dist + cntg + lang + clny + rta |
    as.factor(paste0(exporter, year)) + as.factor(paste0(importer, year)),
  data = data8694
)

# squared Kendall correlation between observed and fitted values
psr <- (cor(data8694$trade, fit$fitted.values, method = "kendall"))^2
psr2 <- (kendall_cor(data8694$trade, fit$fitted.values))^2

c("base R" = psr, "kendallknight" = psr2)
#>        base R kendallknight 
#>      0.263012      0.263012
```
Comparing the model fitting time with the correlation computation time, we can see where the kendallknight package becomes relevant:

| Operation                    |  Time | Pseudo-$R^2$ time / model fitting time |
|:-----------------------------|------:|---------------------------------------:|
| Model fitting                | 3.75s |                                         |
| Pseudo-$R^2$ (base R)        | 1.78s |                                  47.58% |
| Pseudo-$R^2$ (kendallknight) | 0.02s |                                   0.51% |
For simulated datasets with variables "x" and "y" created with rnorm() and rpois(), we observe increasingly large time differences as the number of observations grows:

| No. of observations | kendallknight median time (s) | Kendall median time (s) | Base R median time (s) |
|:-------------------:|:-----------------------------:|:-----------------------:|:----------------------:|
|       10,000        |             0.013             |           1.0           |           4            |
|       20,000        |             0.026             |           3.9           |           16           |
|       30,000        |             0.040             |           8.7           |           36           |
|       40,000        |             0.056             |          15.6           |           64           |
|       50,000        |             0.071             |          24.2           |          100           |
|       60,000        |             0.088             |          34.8           |          144           |
|       70,000        |             0.104             |          47.5           |          196           |
|       80,000        |             0.123             |          61.9           |          256           |
|       90,000        |             0.137             |          78.2           |          324           |
|      100,000        |             0.153             |          96.4           |          399           |
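The speed difference comes from the algorithm: a naive implementation compares all $n(n-1)/2$ pairs, which is $O(n^2)$, while kendallknight uses a merge-sort-style counting approach that runs in $O(n \log n)$, as described in the PLoS One article. A minimal, purely illustrative base R sketch of that idea for tie-free data (the function names here are hypothetical, not part of the package):

```r
# Count inversions in a numeric vector with merge sort: each time an
# element is taken from the right half while the left half still has
# elements, every remaining left element forms one inversion.
count_inversions <- function(v) {
  n <- length(v)
  if (n < 2) return(list(sorted = v, inv = 0))
  mid <- n %/% 2
  left <- count_inversions(v[1:mid])
  right <- count_inversions(v[(mid + 1):n])
  merged <- numeric(n)
  inv <- left$inv + right$inv
  i <- j <- 1
  for (k in 1:n) {
    if (i <= mid && (j > n - mid || left$sorted[i] <= right$sorted[j])) {
      merged[k] <- left$sorted[i]; i <- i + 1
    } else {
      inv <- inv + (mid - i + 1)  # inversions contributed by this element
      merged[k] <- right$sorted[j]; j <- j + 1
    }
  }
  list(sorted = merged, inv = inv)
}

# For tie-free data, discordant pairs = inversions of y after sorting by x,
# so tau = 1 - 4 * D / (n * (n - 1)).
kendall_tau_nlogn <- function(x, y) {
  stopifnot(!anyDuplicated(x), !anyDuplicated(y))  # sketch: no ties handled
  n <- length(x)
  inv <- count_inversions(y[order(x)])$inv
  1 - 4 * inv / (n * (n - 1))
}

set.seed(42)
x <- rnorm(200)
y <- rnorm(200)
all.equal(kendall_tau_nlogn(x, y), cor(x, y, method = "kendall"))
#> [1] TRUE
```

The actual package handles ties and computes p-values in compiled code; this sketch only shows why the asymptotic cost drops from quadratic to log-linear.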