

The goal of pairedCAC is to allow you to test the difference between 2 chance-corrected agreement coefficients for statistical significance.



You can install the most current version of pairedCAC from the GitHub repository as follows:
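A minimal sketch of the installation step, assuming the package lives at the GitHub path kgwet/pairedCAC and that devtools is used as the installer (remotes::install_github() is a common alternative):

```r
# Install devtools first if it is not already available
# (assumption: devtools is the installer; remotes would work the same way)
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Assumed repository path: kgwet/pairedCAC
devtools::install_github("kgwet/pairedCAC")

# Load the package so that its functions and datasets are available
library(pairedCAC)
```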



pairedCAC is an R package that provides a series of functions for testing the difference between 2 correlated chance-corrected agreement coefficients for statistical significance. It closely follows the general framework discussed by Gwet (2016) and expanded more recently by Gwet (2021).

Typically, 2 agreement coefficients are correlated when they are based on 2 overlapping samples of subjects or 2 overlapping rosters of raters. For uncorrelated coefficients, the testing is straightforward and is discussed in Gwet (2021, section 9.3).
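To see why the correlation matters, recall the general identity Var(A - B) = Var(A) + Var(B) - 2 Cov(A, B). The sketch below uses base R only. The helper se_diff() and all numeric inputs are hypothetical illustrations; they are not part of pairedCAC and do not reproduce the package's variance estimator.

```r
# Hypothetical helper: standard error of the difference between two estimates,
# given their standard errors and (optionally) their covariance.
se_diff <- function(se1, se2, cov12 = 0) {
  sqrt(se1^2 + se2^2 - 2 * cov12)
}

# Made-up standard errors, for illustration only
se1 <- 0.03
se2 <- 0.04

# Uncorrelated case: the covariance term is zero
se_uncorrelated <- se_diff(se1, se2)

# Correlated case: a made-up positive covariance
se_correlated <- se_diff(se1, se2, cov12 = 0.0006)

print(c(uncorrelated = se_uncorrelated, correlated = se_correlated))
```

With a positive covariance, treating the two coefficients as uncorrelated would overstate the standard error of their difference.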


Suppose that Fleiss' generalized kappa coefficient is calculated on 2 occasions using the 2 datasets of ratings, ratings1 and ratings2, included in this package. How do you determine whether the difference between these 2 Fleiss' kappa coefficients is statistically significant? Proceed as follows:

fleiss <- ttest.fleiss(ratings1,ratings2)

The function ttest.fleiss() returns a 2-element list. Its first element is a data frame named test, which you can display with the expression fleiss$test. This data frame contains the test statistics described next.

Fleiss' generalized kappa coefficients associated with datasets 1 and 2 are respectively given by fleiss.coeff1 = round(fleiss$test$fleiss.coeff1, 3) and fleiss.coeff2 = round(fleiss$test$fleiss.coeff2, 3). The difference between the second and first coefficients, which is to be tested for statistical significance, is given by coeff.diff = round(fleiss$test$coeff.diff, 3), and its standard error is std.err = round(fleiss$test$std.err, 3).

A key element for testing the statistical significance of the difference is the test statistic t.stat = round(fleiss$test$t.stat, 3), which you must compare to the $97.5$th percentile of the standard normal distribution, round(qnorm(0.975), 4) = 1.96. Since the test statistic exceeds this threshold, you may conclude that the difference is statistically significant.

The ttest.fleiss() function also outputs the p-value associated with the difference between the 2 Fleiss' kappa coefficients. In our example, it is given by p-value = round(fleiss$test$p.value, 4), which is smaller than the standard threshold of 0.05. This is an indication that the difference is statistically significant.
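The decision rule described above can be sketched in base R. The inputs are made up for illustration and are not the output of ttest.fleiss(). The sketch assumes that the test statistic is the difference divided by its standard error and that the reported p-value is the two-sided standard normal p-value; neither assumption is confirmed here against the package source.

```r
# Made-up inputs, for illustration only (not results from ratings1/ratings2)
coeff_diff <- 0.12   # hypothetical difference between the two coefficients
std_err    <- 0.05   # hypothetical standard error of that difference

# Test statistic: difference divided by its standard error (assumed form)
t_stat <- coeff_diff / std_err

# 97.5th percentile of the standard normal distribution
critical <- qnorm(0.975)

# Two-sided p-value under the standard normal (assumed convention)
p_value <- 2 * pnorm(-abs(t_stat))

# Reject the hypothesis of no difference at the 5% level?
significant <- abs(t_stat) > critical

print(data.frame(t_stat, critical, p_value, significant))
```

Comparing the absolute value of the test statistic to qnorm(0.975) and comparing the two-sided p-value to 0.05 are equivalent decisions, so the two checks always agree.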

The package includes the 2 data frames ratings1 and ratings2 and the following 6 functions for testing the difference between agreement coefficients for statistical significance:

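fleiss <- ttest.fleiss(ratings1, ratings2[, 1:3])  # Fleiss' generalized kappa
ac2 <- ttest.ac2(ratings1, ratings2)               # Gwet's AC2
conger <- ttest.conger(ratings1, ratings2)         # Conger's kappa
alpha <- ttest.alpha(ratings1, ratings2)           # Krippendorff's alpha
bp <- ttest.bp(ratings1, ratings2)                 # Brennan-Prediger coefficient
pa <- ttest.pa(ratings1, ratings2)                 # percent agreement (function name ttest.pa assumed from the naming pattern)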


  1. Gwet, K.L. (2016). "Testing the Difference of Correlated Agreement Coefficients for Statistical Significance," Educational and Psychological Measurement, Vol. 76(4), 609–637.

  2. Gwet, K.L. (2021). "Handbook of Inter-Rater Reliability, Volume 1: Analysis of Categorical Ratings," 5th Edition. AgreeStat Analytics. ISBN: 978-1792354632.

kgwet/pairedCAC documentation built on Dec. 21, 2021, 6:37 a.m.