The goal of pairedCAC is to allow you to test the difference between 2 chance-corrected agreement coefficients for statistical significance.
```r
library(pairedCAC)
```
You can install the most current version of pairedCAC from the GitHub repository as follows:
```r
devtools::install_github("kgwet/pairedCAC")
```
pairedCAC is an R package that provides a series of functions for testing the difference between 2 correlated chance-corrected agreement coefficients for statistical significance. This package closely follows the general framework discussed by Gwet (2016) and expanded more recently by Gwet (2021).
Typically, 2 agreement coefficients are correlated when they are based on 2 overlapping samples of subjects or 2 overlapping rosters of raters. For uncorrelated coefficients, the testing is straightforward and is discussed in Gwet (2021, section 9.3).
Suppose that Fleiss' generalized kappa coefficient is calculated on 2 occasions using the 2 rating datasets `ratings1` and `ratings2` included in this package. How do you determine whether the difference between these 2 Fleiss' kappa coefficients is statistically significant? Proceed as follows:
```r
fleiss <- ttest.fleiss(ratings1, ratings2)
fleiss$test
```
The function `ttest.fleiss()` returns a 2-element list, the first of which is a data frame named `test` that you can display with the expression `fleiss$test`. This data frame contains the test statistics shown above.
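If you want to inspect the full structure of the returned object, the short sketch below relies only on the list/data-frame structure just described; nothing beyond that is assumed.

```r
library(pairedCAC)

# Run the test and look at what comes back
fleiss <- ttest.fleiss(ratings1, ratings2)
names(fleiss)     # the first element is the data frame "test"
str(fleiss$test)  # columns include fleiss.coeff1, fleiss.coeff2,
                  # coeff.diff, std.err, t.stat and p.value
```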
Fleiss' generalized kappa coefficients associated with datasets 1 and 2 are given by the columns `fleiss.coeff1` and `fleiss.coeff2` of this data frame, respectively. The difference between the second and first coefficients, which is to be tested for statistical significance, is given by the column `coeff.diff`, and its standard error by `std.err`.
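As a quick sanity check (a sketch based on the column names above, not a documented feature of the package), you can verify that the reported difference matches the two coefficients:

```r
# coeff.diff should equal fleiss.coeff2 - fleiss.coeff1
with(fleiss$test,
     c(recomputed = fleiss.coeff2 - fleiss.coeff1,
       reported   = coeff.diff))
```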
Now, a key element for testing the statistical significance of the difference is the test statistic `t.stat`, which you must compare to the 97.5th percentile of the standard Normal distribution, approximately 1.96. Since the test statistic exceeds this threshold in this example, you may conclude that the difference is statistically significant.
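In code, that comparison looks like the following sketch; `qnorm(0.975)` is roughly 1.96.

```r
# Two-sided test at the 5% level: compare |t.stat| with the
# 97.5th percentile of the standard Normal distribution
threshold <- qnorm(0.975)            # about 1.96
abs(fleiss$test$t.stat) > threshold  # TRUE => statistically significant
```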
The `ttest.fleiss()` function also outputs the p-value associated with the difference between the 2 Fleiss' kappa coefficients; it is stored in the column `p.value`. In our example it is smaller than the standard threshold of 0.05, another indication that the difference is statistically significant.
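If you want to see where this p-value comes from, it can be recomputed from the test statistic under a standard Normal reference distribution; the sketch below assumes that is the reference distribution used by `ttest.fleiss()`, consistent with the comparison above.

```r
# Recompute the two-sided p-value from the test statistic
# (assumes a standard Normal reference distribution)
2 * (1 - pnorm(abs(fleiss$test$t.stat)))

# p-value reported by ttest.fleiss()
fleiss$test$p.value
```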
Here are the 2 data frames `ratings1` and `ratings2`, and the 6 functions available for testing the difference between agreement coefficients for statistical significance.
```r
data.frame(ratings1, ratings2)
fleiss <- ttest.fleiss(ratings1, ratings2[, 1:3])
ac2    <- ttest.ac2(ratings1, ratings2)
conger <- ttest.conger(ratings1, ratings2)
alpha  <- ttest.alpha(ratings1, ratings2)
bp     <- ttest.bp(ratings1, ratings2)
pa     <- ttest.pa(ratings1, ratings2)
fleiss
ac2
conger
alpha
bp
pa
```
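Each of these functions is called the same way as `ttest.fleiss()`. Assuming they all return a list whose `test` element is a data frame with a `p.value` column (as `ttest.fleiss()` does), you could collect the p-values in one place, for example:

```r
# Gather the p-value from each method
# (assumes every ttest.* result has a "test" data frame with a p.value column)
results <- list(fleiss = fleiss, ac2 = ac2, conger = conger,
                alpha = alpha, bp = bp, pa = pa)
sapply(results, function(x) x$test$p.value)
```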
Gwet, K.L. (2016). Testing the Difference of Correlated Agreement Coefficients for Statistical Significance. Educational and Psychological Measurement, 76(4), 609–637.
Gwet, K.L. (2021). Handbook of Inter-Rater Reliability, Volume 1: Analysis of Categorical Ratings, 5th Edition. AgreeStat Analytics. ISBN: 978-1792354632.