agree_tab {volker}    R Documentation
Two types of category comparison are provided:
agree_tab(
  data,
  cols,
  coders,
  ids = NULL,
  category = NULL,
  method = "reliability",
  labels = TRUE,
  clean = TRUE,
  ...
)
data
A tibble containing item measures, coders and case IDs.
cols
A tidy selection of item variables (e.g. starts_with()) with ratings.
coders
The column holding coders or methods to compare.
ids
The column with case IDs.
category
For classification performance indicators: if no category is provided, macro statistics are returned (along with the number of categories in the output). Provide a category to get the statistics for that category only (see the last example below). If values are boolean (TRUE / FALSE) and no category is provided, the category is assumed to be TRUE.
method
The output metrics, either "reliability" or "classification".
labels
If TRUE (default), extracts labels from the attributes; see codebook.
clean
Whether to prepare the data by data_clean.
...
Placeholder to allow calling the method with unused parameters from report_counts.
Reliability: Compare codings of two or more raters in content analysis. Common reliability measures are percent agreement (also known as Holsti), Fleiss' or Cohen's Kappa, Krippendorff's Alpha and Gwet's AC.
Classification: Compare true and predicted categories from classification methods. Common performance metrics include accuracy, precision, recall and F1.
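As a minimal illustration of the percent agreement (Holsti) idea, the following sketch does not use the package API; the two rating vectors are made-up data for two coders rating the same five cases.

# Percent agreement (Holsti): share of cases on which both coders agree
ratings_coder1 <- c("a", "b", "b", "a", "c")
ratings_coder2 <- c("a", "b", "c", "a", "c")
mean(ratings_coder1 == ratings_coder2)
#> [1] 0.8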
A volker tibble with one row for each item. The item name is returned in the first column. For the reliability method, the following columns are returned:
n: Number of cases (each case id is only counted once).
Coders: Number of coders.
Categories: Number of categories.
Holsti: Percent agreement (same as accuracy).
Krippendorff's Alpha: Chance-corrected reliability score.
Kappa: Depending on the number of coders either Cohen's Kappa (two coders) or Fleiss' Kappa (more coders).
Gwet's AC1: Gwet's agreement coefficient.
For the classification method, the following columns are returned:
n: Number of cases (each case id is only counted once).
Categories: Number of categories.
Accuracy: Share of correct classifications.
Precision: Share of actually true cases among all cases detected as true.
Recall: Share of all true cases that were detected as true.
F1: Harmonic mean of precision and recall.
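To clarify how these metrics are defined, here is a minimal sketch that does not use the package; true and pred are hypothetical logical vectors holding the manual coding and the automatic classification of the same cases.

true <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
pred <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)

accuracy  <- mean(true == pred)            # share of correct classifications
precision <- sum(true & pred) / sum(pred)  # true cases among cases detected as true
recall    <- sum(true & pred) / sum(true)  # detected cases among all true cases
f1        <- 2 * precision * recall / (precision + recall)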
library(dplyr)
library(volker)
data <- volker::chatgpt
# Prepare example data.
# First, recode "x" to TRUE/FALSE for the first coder's sample.
data_coder1 <- data |>
  mutate(across(starts_with("cg_act_"), ~ ifelse(is.na(.), FALSE, TRUE))) |>
  mutate(coder = "coder one")

# Second, recode using a dictionary approach for the second coder's sample.
data_coder2 <- data |>
  mutate(across(starts_with("cg_act_"), ~ ifelse(is.na(.), FALSE, TRUE))) |>
  mutate(cg_act_write = grepl("write|text|translate", tolower(cg_activities))) |>
  mutate(coder = "coder two")

data_coded <- bind_rows(
  data_coder1,
  data_coder2
)
# Reliability coefficients are strictly only appropriate for manual codings
agree_tab(data_coded, cg_act_write, coder, case, method = "reli")
# Better use classification performance indicators to compare the
# dictionary approach with human coding
agree_tab(data_coded, cg_act_write, coder, case, method = "class")
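The category argument restricts classification metrics to a single category. A hedged sketch, assuming a logical value is accepted for the boolean codings prepared above:

# Get the performance metrics for the FALSE category only
# (without a category, boolean codings default to the TRUE category)
agree_tab(data_coded, cg_act_write, coder, case, method = "class", category = FALSE)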