agree_tab {volker}    R Documentation
Two types of category comparison are provided:
agree_tab(
  data,
  cols,
  coders,
  ids = NULL,
  category = NULL,
  method = "reliability",
  labels = TRUE,
  clean = TRUE,
  ...
)
data
A tibble containing item measures, coders and case IDs.
cols
A tidy selection of item variables (e.g. starts_with()) with ratings.
coders
The column holding coders or methods to compare.
ids
The column with case IDs.
category
For classification performance indicators: if no category is provided, macro statistics are returned (along with the number of categories in the output). Provide a category to get the statistics for that category only (see the last example below). If values are boolean (TRUE / FALSE) and no category is provided, the category is assumed to be TRUE.
method
The output metrics, either "reliability" or "classification".
labels
If TRUE (default), extracts labels from the attributes; see codebook.
clean
Whether to prepare the data by data_clean.
...
Placeholder to allow calling the method with unused parameters from report_counts.
Reliability: Compare codings of two or more raters in content analysis. Common reliability measures are percent agreement (also known as Holsti), Fleiss' or Cohen's Kappa, Krippendorff's Alpha and Gwet's AC.
Classification: Compare true and predicted categories from classification methods. Common performance metrics include accuracy, precision, recall and F1.
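As a minimal illustration of the percent agreement (Holsti) idea, the following sketch does not use the package API; the two rating vectors are made-up data for two coders rating the same five cases.

# Percent agreement (Holsti): share of cases on which both coders agree
ratings_coder1 <- c("a", "b", "b", "a", "c")
ratings_coder2 <- c("a", "b", "c", "a", "c")
mean(ratings_coder1 == ratings_coder2)
#> [1] 0.8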
A volker tibble with one row for each item. The item name is returned in the first column. For the reliability method, the following columns are returned:
n: Number of cases (each case id is only counted once).
Coders: Number of coders.
Categories: Number of categories.
Holsti: Percent agreement (same as accuracy).
Krippendorff's Alpha: Chance-corrected reliability score.
Kappa: Depending on the number of coders either Cohen's Kappa (two coders) or Fleiss' Kappa (more coders).
Gwet's AC1: Gwet's agreement coefficient.
For the classification method, the following columns are returned:
n: Number of cases (each case id is only counted once).
Categories: Number of categories.
Accuracy: Share of correct classifications.
Precision: Share of actually true cases among all cases detected as true.
Recall: Share of all true cases that were detected as true.
F1: Harmonic mean of precision and recall.
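To clarify how these metrics are defined, here is a minimal sketch that does not use the package; true and pred are hypothetical logical vectors holding the manual coding and the automatic classification of the same cases.

true <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
pred <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)

accuracy  <- mean(true == pred)            # share of correct classifications
precision <- sum(true & pred) / sum(pred)  # true cases among cases detected as true
recall    <- sum(true & pred) / sum(true)  # detected cases among all true cases
f1        <- 2 * precision * recall / (precision + recall)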
library(dplyr)
library(volker)
data <- volker::chatgpt
# Prepare example data.
# First, recode "x" to TRUE/FALSE for the first coder's sample.
data_coder1 <- data |>
  mutate(across(starts_with("cg_act_"), ~ ifelse(is.na(.), FALSE, TRUE))) |>
  mutate(coder = "coder one")

# Second, recode using a dictionary approach for the second coder's sample.
data_coder2 <- data |>
  mutate(across(starts_with("cg_act_"), ~ ifelse(is.na(.), FALSE, TRUE))) |>
  mutate(cg_act_write = grepl("write|text|translate", tolower(cg_activities))) |>
  mutate(coder = "coder two")

data_coded <- bind_rows(
  data_coder1,
  data_coder2
)
# Reliability coefficients are strictly only appropriate for manual codings
agree_tab(data_coded, cg_act_write, coder, case, method = "reli")
# Better use classification performance indicators to compare the
# dictionary approach with human coding
agree_tab(data_coded, cg_act_write, coder, case, method = "class")
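The category argument restricts classification metrics to a single category. A hedged sketch, assuming a logical value is accepted for the boolean codings prepared above:

# Get the performance metrics for the FALSE category only
# (without a category, boolean codings default to the TRUE category)
agree_tab(data_coded, cg_act_write, coder, case, method = "class", category = FALSE)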