rt_summary: Summarize transparency indicators across a corpus of articles
In rtransparency: Identifies Indicators of Transparency

rt_summary

R Documentation

Summarize transparency indicators across a corpus of articles

Description

Takes a data frame with one row per article (such as the output of [rt_all_pmc()] joined with [rt_data_code_pmc()], stacked over many articles) and returns the prevalence of each transparency indicator. For each indicator it reports the number of articles assessed, the number in which the indicator was detected, and the apparent prevalence with its Wilson confidence interval. Optionally, it also reports a prevalence corrected for the detector's sensitivity and specificity (the Rogan-Gladen estimator).
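
The two estimators named above can be sketched in base R. This is an illustrative sketch rather than the package's internal code: the helper names 'wilson_ci' and 'rogan_gladen', the counts, and the sensitivity and specificity values are all made up for the example, and truncating the corrected prevalence to the 0-1 range is an assumption about how out-of-range values might be handled.

```r
# Wilson score interval for x detections among n assessed articles.
# (hypothetical helper, not part of rtransparency)
wilson_ci <- function(x, n, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  p <- x / n
  denom <- 1 + z^2 / n
  centre <- (p + z^2 / (2 * n)) / denom
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  c(low = centre - half, high = centre + half)
}

# Rogan-Gladen correction: (p + Sp - 1) / (Se + Sp - 1).
# Truncation to [0, 1] is an assumption made for this sketch.
rogan_gladen <- function(p, sensitivity, specificity) {
  adj <- (p + specificity - 1) / (sensitivity + specificity - 1)
  pmin(pmax(adj, 0), 1)
}

# Made-up numbers: 30 detections in 100 assessed articles,
# detector sensitivity 0.90 and specificity 0.95.
x <- 30
n <- 100
p <- x / n
ci <- wilson_ci(x, n)
adj <- rogan_gladen(p, sensitivity = 0.90, specificity = 0.95)

# Report on the 0-100 scale, as in the returned tibble.
round(100 * c(percent = p, ci, adj_percent = adj), 1)
```

The correction is only meaningful when sensitivity plus specificity exceeds 1; otherwise the denominator is zero or negative.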

Usage

rt_summary(
  data,
  indicators = NULL,
  by = NULL,
  adjust = TRUE,
  accuracy = NULL,
  conf_level = 0.95
)

Arguments

`data`	A data frame with one row per article. Indicator columns must be logical or numeric 0/1 and named as in [rt_all_pmc()]: 'is_coi_pred', 'is_fund_pred', 'is_register_pred', 'is_open_data', 'is_open_code', 'is_novelty_pred', 'is_replication_pred' and 'is_ai_pred'. 'NA' marks an article that was not assessed for that indicator (for example 'is_ai_pred' before 2023) and is excluded from its denominator. Other values are rejected rather than silently coerced.
`indicators`	Optional character vector of indicator columns to summarize. Defaults to every recognized indicator present in 'data'.
`by`	Optional name of a grouping column (for example a publication year, journal or article type); the summary is then computed within each group.
`adjust`	If 'TRUE' (default), add a prevalence corrected for detector sensitivity and specificity using 'accuracy'. Indicators with no row in 'accuracy' get 'NA' in the corrected columns.
`accuracy`	A data frame of detector accuracy with columns 'variable', 'sensitivity' and 'specificity'. Defaults to [rt_accuracy].
`conf_level`	Confidence level for the intervals (default '0.95').
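
The rules for indicator columns described under 'data' (logical or 0/1 values, 'NA' excluded from the denominator, anything else rejected) might look like the following in base R. The helper 'count_indicator' and the toy data frame are hypothetical and exist only for this illustration; the package's own validation and error messages may differ.

```r
# Hypothetical helper: validate one indicator column and count
# assessed and detected articles. Not part of rtransparency.
count_indicator <- function(x) {
  if (!is.logical(x) && !is.numeric(x)) {
    stop("indicator must be logical or numeric 0/1")
  }
  if (any(!is.na(x) & !(x %in% c(0, 1)))) {
    stop("indicator contains values other than 0, 1 or NA")
  }
  c(
    n_articles = sum(!is.na(x)),           # NA = not assessed, excluded
    n_detected = sum(x == 1, na.rm = TRUE)
  )
}

# Toy data: two articles were not assessed for 'is_ai_pred'.
toy <- data.frame(
  is_open_data = c(TRUE, FALSE, TRUE, FALSE),
  is_ai_pred   = c(NA, NA, 1, 0)
)

counts <- sapply(toy, count_indicator)
counts
```

Here 'is_open_data' has a denominator of four articles, while 'is_ai_pred' has a denominator of two because its 'NA' rows are dropped.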

Value

A tibble with one row per indicator (per group, if 'by' is given): the grouping column (when 'by' is used), 'indicator', 'label', 'n_articles', 'n_detected', 'percent', 'conf_low', 'conf_high' and, when 'adjust = TRUE', 'adj_percent', 'adj_low' and 'adj_high'. Percentages and interval bounds are on the 0-100 scale.
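
To make the shape of the result concrete, the sketch below builds the count and percent columns for a single indicator within groups, using base R only. It is a rough approximation of the layout described above, not the function's implementation: the toy data are made up, the output is a plain data frame rather than a tibble, and the 'label', interval and corrected columns are left out.

```r
# Toy data: one indicator, grouped by a made-up 'type' column.
toy <- data.frame(
  type         = c("research", "research", "review", "review", "review"),
  is_open_data = c(1, 0, 1, NA, 0)
)

# One output row per group: counts plus percent on the 0-100 scale.
rows <- lapply(split(toy, toy$type), function(g) {
  x <- g$is_open_data
  n <- sum(!is.na(x))            # articles assessed in this group
  k <- sum(x == 1, na.rm = TRUE) # articles with the indicator detected
  data.frame(
    type       = g$type[1],
    indicator  = "is_open_data",
    n_articles = n,
    n_detected = k,
    percent    = 100 * k / n
  )
})

out <- do.call(rbind, rows)
rownames(out) <- NULL
out
```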

Examples

data(rt_demo)
rt_summary(rt_demo)

# Apparent prevalence only, no accuracy correction
rt_summary(rt_demo, adjust = FALSE)

# By article type
rt_summary(rt_demo, by = "type")

rtransparency documentation built on July 1, 2026, 9:07 a.m.