orm_dedup: Automatic deduplication of bibliographic records
In orisma: Occupational Risk Integrated Systematic Mapping and Analysis

Description

orm_dedup() removes duplicate records using a three-step progressive pipeline:

1. Exact DOI match: the most reliable signal; decisive for records that have DOIs.
2. Normalised title match: strips punctuation and accents, ignores case, and collapses extra spaces before comparing; catches the same article listed with minor typographic differences across databases.
3. Fuzzy match: compares title + year + first author using Optimal String Alignment (OSA) distance; catches near-identical records that escape exact matching (e.g. different journal abbreviations, truncated author lists).
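
The first two steps can be illustrated with a minimal base R sketch. This is not the package's internal code: the helper `normalise_title()`, the small accent map, the case-insensitive DOI comparison, and the toy `refs` data frame are all assumptions made for illustration.

```r
# Hypothetical helper (not part of orisma): normalise a title so that
# purely typographic differences disappear before exact comparison.
normalise_title <- function(x) {
  # Map a small, illustrative set of accented letters to ASCII.
  x <- chartr(
    "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00c1\u00c9\u00cd\u00d3\u00da\u00dc\u00d1",
    "aeiouunAEIOUUN",
    x
  )
  x <- tolower(x)                    # ignore case
  x <- gsub("[[:punct:]]+", " ", x)  # punctuation becomes a space
  x <- gsub("\\s+", " ", x)          # collapse runs of whitespace
  trimws(x)
}

# Toy records: rows 1-2 share a DOI, rows 3-4 share a title after normalisation.
refs <- data.frame(
  doi   = c("10.1000/abc", "10.1000/ABC", NA, NA),
  title = c("Noise exposure in mining",
            "Noise exposure in mining.",
            "Exposici\u00f3n al ruido: miner\u00eda",
            "Exposicion al ruido  mineria"),
  stringsAsFactors = FALSE
)

# Step 1: exact DOI match (assumed case-insensitive). Records without a DOI
# are always kept at this stage, because a missing DOI proves nothing.
doi_key <- tolower(refs$doi)
step1   <- refs[is.na(doi_key) | !duplicated(doi_key), ]

# Step 2: exact match on the normalised title.
title_key <- normalise_title(step1$title)
step2     <- step1[!duplicated(title_key), ]

nrow(step1)  # 3: the second copy of the shared DOI is dropped
nrow(step2)  # 2: the unaccented Spanish title collapses onto the accented one
```

Keeping records with a missing DOI in step 1 matters: `duplicated()` treats all `NA` values as equal, so without the `is.na()` guard every DOI-less record after the first would be discarded.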

Only records that remain ambiguous after all three steps are flagged for optional manual review. When save_log = TRUE, they are saved to dedup_log.csv in the working directory.

Usage

orm_dedup(
  refs,
  fuzzy_threshold = 0.9,
  lang = getOption("orisma.lang", "en"),
  verbose = getOption("orisma.verbose", TRUE),
  save_log = TRUE
)

Arguments

`refs`	An `orisma_refs` object returned by `orm_load()`.
`fuzzy_threshold`	Numeric (0–1). Similarity threshold for fuzzy matching. Default `0.90` (90% similarity = duplicate). Increase for stricter matching, decrease for more aggressive deduplication.
`lang`	Character. Message language: `"en"` or `"es"`. Overrides the `orisma.lang` option.
`verbose`	Logical. Print progress? Default `TRUE`.
`save_log`	Logical. Save `dedup_log.csv` to the working directory? Default `TRUE`.
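
To make the `fuzzy_threshold` scale concrete, the sketch below computes an OSA distance in base R and converts it to a 0–1 similarity. The functions `osa_distance()` and `osa_similarity()` are hypothetical helpers written for this illustration. Dividing by the longer string's length is one common normalisation, and comparing with `>=` is an assumption; the package may normalise or compare differently.

```r
# Hypothetical helper: Optimal String Alignment distance between two strings.
# Counts insertions, deletions, substitutions and adjacent transpositions.
osa_distance <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] holds the distance between the first i characters of a
  # and the first j characters of b.
  d <- matrix(0L, n + 1, m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- as.integer(x[i] != y[j])
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L,  # deletion
        d[i + 1, j] + 1L,  # insertion
        d[i, j] + cost     # substitution (or match)
      )
      # Adjacent transposition, e.g. "ur" versus "ru".
      if (i > 1 && j > 1 && x[i] == y[j - 1] && x[i - 1] == y[j]) {
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1L)
      }
    }
  }
  d[n + 1, m + 1]
}

# Hypothetical helper: similarity in [0, 1], where 1 means identical strings.
osa_similarity <- function(a, b) {
  longest <- max(nchar(a), nchar(b))
  if (longest == 0) return(1)
  1 - osa_distance(a, b) / longest
}

# Comparison keys built from title + year + first author (toy values).
key_a <- "noise exposure in mining 2019 smith"
key_b <- "noise exposrue in mining 2019 smith"  # "ur" transposed to "ru"

sim <- osa_similarity(key_a, key_b)
is_duplicate <- sim >= 0.9  # TRUE: one transposition in a 35-character key
```

Under this normalisation, a threshold of 0.9 tolerates roughly one edit per ten characters of the comparison key, which is why lowering it makes deduplication more aggressive.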

Value

An orisma_refs tibble with duplicates removed. Attributes record deduplication statistics for inclusion in the PRISMA log.

Examples

## Not run: 
refs    <- orm_load("my_references/")
deduped <- orm_dedup(refs)

# More aggressive fuzzy matching
deduped <- orm_dedup(refs, fuzzy_threshold = 0.85)

# Spanish messages, no log file
deduped <- orm_dedup(refs, lang = "es", save_log = FALSE)

## End(Not run)

orisma documentation built on May 19, 2026, 1:07 a.m.