| orm_dedup | R Documentation |
orm_dedup() removes duplicate records using a three-step progressive
pipeline:
1. Exact DOI match: the most reliable signal, and decisive for records that have a DOI.
2. Normalised title match: removes punctuation, accents, case differences, and extra spaces before comparing; catches the same article listed with minor typographic differences across databases.
3. Fuzzy match: compares title + year + first author using Optimal String Alignment (OSA) distance; catches near-identical records that escape exact matching (e.g. different journal abbreviations, truncated author lists).
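Steps 1 and 2 can be sketched in base R as below. The helpers normalise_doi(), normalise_title() and exact_dedup(), the toy data frame with title and doi columns, and the accent table (which covers only common lower-case Latin letters) are hypothetical; the package may normalise differently. DOIs are case-insensitive, which is why the DOI key is lower-cased.

```r
# Hypothetical helpers illustrating steps 1 and 2; not the orisma implementation.

# Step 1 key: lower-case the DOI and drop an assumed resolver prefix
# such as "https://doi.org/"; empty strings become NA.
normalise_doi <- function(doi) {
  doi <- tolower(trimws(doi))
  doi <- sub("^https?://(dx\\.)?doi\\.org/", "", doi)
  doi[which(doi == "")] <- NA_character_
  doi
}

# Step 2 key: lower-case, map common accented letters to ASCII,
# turn punctuation into spaces, then collapse repeated whitespace.
normalise_title <- function(title) {
  x <- tolower(title)
  x <- chartr(
    "\u00e0\u00e1\u00e2\u00e4\u00e3\u00e7\u00e8\u00e9\u00ea\u00eb\u00ec\u00ed\u00ee\u00ef\u00f1\u00f2\u00f3\u00f4\u00f6\u00f5\u00f9\u00fa\u00fb\u00fc",
    "aaaaaceeeeiiiinooooouuuu",
    x
  )
  x <- gsub("[[:punct:]]", " ", x)
  trimws(gsub("\\s+", " ", x))
}

# Keep the first copy: drop repeated DOIs (missing DOIs never match),
# then drop repeated normalised titles among the survivors.
exact_dedup <- function(refs) {
  doi_key <- normalise_doi(refs$doi)
  refs <- refs[!(duplicated(doi_key) & !is.na(doi_key)), , drop = FALSE]
  title_key <- normalise_title(refs$title)
  refs[!duplicated(title_key), , drop = FALSE]
}

# Toy stand-in for an orisma_refs tibble (assumed columns)
toy <- data.frame(
  title = c("Deep learning: a review", "Deep Learning - A Review.",
            "Caf\u00e9 culture", "Cafe culture"),
  doi   = c("10.1000/ABC", "https://doi.org/10.1000/abc", NA, NA),
  stringsAsFactors = FALSE
)
print(exact_dedup(toy))
```

Row 2 is removed by its DOI and row 4 by its normalised title, leaving one copy of each article.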
Only records that remain ambiguous after all three steps are flagged for
optional manual review. These records are written to dedup_log.csv when
save_log = TRUE (the default).
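Step 3 can be sketched with a hand-written OSA distance. The helpers osa_distance() and osa_similarity(), the 1 - distance / max(length) normalisation, and the pasted title + year + author key are assumptions made for illustration, not the package's implementation.

```r
# Hypothetical helpers for the fuzzy step; not the orisma implementation.

# OSA distance: minimum number of insertions, deletions, substitutions and
# adjacent transpositions, with no substring edited more than once.
osa_distance <- function(a, b) {
  sa <- strsplit(a, "")[[1]]
  sb <- strsplit(b, "")[[1]]
  n <- length(sa)
  m <- length(sb)
  # d[i + 1, j + 1] is the distance between sa[1:i] and sb[1:j]
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (sa[i] == sb[j]) 0L else 1L
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L,  # deletion
                             d[i + 1, j] + 1L,  # insertion
                             d[i, j] + cost)    # substitution or match
      if (i > 1 && j > 1 && sa[i] == sb[j - 1] && sa[i - 1] == sb[j]) {
        # adjacent transposition
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1L)
      }
    }
  }
  d[n + 1, m + 1]
}

# Assumed normalisation: 1 - distance / length of the longer string
osa_similarity <- function(a, b) {
  len <- max(nchar(a), nchar(b))
  if (len == 0) return(1)
  1 - osa_distance(a, b) / len
}

# Hypothetical comparison key: normalised title + year + first author
key_a <- paste("deep learning for sentiment analysis", 2020, "smith")
key_b <- paste("deep learning for sentimnet analysis", 2020, "smith")
sim <- osa_similarity(key_a, key_b)
print(sim)
print(sim >= 0.9)  # TRUE: a fuzzy match at the default threshold
```

The transposed "en"/"ne" costs a single edit, so the 47-character keys score 46/47. OSA differs from unrestricted Damerau-Levenshtein in that a transposed pair cannot be edited again: for "ca" vs "abc" it returns 3 rather than 2. Outside this sketch, the stringdist package provides the same metric via stringdist::stringdist(a, b, method = "osa").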
orm_dedup(
  refs,
  fuzzy_threshold = 0.9,
  lang = getOption("orisma.lang", "en"),
  verbose = getOption("orisma.verbose", TRUE),
  save_log = TRUE
)
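Because lang and verbose default to getOption() lookups, they can be set once per session with base R's options(). A minimal sketch: the option names come from the usage defaults above, and the orm_dedup() call is commented out because it needs the orisma package and a refs object.

```r
# Value seen while the option is unset (falls back to "en")
lang_before <- getOption("orisma.lang", "en")

# Set session-wide defaults; options() returns the previous values
old <- options(orisma.lang = "es", orisma.verbose = FALSE)
lang_during <- getOption("orisma.lang", "en")
verbose_during <- getOption("orisma.verbose", TRUE)
# deduped <- orm_dedup(refs)  # would now use Spanish messages, no progress output

# Restore the previous settings (unsets options that were NULL before)
options(old)
```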
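| Argument | Description |
|---|---|
| refs | An orisma_refs tibble of references, e.g. as returned by orm_load(). |
| fuzzy_threshold | Numeric (0–1). Similarity threshold for fuzzy matching; lower values match more aggressively. Default 0.9. |
| lang | Character. Language for console messages, e.g. "en" or "es". Default getOption("orisma.lang", "en"). |
| verbose | Logical. Print progress? Default getOption("orisma.verbose", TRUE). |
| save_log | Logical. Save the records flagged for manual review to dedup_log.csv? Default TRUE. |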
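To see why lowering fuzzy_threshold is more aggressive, assume the similarity is the OSA distance between two comparison keys normalised by the longer key's length (an assumption; this page does not give the formula). Two 60-character keys that differ by 7 edits then score:

```latex
\mathrm{sim}(a, b) = 1 - \frac{d_{\mathrm{OSA}}(a, b)}{\max(|a|, |b|)} = 1 - \frac{7}{60} \approx 0.883
```

Under this assumption the pair is not a fuzzy match at the default fuzzy_threshold = 0.9, but it is at 0.85, the more aggressive setting used in the examples.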
An orisma_refs tibble with duplicates removed. Attributes record
deduplication statistics for inclusion in the PRISMA log.
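Since the statistics are stored as R attributes, base R can list and read them. The sketch below uses a stand-in data frame with a hypothetical attribute called dedup_stats, because this page does not name the actual attributes; the setdiff() line itself works unchanged on a real orm_dedup() result.

```r
# Stand-in for an orm_dedup() result. The attribute name "dedup_stats" and
# its fields are hypothetical; inspect a real result for the actual names.
deduped <- data.frame(title = c("Paper A", "Paper B"))
attr(deduped, "dedup_stats") <- list(n_in = 5, removed_doi = 2,
                                     removed_title = 1, removed_fuzzy = 0)

# Attributes beyond the ones every data frame or tibble carries
extra <- setdiff(names(attributes(deduped)), c("names", "row.names", "class"))
print(extra)

# Read the statistics and check they account for the remaining rows
stats <- attr(deduped, "dedup_stats")
remaining <- stats$n_in - (stats$removed_doi + stats$removed_title +
                           stats$removed_fuzzy)
stopifnot(remaining == nrow(deduped))
```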
## Not run:
refs <- orm_load("my_references/")
deduped <- orm_dedup(refs)
# More aggressive fuzzy matching
deduped <- orm_dedup(refs, fuzzy_threshold = 0.85)
# Spanish messages, no log file
deduped <- orm_dedup(refs, lang = "es", save_log = FALSE)
## End(Not run)