extract_unique_references: Remove duplicates from a bibliographic data set
In rmetaverse/synthesisr: Import, Assemble, and Deduplicate Bibliographic Datasets

Description

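Given a data set and a vector identifying which of its entries are duplicates of one another (for example, the output of `find_duplicates`), this function returns a single reference for each group of duplicates.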

Usage

extract_unique_references(data, matches, type = "merge")

Arguments

`data`	A `data.frame` containing bibliographic information.
`matches`	A vector showing which entries in `data` are duplicates, such as the output of `find_duplicates`.
`type`	How to choose what is retained from each group of duplicates. The default, `"merge"`, keeps, for each column, the entry with the largest number of characters. Alternatively, `"select"` returns the single row with the highest total number of characters.
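
The difference between the two `type` options can be sketched in base R. This is only an illustration of the rules as described above, not the package's implementation; the helper names `n_chars`, `select_longest_row` and `merge_longest_cells` and the toy data are hypothetical.

```r
# Toy group of two records assumed to be duplicates of each other.
group <- data.frame(
  title = c(
    "EviAtlas: a tool for visualising evidence synthesis databases",
    "eviatlas:tool for visualizing evidence synthesis databases."
  ),
  year = c(NA, "2019"),
  authors = c(NA, "Haddaway et al"),
  stringsAsFactors = FALSE
)

# Number of characters per cell, counting NA as zero (hypothetical helper).
n_chars <- function(x) ifelse(is.na(x), 0L, nchar(as.character(x)))

# "select"-style rule: keep the one row with the most characters in total.
select_longest_row <- function(df) {
  totals <- Reduce(`+`, lapply(df, n_chars))
  df[which.max(totals), , drop = FALSE]
}

# "merge"-style rule: for each column, keep the longest entry from any row.
merge_longest_cells <- function(df) {
  as.data.frame(
    lapply(df, function(col) col[which.max(n_chars(col))]),
    stringsAsFactors = FALSE
  )
}

picked <- select_longest_row(group)
merged <- merge_longest_cells(group)
```

In this toy group the two rules disagree: the row-level rule keeps the second record as it stands, because its year and authors outweigh its slightly shorter title, while the column-level rule combines the longer title from the first record with the year and authors from the second.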

Value

Returns a `data.frame` of unique references.

Examples

library(synthesisr)

my_df <- data.frame(
  title = c(
    "EviAtlas: a tool for visualising evidence synthesis databases",
    "revtools: An R package to support article screening for evidence synthesis",
    "An automated approach to identifying search terms for systematic reviews",
    "Reproducible, flexible and high-throughput data extraction from primary literature",
    "eviatlas:tool for visualizing evidence synthesis databases.",
    "REVTOOLS a package to support article-screening for evidence synthsis"
  ),
  year = c("2019", "2019", "2019", "2019", NA, NA),
  authors = c("Haddaway et al", "Westgate",
              "Grames et al", "Pick et al", NA, NA),
  stringsAsFactors = FALSE
)

# identify likely duplicates by fuzzy matching on titles
dups <- find_duplicates(
  my_df$title,
  method = "string_osa",
  rm_punctuation = TRUE,
  to_lower = TRUE
)

# collapse each group of duplicates into a single reference
extract_unique_references(my_df, matches = dups)

# or, in one line:
deduplicate(my_df, "title",
  method = "string_osa",
  rm_punctuation = TRUE,
  to_lower = TRUE)
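
Before collapsing duplicates, it can help to check which entries have been grouped together. The sketch below uses base R only and a hand-written grouping vector; it assumes that `matches` holds one group identifier per row, with equal values marking duplicates, which should be checked against the actual output of `find_duplicates`.

```r
titles <- c(
  "EviAtlas: a tool for visualising evidence synthesis databases",
  "revtools: An R package to support article screening for evidence synthesis",
  "An automated approach to identifying search terms for systematic reviews",
  "Reproducible, flexible and high-throughput data extraction from primary literature",
  "eviatlas:tool for visualizing evidence synthesis databases.",
  "REVTOOLS a package to support article-screening for evidence synthsis"
)

# Hypothetical grouping vector (an assumption, not real function output):
# entries 1 and 5 share a group, as do entries 2 and 6.
matches <- c(1, 2, 3, 4, 1, 2)

# List the titles that fall into each group.
groups <- split(titles, matches)

# Keep only the groups with more than one member, i.e. the suspected duplicates.
n_per_group <- lengths(groups)
suspected <- groups[n_per_group > 1]
print(suspected)
```

Reviewing `suspected` by eye is a cheap way to catch false matches before any rows are merged or dropped.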

rmetaverse/synthesisr documentation built on Feb. 23, 2025, 5:29 p.m.