flag_duplicate_title: Flag duplicates by title
In nthun/metamanager: Manage Meta-analysis Workflow in R


Flags articles in a data frame whose titles duplicate another article's title, allowing for small differences of up to `max_distance` edits.

flag_duplicate_title(df, title = "title", max_distance = 5L)

`df`	a data frame with potential duplicates
`title`	the name of the character (<chr>) column in `df` that contains the study titles
`max_distance`	the maximum string distance between two titles for them to be flagged as duplicates

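The function measures the difference between titles with the Optimal String Alignment (OSA) distance, computed by stringdist::stringdist() (for details, see the stringdist-metrics help page). The OSA distance counts the insertions, deletions, substitutions and adjacent transpositions needed to turn one string into another, with no substring edited more than once. The function can also find duplicates based on the abstract or any other text field by passing that column's name to `title`. Because the cost grows with both the number of strings and their length, this can be computationally heavy for large data frames or long fields such as abstracts.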
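To make the metric concrete, here is a minimal base-R sketch of the OSA dynamic-programming recurrence. `osa_distance()` is a hypothetical helper written for illustration only; it is not part of metamanager or stringdist. For real work, `stringdist::stringdist(a, b, method = "osa")` is the function this page refers to.

```r
# Hypothetical helper for illustration; not part of metamanager or stringdist.
# Optimal String Alignment distance between two single strings: the minimum
# number of insertions, deletions, substitutions and adjacent transpositions,
# with no substring edited more than once.
osa_distance <- function(a, b) {
  x <- utf8ToInt(a)  # compare Unicode code points
  y <- utf8ToInt(b)
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] is the distance between the first i characters of `a`
  # and the first j characters of `b` (R matrices are 1-indexed)
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L,  # deletion
        d[i + 1, j] + 1L,  # insertion
        d[i, j] + cost     # substitution or match
      )
      # adjacent transposition, e.g. "ab" -> "ba"
      if (i > 1 && j > 1 && x[i] == y[j - 1] && x[i - 1] == y[j]) {
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1L)
      }
    }
  }
  d[n + 1, m + 1]
}

osa_distance("kitten", "sitting")  # 3
osa_distance("abcd", "acbd")       # 1: one adjacent transposition
osa_distance("ca", "abc")          # 3: OSA may not edit the transposed "ac" again
```

Each pair costs time proportional to the product of the two string lengths, which is why long fields such as abstracts are slow to compare.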

The original data frame with an added "duplicate_by_title" column, coded 1 for rows flagged as duplicates and 0 otherwise.
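The base-R sketch below illustrates how a 0/1 flag of this kind can be derived from pairwise distances. `flag_by_title_sketch()` and the `articles` data are hypothetical, and this is not metamanager's implementation. It also swaps in `utils::adist()`, which computes Levenshtein distance, so an adjacent transposition counts as two edits rather than one as under OSA. Building the full n-by-n distance matrix shows why the cost grows quickly with the number of rows.

```r
# Hypothetical sketch; not metamanager's implementation. Uses Levenshtein
# distance from utils::adist() instead of OSA.
flag_by_title_sketch <- function(df, title = "title", max_distance = 5L) {
  titles <- as.character(df[[title]])
  d <- utils::adist(titles)  # n-by-n matrix of pairwise distances
  diag(d) <- NA              # ignore each title's distance to itself
  # 1 if the title is within max_distance of at least one other title
  df$duplicate_by_title <- as.integer(
    rowSums(d <= max_distance, na.rm = TRUE) > 0
  )
  df
}

articles <- data.frame(
  id = 1:4,
  title = c("Workaholism and burnout",
            "Workaholism and burnuot",  # "ou" swapped: Levenshtein 2, OSA 1
            "Work engagement in nurses",
            "A completely different study"),
  stringsAsFactors = FALSE
)
flagged <- flag_by_title_sketch(articles, max_distance = 2L)
flagged$duplicate_by_title  # 1 1 0 0
```

With `max_distance = 1L` this sketch flags no rows, although the OSA distance between the first two titles is 1; this is where the choice of metric matters.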

library(metamanager)
library(dplyr)
# Show all articles with a duplicated title
merge_sources(workaholism_pubmed, workaholism_psychinfo) %>%
  make_id(c("psyid", "pmid", "doi", "eid", "sid")) %>%
  flag_duplicate_title(title = "title") %>%
  filter(duplicate_by_title == 1)