View source: R/flag_duplicate_title.R
Description
Flags articles in a data frame that are duplicated, that is, whose titles are identical or differ by at most max_distance.
Usage
flag_duplicate_title(df, title = "title", max_distance = 5L)
Arguments
df: a data frame with potential duplicates
title: the name of the character (<chr>) column in df that holds the study titles
max_distance: the maximum distance between two titles for them to be flagged as duplicates
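Details
The function measures how different two titles are with the Optimal String Alignment (OSA) distance, computed by stringdist::stringdist() (for details, see stringdist-metrics). The OSA distance is the number of single-character insertions, deletions, substitutions, and transpositions of adjacent characters needed to turn one string into the other, with the restriction that no substring is edited more than once. Note that this function can also be used to find duplicates based on the abstract or any other text field, by passing that column's name to title. Because distances are computed between pairs of strings, the task can become computationally heavy as the number and the length of the strings grow.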
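To make the OSA distance concrete, here is a minimal base-R sketch of it with unit costs for every edit. It is an illustration only, not the package's code: the function name osa_distance is hypothetical, and the assumption is that it should agree with stringdist::stringdist(a, b, method = "osa") under that function's default unit weights.

```r
# Hypothetical helper: Optimal String Alignment distance with unit costs.
# Illustrative sketch only; the package itself delegates to stringdist::stringdist().
osa_distance <- function(a, b) {
  # Work on Unicode code points so that "" becomes a zero-length vector
  x <- utf8ToInt(a)
  y <- utf8ToInt(b)
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] holds the distance between the first i characters of a
  # and the first j characters of b
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L,  # deletion
        d[i + 1, j] + 1L,  # insertion
        d[i, j] + cost     # substitution (free if the characters match)
      )
      # Transposition of two adjacent characters, e.g. "dc" -> "cd"
      if (i > 1 && j > 1 && x[i] == y[j - 1] && x[i - 1] == y[j]) {
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1L)
      }
    }
  }
  d[n + 1, m + 1]
}

osa_distance("abcd", "abdc")  # 1: one adjacent transposition
```

The double loop fills an (n + 1) x (m + 1) table, so a single comparison costs time proportional to the product of the two string lengths; repeating it over many pairs of long titles is what makes the task heavy.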
Value
The original data frame with an added duplicate_by_title column, which is 1 for articles flagged as duplicates and 0 otherwise.
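The following sketch shows one way a 0/1 duplicate_by_title column could be derived from pairwise title distances. It is a hypothetical helper (flag_similar_titles), not the package's implementation. To stay dependency-free it swaps in utils::adist(), which computes the generalized Levenshtein distance (no transpositions) instead of OSA. It also assumes that a row is flagged when at least one other title lies within max_distance; the toy titles are invented for illustration.

```r
# Hypothetical sketch, not the package's implementation.
# Uses Levenshtein distance (utils::adist) as a stand-in for OSA.
flag_similar_titles <- function(df, title = "title", max_distance = 5L) {
  titles <- df[[title]]
  # Pairwise edit distances between all titles
  d <- utils::adist(titles)
  # Every title is at distance 0 from itself, so ignore the diagonal
  diag(d) <- NA
  # Flag a row if any other title is within max_distance
  near <- rowSums(d <= max_distance, na.rm = TRUE) > 0
  df$duplicate_by_title <- as.integer(near)
  df
}

# Toy data, invented for illustration
papers <- data.frame(
  id = 1:3,
  title = c("Workaholism and burnout",
            "Workaholism and burnuot",
            "Sleep quality in nurses"),
  stringsAsFactors = FALSE
)
flagged <- flag_similar_titles(papers, max_distance = 2L)
flagged$duplicate_by_title  # 1 1 0
```

With Levenshtein distance the swapped letters in "burnuot" count as two edits; under OSA the same swap counts as a single transposition, which is one reason OSA suits typo-level differences between titles.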
Examples
library(dplyr)
# Show all articles with a duplicated title
merge_sources(workaholism_pubmed, workaholism_psychinfo) %>%
  make_id(c("psyid", "pmid", "doi", "eid", "sid")) %>%
  flag_duplicate_title(title = "title") %>%
  filter(duplicate_by_title == 1)