flag_duplicate_title: Flag duplicates by title


View source: R/flag_duplicate_title.R

Description

Flags articles in a data frame whose titles are identical or nearly identical to the title of another article.

Usage

flag_duplicate_title(df, title = "title", max_distance = 5L)

Arguments

df

a data frame with potential duplicates

title

the name of the character (<chr>) column in df that contains the study titles

max_distance

the maximum string distance between two titles for them to still be flagged as duplicates

Details

The function measures the difference between two strings with the Optimal String Alignment (OSA) distance, computed by stringdist::stringdist() (for details, see stringdist-metrics). Note that this function can also be used to find duplicates based on the abstract or any other text field, by passing that column's name to the title argument. The computation can become heavy as the number of strings and their length grow.
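To make the metric concrete, below is a minimal base-R sketch of OSA distance: the smallest number of single-character insertions, deletions, substitutions and adjacent transpositions needed to turn one string into the other, with the restriction that no substring is edited more than once. The helper osa_distance() is hypothetical and written only for illustration; it is not part of metamanager, and in practice stringdist::stringdist(a, b, method = "osa") computes the same metric.

```r
# Minimal sketch of Optimal String Alignment (OSA) distance in base R.
# osa_distance() is a hypothetical helper for illustration only; it is not
# metamanager's code. stringdist::stringdist(a, b, method = "osa") computes
# the same metric.
osa_distance <- function(a, b) {
  sa <- strsplit(a, "")[[1]]
  sb <- strsplit(b, "")[[1]]
  n <- length(sa)
  m <- length(sb)
  # d[i + 1, j + 1] holds the distance between the first i characters of a
  # and the first j characters of b
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (sa[i] == sb[j]) 0L else 1L
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L,  # deletion
                             d[i + 1, j] + 1L,  # insertion
                             d[i, j] + cost)    # substitution (or match)
      # adjacent transposition, e.g. "dc" -> "cd", counted as one edit
      if (i > 1 && j > 1 && sa[i] == sb[j - 1] && sa[i - 1] == sb[j]) {
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1L)
      }
    }
  }
  d[n + 1, m + 1]
}

osa_distance("abcd", "abdc")                                      # 1: one transposition
osa_distance("ca", "abc")                                         # 3 under OSA
osa_distance("Workaholism and health", "Workaholism and health.") # 1: one insertion
```

The "ca" / "abc" pair shows the OSA restriction: unrestricted Damerau-Levenshtein distance would give 2 (transpose, then insert between the swapped characters), but OSA does not allow editing the transposed substring again, so it reports 3. With the default max_distance = 5L, titles differing only by punctuation or a short typo fall well within the threshold.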

Value

The original data frame with an added duplicate_by_title column, coded 0 (not flagged) or 1 (flagged as a duplicate).
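The shape of the returned data frame can be illustrated with a small base-R sketch. The helper add_title_duplicate_flag() is hypothetical and is not metamanager's implementation; as an assumption, it uses base R's utils::adist() (Levenshtein distance, which, unlike OSA, counts an adjacent transposition as two edits) as a dependency-free stand-in, and it flags a row when any other row's title is within max_distance.

```r
# Hypothetical sketch of adding a 0/1 duplicate_by_title column; this is an
# illustration, not metamanager's code. adist() computes Levenshtein distance,
# used here as a stand-in for the OSA distance the package uses.
add_title_duplicate_flag <- function(df, title = "title", max_distance = 5L) {
  titles <- as.character(df[[title]])
  d <- adist(titles)   # pairwise distance matrix, one row/column per title
  diag(d) <- Inf       # a title should not count as a duplicate of itself
  # a row is flagged if any other title is within max_distance (NAs ignored)
  close <- apply(d <= max_distance, 1, any, na.rm = TRUE)
  df$duplicate_by_title <- as.integer(close)
  df
}

articles <- data.frame(
  title = c("Workaholism and health", "Workaholism and health.", "Burnout in nurses"),
  stringsAsFactors = FALSE
)
add_title_duplicate_flag(articles)$duplicate_by_title
# [1] 1 1 0
```

In this sketch both members of a near-identical pair receive a 1, so filtering on duplicate_by_title == 1 returns every title involved in a match.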

Examples

library(dplyr)
library(metamanager)
# Show all articles with a duplicated title
merge_sources(workaholism_pubmed, workaholism_psychinfo) %>%
  make_id(c("psyid", "pmid", "doi", "eid", "sid")) %>%
  flag_duplicate_title(title = "title") %>%
  filter(duplicate_by_title == 1)

nthun/metamanager documentation built on Aug. 9, 2019, 1:37 p.m.