find_obvious_dups: Identify obvious duplicates based on title and year

View source: R/find_obvious_dups.R

find_obvious_dups    R Documentation

Identify obvious duplicates based on title and year

Description

Identify obvious duplicates based on title and year

Usage

find_obvious_dups(CitDat, dupInfoAfterID = TRUE, preferDupsWithPDF = TRUE)

Arguments

CitDat

A data frame or tibble, as returned by read_Citavi_xlsx. It must contain the columns ID, Title and Year.

dupInfoAfterID

If TRUE (default), the newly created columns clean_title, clean_title_id, has_obv_dup and obv_dup_id are moved right next to the ID column. Additionally, the ID column is moved to the first position.

preferDupsWithPDF

If TRUE (default), obvious duplicates are sorted by the information in the columns has_attachment and/or Locations (provided these columns are present in the dataset). After sorting, duplicates with the most occurrences of ".pdf" in Locations and a TRUE in has_attachment come first and are therefore chosen as dup_01.
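
The sorting preference described for preferDupsWithPDF can be illustrated with a minimal base R sketch. This is not the package's implementation. The toy data, the helper column n_pdf, the relative priority of has_attachment over the ".pdf" count, and the labels beyond dup_01 are all assumptions made for illustration.

```r
# Toy data: three entries assumed to be obvious duplicates of one another.
# Column names follow the argument description; the values are made up.
dups <- data.frame(
  ID             = c("a", "b", "c"),
  has_attachment = c(FALSE, TRUE, TRUE),
  Locations      = c("", "paper.pdf", "paper.pdf; copy.pdf"),
  stringsAsFactors = FALSE
)

# Count occurrences of ".pdf" in each Locations string
# (n_pdf is a hypothetical helper column, not created by CitaviR).
matches <- gregexpr(".pdf", dups$Locations, fixed = TRUE)
dups$n_pdf <- vapply(matches, function(m) sum(m > 0), integer(1))

# Assumed ordering: entries with an attachment first,
# then entries with more ".pdf" occurrences.
sorted <- dups[order(-dups$has_attachment, -dups$n_pdf), ]

# Label duplicates in sorted order; only "dup_01" is named in the
# documentation, the remaining labels are an assumed continuation.
sorted$obv_dup_id <- sprintf("dup_%02d", seq_len(nrow(sorted)))
sorted[, c("ID", "has_attachment", "n_pdf", "obv_dup_id")]
```

In this sketch the entry with an attachment and two PDF locations ends up first, which mirrors the idea that the best-documented duplicate is the one kept as dup_01.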

Details

Lifecycle: maturing.
Currently this only works for files that were generated while Citavi was set to "English", so that column names are "Short Title" etc.

Value

A tibble containing four additional columns: clean_title, clean_title_id, has_obv_dup and obv_dup_id.
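
To make the idea of matching on title and year more concrete, here is a rough base R sketch of how a cleaned key and a duplicate flag could be derived. The cleaning rule (lowercasing, dropping non-alphanumeric characters, appending the year) is an assumption and may differ from what find_obvious_dups actually does. Only clean_title and has_obv_dup are mimicked, because the format of clean_title_id and obv_dup_id is not described here.

```r
# Toy references; the data are made up for illustration.
refs <- data.frame(
  ID    = 1:4,
  Title = c("Deep Learning.", "deep learning", "Deep Learning", "Other Work"),
  Year  = c(2015, 2015, 2016, 2015),
  stringsAsFactors = FALSE
)

# Assumed cleaning: lowercase the title, drop everything that is not a
# letter or digit, then append the year.
key <- paste0(gsub("[^a-z0-9]", "", tolower(refs$Title)), "_", refs$Year)
refs$clean_title <- key

# An entry has an obvious duplicate if its key occurs more than once.
refs$has_obv_dup <- key %in% key[duplicated(key)]
refs
```

Under these assumptions the first two entries are flagged as duplicates of each other, while the third is not, because its year differs.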

Examples

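library(CitaviR)
library(magrittr) # provides the %>% pipe

# Read the bundled example project and show only the newly created columns
example_path <- example_file("3dupsin5refs/3dupsin5refs.ctv6")
read_Citavi_ctv6(example_path) %>%
  find_obvious_dups() %>%
  dplyr::select(clean_title:obv_dup_id)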


SchmidtPaul/CitaviR documentation built on Jan. 31, 2023, 5 a.m.