find_obvious_dups: Identify obvious duplicates based on title and year
In SchmidtPaul/CitaviR: A set of tools for dealing with Citavi data


Description

Identify obvious duplicates based on title and year

Usage

find_obvious_dups(CitDat, dupInfoAfterID = TRUE, preferDupsWithPDF = TRUE)

Arguments

`CitDat`	A data frame or tibble as returned by `read_Citavi_xlsx`. It must contain the columns `ID`, `Title`, and `Year`.
`dupInfoAfterID`	If TRUE (default), the newly created columns `clean_title`, `clean_title_id`, `has_obv_dup`, and `obv_dup_id` are placed directly after the `ID` column, and the `ID` column itself is moved to the first position.
`preferDupsWithPDF`	If TRUE (default), obvious duplicates are sorted by the information in the columns `has_attachment` and/or `Locations` (if these are present in the dataset). After sorting, the duplicates with the most occurrences of `".pdf"` in `Locations` and a `TRUE` in `has_attachment` come first and are therefore chosen as `dup_01`.
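The sorting rule behind `preferDupsWithPDF` can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helpers `count_pdf()` and `rank_dups()` are hypothetical, duplicate groups are assumed to be identified by a shared `clean_title`, and the `.pdf` count is assumed to take precedence over `has_attachment` (the documentation above does not say which criterion wins).

```r
# Hypothetical helper: number of ".pdf" occurrences in each Locations entry.
# Missing values (NA) count as zero.
count_pdf <- function(locations) {
  locations[is.na(locations)] <- ""
  lengths(regmatches(locations, gregexpr(".pdf", locations, fixed = TRUE)))
}

# Hypothetical helper: within each group of obvious duplicates, put rows with
# more ".pdf" locations first (assumed primary criterion), break ties by
# has_attachment == TRUE, then label rows dup_01, dup_02, ... in that order.
rank_dups <- function(dat, group_col = "clean_title") {
  n <- nrow(dat)
  # Columns that are absent simply do not influence the order
  n_pdf <- if ("Locations" %in% names(dat)) count_pdf(dat$Locations) else integer(n)
  has_att <- if ("has_attachment" %in% names(dat)) {
    (!is.na(dat$has_attachment)) & dat$has_attachment
  } else {
    logical(n)
  }
  ord <- order(dat[[group_col]], -n_pdf, !has_att)
  dat <- dat[ord, , drop = FALSE]
  pos <- ave(seq_len(n), dat[[group_col]], FUN = seq_along)
  dat$obv_dup_id <- sprintf("dup_%02d", pos)
  dat
}

# Usage with made-up example data
refs <- data.frame(
  ID = c("a", "b", "c"),
  clean_title = c("t1", "t1", "t2"),
  Locations = c("notes.docx", "p.pdf; q.pdf", NA),
  has_attachment = c(TRUE, FALSE, NA),
  stringsAsFactors = FALSE
)
ranked <- rank_dups(refs)
ranked[, c("ID", "obv_dup_id")]
```

Here reference `b` becomes `dup_01` of its group because it has two `.pdf` locations, even though `a` has an attachment; swapping the last two arguments of `order()` would encode the opposite priority.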

Details

Currently, this only works for files that were generated while Citavi's language was set to English, so that the column names are in English (e.g. "Short Title").
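A file exported while Citavi was set to another language will lack the English column names the function expects. A hedged sketch of a pre-check, using a hypothetical helper `check_english_columns()` that only tests for the three columns required by `CitDat` (`ID`, `Title`, `Year`); the non-English column names in the usage lines are made up for illustration.

```r
# Hypothetical guard: stop early if the English column names required by
# find_obvious_dups() are missing from the data.
check_english_columns <- function(dat, required = c("ID", "Title", "Year")) {
  missing_cols <- setdiff(required, names(dat))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "),
         ". Was Citavi set to English when the file was created?",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Passes silently for English column names
check_english_columns(data.frame(ID = 1, Title = "a", Year = 2020))

# Fails with an informative message for (hypothetical) non-English names
msg <- tryCatch(
  check_english_columns(data.frame(ID = 1, Titel = "a", Jahr = 2020)),
  error = function(e) conditionMessage(e)
)
```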

Value

A tibble containing four additional columns: `clean_title`, `clean_title_id`, `has_obv_dup`, and `obv_dup_id`.
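How the four returned columns relate to each other can be sketched in base R. This is an illustration under stated assumptions, not CitaviR's code: the title normalisation (lower-casing, removing punctuation, squeezing whitespace, appending the year), the `ct_` prefix for `clean_title_id`, and the function names `make_clean_title()` and `find_obvious_dups_sketch()` are all hypothetical. Only the `dup_01` style of labels is taken from the description of `preferDupsWithPDF`, and the PDF-based sorting is left out here.

```r
# Hypothetical normalisation: lower-case, drop everything except ASCII
# letters, digits and spaces, squeeze repeated spaces, then append the year.
make_clean_title <- function(title, year) {
  x <- tolower(title)
  x <- gsub("[^a-z0-9 ]", "", x)
  x <- gsub(" +", " ", trimws(x))
  paste(x, year, sep = "_")
}

find_obvious_dups_sketch <- function(dat) {
  dat$clean_title <- make_clean_title(dat$Title, dat$Year)
  # One (hypothetical) id per distinct clean title, in order of first appearance
  ids <- match(dat$clean_title, unique(dat$clean_title))
  dat$clean_title_id <- sprintf("ct_%02d", ids)
  # TRUE for every row whose clean title occurs more than once
  dat$has_obv_dup <- duplicated(dat$clean_title) |
    duplicated(dat$clean_title, fromLast = TRUE)
  # Position within each group of identical clean titles: dup_01, dup_02, ...
  pos <- ave(seq_len(nrow(dat)), dat$clean_title, FUN = seq_along)
  dat$obv_dup_id <- sprintf("dup_%02d", pos)
  dat
}

# Usage with made-up references: the first two differ only in case,
# punctuation and spacing, so they are flagged as obvious duplicates.
refs <- data.frame(
  ID = c("r1", "r2", "r3"),
  Title = c("Mixed Models!", "mixed  models", "Another Title"),
  Year = c(2020, 2020, 2021),
  stringsAsFactors = FALSE
)
out <- find_obvious_dups_sketch(refs)
out[, c("ID", "clean_title", "clean_title_id", "has_obv_dup", "obv_dup_id")]
```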

Examples

library(CitaviR)
library(dplyr) # provides the %>% pipe

example_path <- example_file("3dupsin5refs/3dupsin5refs.ctv6")
read_Citavi_ctv6(example_path) %>%
  find_obvious_dups() %>%
  dplyr::select(clean_title:obv_dup_id)

SchmidtPaul/CitaviR documentation built on Jan. 31, 2023, 5 a.m.