View source: R/find_potential_dups.R
find_potential_dups | R Documentation |
Identify potential duplicates based on title and year
find_potential_dups(
  CitDat,
  minSimilarity = 0.6,
  potDupAfterObvDup = TRUE,
  maxNumberOfComp = 1e+06,
  quiet = FALSE
)
CitDat |
A dataframe/tibble returned by |
minSimilarity |
Minimum similarity (between 0 and 1). Default is 0.6. (TO DO) |
potDupAfterObvDup |
If TRUE (default), the newly created column
|
maxNumberOfComp |
Maximum number of clean_title similarity calculations to be made. It is set to 1,000,000 by default (which corresponds to ~ 1414 clean_titles). TO DO: Document while-loop. |
quiet |
If |
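The figure of roughly 1414 clean_titles quoted for maxNumberOfComp is consistent with counting each unordered pair of titles once, i.e. n(n - 1)/2 comparisons for n titles. The base R sketch below checks that arithmetic. It assumes pairwise comparison and is not taken from the package source; the helper name n_pairs is hypothetical.

```r
# Hypothetical helper: number of unordered pairs among n clean_titles,
# assuming every title is compared with every other title exactly once.
n_pairs <- function(n) n * (n - 1) / 2

max_comp <- 1e6  # default of maxNumberOfComp

# Largest n whose pair count still fits within max_comp
# (positive root of n^2 - n - 2 * max_comp = 0, rounded down).
n_max <- floor((1 + sqrt(1 + 8 * max_comp)) / 2)

n_max               # 1414
n_pairs(n_max)      # 998991 comparisons for 1414 titles
n_pairs(n_max + 1)  # 1000405, which exceeds the default limit
```

Under this assumption, the number of comparisons grows quadratically with the number of distinct clean_titles, so doubling the number of titles needs about four times the limit.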
Currently this only works for files that were generated while Citavi
was set to "English" so that column names are "Short Title" etc.
A tibble containing one new column: pot_dup_id.
example_path <- example_file("3dupsin5refs/3dupsin5refs.ctv6")

CitDat <- read_Citavi_ctv6(example_path) %>%
  find_obvious_dups() %>%
  find_potential_dups()

CitDat %>%
  dplyr::select(clean_title_id, obv_dup_id, pot_dup_id)

# check similarity yourself - it's a single typo:
CitDat %>%
  dplyr::select(clean_title)
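To check the similarity of two clean_title values by hand, as the last comment in the example suggests, one option is a normalized edit distance in base R. The sketch below is for illustration only: this page does not state which similarity measure find_potential_dups uses, and the helper title_similarity and the two example titles are hypothetical.

```r
# Hypothetical helper: similarity in [0, 1] derived from the Levenshtein
# distance (utils::adist), scaled by the length of the longer string.
# The measure used inside find_potential_dups may differ.
title_similarity <- function(a, b) {
  d <- utils::adist(a, b)[1, 1]   # number of single-character edits
  1 - d / max(nchar(a), nchar(b))
}

# Made-up titles that differ by a single typo
t1 <- "a study of duplicate references"
t2 <- "a study of duplicate refrences"

sim <- title_similarity(t1, t2)
sim          # close to 1, since only one character is missing
sim >= 0.6   # TRUE: above the default minSimilarity
```

With a measure like this, a single typo in a title of typical length leaves the similarity well above the default minSimilarity of 0.6.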