In SchmidtPaul/CitaviR: A set of tools for dealing with Citavi data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  dpi = 450
)
set.seed(42)

In general, the work flow from start to finish is structured in three steps.

reading the data from the .ctv6 file
dealing with the data while it is outside Citavi to get the most out of it
writing/updating the data into the .ctv6 file

knitr::include_graphics("https://github.com/SchmidtPaul/CitaviR/blob/master/man/figures/WorkflowSQL.png?raw=true")

Step 1: Citavi to R

The following screenshot shows the Citavi project that is available in CitaviR as the 3dupsin5refs.ctv6.

knitr::include_graphics("https://github.com/SchmidtPaul/CitaviR/blob/master/vignettes/Citavi_Project.PNG?raw=true")

Here, we can read in the information of interest via read_Citavi_ctv6().

library(tidyverse)
library(CitaviR)

example_path <- example_file("3dupsin5refs/3dupsin5refs.ctv6") # in real life: replace with your path
CitDat <- read_Citavi_ctv6(path = example_path,
                           CitDBTableName = "Reference")

CitDat %>% select(Title, Year, Abstract, DOI)

If, for whatever reason, you wish to do the import from Citavi not via SQL, but with Excel files (exported from Citavi), then CitaviR offers an alternative approach via read_Citavi_xlsx() and write_Citavi_xlsx() described here.

# quietly store the Citavi project for later
OriginalCitDat <- CitDat

Step 2: Process data in R

At this point there are many things one may wish to do with the data. In this example we will make use of the CitaviR functions to identify and handle obvious duplicates. (Check out the article on obvious and potential duplicates for more.)

Find obvious duplicates

CitDat <- CitDat %>% 
  find_obvious_dups()

One way of identifying obvious duplicates is via CitaviR::find_obvious_dups(). In short, it first creates a clean_title by combining each reference's Title and Year into a simplified string. This simplification is based on janitor::make_clean_names() and e.g. converts to all-lowercase, and removes special characters and unnecessary spaces. If two references have the same clean_title, they are identified as obvious duplicates. In this example, two references were indeed identified as obvious duplicates:

CitDat %>%
 select(Title, clean_title:obv_dup_id)

Note how a single typo ("Hritability") prevents ct_03 from being detected as an obvious duplicate for ct_02. For cases like this, one may use CitaviR::find_potential_dups(), which is explained in the article on obvious and potential duplicates but not done here.

Handle obvious duplicates

At this point we have already gained information and could continue with steps 4 and 5. However, sometimes duplicates hold different information as it is the case here for ct_02 and the columns PubMedID and DOI:

CitDat %>% 
  filter(clean_title_id == "ct_02") %>% 
  select(clean_title_id, obv_dup_id, DOI, PubMedID)

In such a scenario it would be best to gather all information into the one non-duplicate (=dup_01) that will be kept and of interest later on. Here, CitaviR::handle_obvious_dups() comes in handy:

CitDat <- CitDat %>% 
  handle_obvious_dups(fieldsToHandle = c("DOI", "PubMedID"))

As can be seen, the columns listed in fieldsToHandle = are filled up (i.e. tidyr::fill(all_of(fieldsToHandle), .direction = "up")).

CitDat %>% 
  filter(clean_title_id == "ct_02") %>% 
  select(clean_title_id, obv_dup_id, DOI, PubMedID)

Therefore, we could now get rid of all obvious duplicates (obv_dup_id =! dup_01) without losing any information.

Step 3: R to Citavi

Finally, we want to implement the gained information into the Citavi project. To do so, we can make use of update_Citavi_ctv6(). Say we would like to overwrite the old DOI and PubMed information and additionally store the clean_title_id and obv_dup_id in Custom field 1 and Custom field 2, respectively. You should probably close your Citavi project before running this:

CitDat %>%
  update_Citavi_ctv6(
    path = example_path,
    CitDBTableName = "Reference",
    CitDatVarToCitDBTableVar = c(
        "DOI"            = "DOI",
        "PubMedID"       = "PubMedID",
        "clean_title_id" = "CustomField1",
        "obv_dup_id"     = "CustomField2"),
    quiet = FALSE
  )

If you now open the Citavi project, it should have changed as expected.

knitr::include_graphics("https://github.com/SchmidtPaul/CitaviR/blob/master/man/figures/Citavi_BeforeAfter.png?raw=true")

If, for whatever reason, you wish to do the import from Citavi not via SQL, but with Excel files (exported from Citavi), then CitaviR offers an alternative approach via read_Citavi_xlsx() and write_Citavi_xlsx() described here.

# quietly restore the Citavi project to original state
OriginalCitDat %>%
  mutate(clean_title_id = NA_character_,
         obv_dup_id = NA_character_) %>% 
  update_Citavi_ctv6(
    path = example_path,
    CitDBTableName = "Reference",
    CitDatVarToCitDBTableVar = c(
        "DOI"            = "DOI",
        "PubMedID"       = "PubMedID",
        "clean_title_id" = "CustomField1",
        "obv_dup_id"     = "CustomField2"),
    quiet = TRUE
  )