knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Systematic review searches include multiple databases that export results in a
variety of formats with overlap in coverage between databases. To streamline the
process of importing, assembling, and deduplicating results, synthesisr
recognizes bibliographic files exported from databases commonly used for
systematic reviews and merges results into a standardized format.
synthesisr
can read any BibTex or RIS formatted bibliographic data files. It
detects whether files are more bib-like or ris-like and imports them
accordingly. Note that files from some databases may contain non-standard fields
or non-standard characters that cause import failure in rare cases; if this
happens, we recommend converting the file in open source bibliographic
management software such as Zotero.
In the code below, we will demonstrate how to read and assemble bibliographic
data files with example datasets included in the synthesisr
package. Note that
if you are using the code with your own data, you will not need to use
system.file()
and instead will want to pass a character vector of the path(s)
to the file(s) you want to import. For example, if you have saved all your
search results in a directory called "search_results", you may want to use
list.files("./search_results/")
instead.
#| eval: false # system.file will look for the path to where synthesisr is installed # by using the example bibliographic data files, you can reproduce the vignette bibfiles <- list.files( system.file("extdata/", package = "synthesisr"), full.names = TRUE ) # we can print the list of bibfiles to confirm what we will import # in this example, we have bibliographic data exported from Scopus and Zoological Record print(bibfiles) # now we can use read_refs to read in our bibliographic data files # we save them to a data.frame object (because return_df=TRUE) called imported_files library(synthesisr) imported_files <- read_refs( filename = bibfiles, return_df = TRUE)
Many journals are indexed in multiple databases, so searching across databases
will retrieve duplicates. After import, synthesisr
can detect duplicates and
retain only unique bibliographic records using a variety of methods such as
string distance or fuzzy matching records. A good place to start is removing
articles that have identical titles, especially since this reduces computational
time for more sophisticated deduplication methods.
#| eval: false ## first, we will remove articles that have identical titles ## this is a fairly conservative approach, so we will remove them without review # df <- deduplicate( # imported_files, # match_by = "title", # method = "exact" # )
In some cases, it may be useful to know which articles were identified as duplicates so they can be manually reviewed or so that information from two records can be merged. Using our partially-deduplicated dataset, we check a few titles and use string distance methods to find additional duplicate articles in the code below and then remove them by extracting unique references. Although here we only use one secondary deduplication method (string distance), we could look for additional duplicates based on fuzzy matching abstracts, for example.
#| eval: false # there are still some duplicate articles that were not removed # for example, the titles for articles 91 and 114 appear identical ## df$title[c(91,114)] # the dash-like symbol in title 91, however, is a special character not punctuation # so it was not classified as identical # similarly, there is a missing space in the title for article 96 ## df$title[c(21,96)] # and an extra space in title 47 ## df$title[c(47, 101)] # # in this example, we will use string distance to identify likely duplicates # duplicates_string <- find_duplicates( # df$title, # method = "string_osa", # to_lower = TRUE, # rm_punctuation = TRUE, # threshold = 7 # ) # we can extract the line numbers from the dataset that are likely duplicated # this lets us manually review those titles to confirm they are duplicates # manual_checks <- review_duplicates(df$title, duplicates_string)
#| eval: false # manual_checks[,1] <- substring(manual_checks[,1], 1, 60) # # print(manual_checks[1:10, ])
#| eval: false # # the titles under match #99 are not duplicates, so we need to keep them both # # we can use the override_duplicates function to manually mark them as unique # new_duplicates <- synthesisr::override_duplicates(duplicates_string, 99) # # # now we can extract unique references from our dataset # # we need to pass it the dataset (df) and the matching articles (new_duplicates) # results <- extract_unique_references(df, new_duplicates)
To facilitate exporting results to other platforms after assembly and
deduplication, synthesisr
can write bibliographic data to .ris or .bib files.
Optionally, write_refs()
can write directly to a text file stored locally.
#| paged.print: TRUE #| eval: false # # synthesisr can write the full dataset to a bibliographic file # # but in this example, we will just write the first citation # # we also want it to be a nice clean bibliographic file, so we remove NA data # # this makes it easier to view the output when working with a single article # citation <- df[1, !is.na(df[1,])] # # format_citation(citation) # # write_refs(citation, # format = "bib", # file = FALSE # )
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.