The R package ecosystem contains a huge number of resources for systematic reviews and meta-analyses. The metaverse package imports a set of these packages, selected to cover as many stages of the systematic review workflow as possible. Future versions of metaverse will aim to fill gaps in this workflow via new packages.
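
To follow along, you will need metaverse installed. This sketch assumes installation from GitHub (the package is not on CRAN at the time of writing) via the remotes package:

# install metaverse from GitHub (assumes the remotes package is installed)
# install.packages("remotes")
remotes::install_github("rmetaverse/metaverse")

# loading metaverse gives access to the packages used below
library(metaverse)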

Contents

  1. Data import & deduplication
  2. Search optimisation
  3. Article screening
  4. Data extraction from figures
  5. Risk-of-bias assessment
  6. Meta-analysis
  7. Citing metaverse

Data import & deduplication using synthesisr

The default way to import bibliographic information using synthesisr is with read_refs. This function can import multiple files in different formats at once and merge the results into a single dataset.

file_names <- c(
  system.file("extdata", "scopus.ris", package = "synthesisr"),
  system.file("extdata", "zoorec.txt", package = "synthesisr"))
data <- synthesisr::read_refs(file_names)

These data are from a search on the effects of prescribed burning on abundance of red-cockaded woodpeckers (Picoides borealis) using two common academic resources: Scopus and Web of Science. We ran our searches on April 10, 2019 with no date restrictions. We searched Scopus (1970-2019) and five databases in Web of Science: the Web of Science Core Collection (1900-2019), BIOSIS Previews (1926-2019), Current Contents Connect (1998-2019), MEDLINE (1950-2019), and Zoological Record (1945-2019). Our search string was:

TS=(("picoides borealis" OR "red-cockaded woodpecker*" OR "red cockaded woodpecker" OR "leuconotopicus borealis" OR woodpecker) AND ("prescribed burn*" OR "prescribed fire*" OR fire* OR wildfire* OR burn*) AND (abundan* OR presen* OR occup* OR occur* OR (popul* NEAR/2 (densit* OR size))))

The function read_refs returns a data.frame by default, meaning there are a number of ways to investigate the data you've just imported:

dim(data) # number of rows and columns
colnames(data) # names of columns
str(data) # description of the content of a data.frame

Because our data are from different sources, it is likely that they contain duplicates, i.e. the same entry reported by more than one database. The easiest way to remove these duplicates is with the deduplicate function:

cleaned_data <- synthesisr::deduplicate(data, match_by = "doi", method = "exact")

You can add options to customize how this works if you wish, for example to use fuzzy rather than exact matching, or to convert text to lower case and strip punctuation before matching:

cleaned_data <- synthesisr::deduplicate(data,
  match_by = "title",
  method = "string_osa",
  rm_punctuation = TRUE,
  to_lower = TRUE)

If you'd prefer to remove duplicates manually, you can do that using revtools:

cleaned_data <- revtools::screen_duplicates(data)
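
Whichever approach you use, it is worth checking how many records were removed:

nrow(data) - nrow(cleaned_data) # number of records removed as duplicates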

Search optimisation using litsearchr

A common question during systematic reviews is whether the search used to locate references was adequate. In particular, it can be useful to know whether other possible keywords should have been included. One way to test this is using litsearchr:

# automatically identify key terms
rake_keywords <- litsearchr::extract_terms(cleaned_data$abstract,
  method = "fakerake",
  min_freq = 5)

# or use author-defined keywords
keywords <- unique(do.call(c, strsplit(cleaned_data$keywords, " and ")))

tagged_keywords <- litsearchr::extract_terms(cleaned_data$title,
  keywords = keywords,
  method = "tagged",
  min_freq = 5,
  min_n = 1,
  max_n = 2)
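
You can compare the two sets of terms before proceeding:

head(rake_keywords)   # terms identified automatically
head(tagged_keywords) # terms drawn from author-supplied keywords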

We can then use this information to build a keyword co-occurrence network:

naive_dfm <- litsearchr::create_dfm(
  elements = cleaned_data$abstract,
  features = rake_keywords)

naive_graph <- litsearchr::create_network(
  search_dfm = as.matrix(naive_dfm),
  min_studies = 1,
  min_occ = 1)

And identify change points in keyword importance:

# identify a cutoff point
spline_cutoff <- litsearchr::find_cutoff(naive_graph,
  method = "cumulative",
  percent = 0.3,
  knot_num = 3)

reduced_graph <- litsearchr::reduce_graph(naive_graph,
  cutoff_strength = spline_cutoff)

search_terms <- litsearchr::get_keywords(reduced_graph)
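
It is worth inspecting the suggested terms before grouping them:

head(search_terms)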

Finally, we can group terms together and write a Boolean search:

# group terms into those related to forests/logging versus all others
search_groups <- split(search_terms,
  factor(
    2 - as.numeric(grepl("forest|log", search_terms)),
    levels = seq_len(2),
    labels = c("forest", "not forest")))

woodpecker_search <- litsearchr::write_search(search_groups,
  languages = "English",
  stemming = TRUE,
  closure = "left",
  exactphrase = TRUE,
  writesearch = FALSE,
  verbose = TRUE)

woodpecker_search
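
If you want a copy of the search string outside of R, you can write it to a plain-text file with base R (the file name here is arbitrary); alternatively, setting writesearch = TRUE in write_search should do this for you directly:

writeLines(woodpecker_search, "woodpecker_search.txt")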

Article screening using revtools

Once you have a set of articles returned by a search, the next step is to determine which of these are most relevant to your questions. This process is called 'screening', and can be accomplished using revtools.

If you wish to manually check every entry, you can check either titles or abstracts with the associated screen_ function:

revtools::screen_titles(cleaned_data) # or
revtools::screen_abstracts(cleaned_data)
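
These screening functions launch an interactive app and, when you close it, return a data.frame containing your decisions alongside the original data. A minimal sketch of capturing and filtering that output, assuming the decisions are stored in a column named screened_abstracts with the value "selected" for included articles (inspect the returned column names, as this may vary between revtools versions):

screening_result <- revtools::screen_abstracts(cleaned_data)
colnames(screening_result) # locate the column that stores your decisions

# keep only the articles you marked for inclusion
# (column name and label are assumptions; adjust to match your output)
included <- screening_result[
  which(screening_result$screened_abstracts == "selected"), ]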

Alternatively, if you would like to visualise the patterns in your text data, you can use topic models to guide your screening:

revtools::screen_topics(cleaned_data)

Data extraction using metaDigitise

Screening titles and abstracts is useful, but data must still be extracted from the full text of each article before meta-analysis can take place. While much of this process is labor-intensive, extracting data from figures can be made easier by using metaDigitise.

metaDigitise has its own detailed vignette covering the full workflow; briefly, the process is to copy your images into a single directory and pass that directory to the metaDigitise function:

data <- metaDigitise::metaDigitise(dir = "~/extracted_figures/")

This function will then walk you through how to process the images and display the results.
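
The returned object holds the data extracted from your figures, which you can inspect in the usual way:

str(data)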

Risk-of-bias assessment using robvis

A common task in some evidence synthesis projects is assessing the internal validity of the studies included in the review, and hence their potential for bias. The robvis package provides functions to convert a risk-of-bias assessment summary table into a summary plot or a traffic-light plot, formatted according to the specific risk-of-bias assessment tool used. A comprehensive vignette is available on CRAN, but the basic usage is to import or create some data in the correct format, then use robvis to plot either a summary:

robvis::rob_summary(robvis::data_rob2, tool = "ROB2")

or a more detailed traffic light plot:

robvis::rob_traffic_light(robvis::data_rob2, tool = "ROB2", psize = 10)
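
The plots produced by robvis are built with ggplot2, so you can store and save them like any other ggplot object. A brief sketch (the file name and dimensions here are arbitrary):

rob_plot <- robvis::rob_summary(robvis::data_rob2, tool = "ROB2")

# save the plot to disk with ggplot2
ggplot2::ggsave("rob_summary.png", rob_plot, width = 8, height = 3)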

Meta-analysis using metafor

Meta-analysis is a large topic, and it would be impossible for a single vignette to cover all of the available options. However, metafor is one of the oldest and most widely used R packages for meta-analysis, and it has a well-developed website that provides examples and tips.

The basic approach demonstrated below is to calculate a standardised effect size for each study, then fit a model to the resulting data:

model_inputs <- metafor::escalc(measure = "RR",
    ai = tpos,
    bi = tneg,
    ci = cpos,
    di = cneg,
    data = metafor::dat.bcg)

model <- metafor::rma(yi, vi, data = model_inputs, method = "EB")

summary(model)
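
metafor also provides plotting methods for fitted models; for example, a forest plot showing the individual and pooled effect sizes:

metafor::forest(model) # forest plot of the fitted meta-analytic model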

Citing metaverse

We've worked pretty hard on getting metaverse working, and we hope that you like it and find it useful. However, the vast majority of the work was done by the individuals and teams who develop the packages that metaverse imports. Therefore, if you use metaverse in your work, please cite the specific packages that you have used.
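
The appropriate citation for each package can be retrieved from within R using the citation() function:

citation("synthesisr")
citation("litsearchr")
citation("revtools")
citation("metaDigitise")
citation("robvis")
citation("metafor")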


