In softloud/sysrevdata: Methods for designing multifunctional, machine readable systematic review and map databases

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(tippy)
options(encoding="latin1")

# packages used in this toolchain walkthrough
library(sysrevdata)
library(tidyverse)
library(kableExtra)
library(janitor)


# this avoids tidyverse conflicts with the base function filter
conflicted::conflict_prefer("filter", "dplyr")

creating a narrative synthesis table

The objective of a r tippy("narrative synthesis","Narrative synthesis refers to an approach to the systematic review and synthesis of findings from multiple studies that relies primarily on the use of words and text to summarise and explain the findings of the synthesis. Whilst narrative synthesis can involve the manipulation of statistical data, the defining characteristic is that it adopts a textual approach to the process of synthesis to ‘tell the story’ of the findings from the included studies.' From: Popay, J., Roberts, H., Sowden, A., Petticrew, M., Arai, L., Rodgers, M., Britten, N., Roen, K. and Duffy, S. 2006. Guidance on the conduct of narrative synthesis in systematic reviews. A product from the ESRC methods programme Version, 1, p.b92.") is to summarise each study included in a systematic review or systematic map, often using a r tippy("human-readable", "i.e. easy-to-read and comprehend, readily digestible format") table:

Rows | One study per row.

Columns | Extracted meta-data and descriptive information about each study.

For some syntheses, particularly r tippy("systematic maps", "'Systematic mapping does not aim to answer a specific question as does a systematic review, but instead collates, describes and catalogues available evidence (e.g. primary, secondary, quantitative or qualitative) relating to a topic of interest.' From: James, K.L., Randall, N.P. & Haddaway, N.R. 2016. A methodology for systematic mapping in environmental sciences. Environmental Evidence, 5: 7. https://doi.org/10.1186/s13750-016-0059-6"), the number of variables considered can be substantial, and this poses a challenge for fitting the table into a succinct readable format.

Here we consider ways we can condense and format data for presentation for humans to read, as opposed to machine-readable data for analysis and complex visualisation.

We may begin with sparsely-filled wide data or long-format machine-structured data, neither of which are good for inclusion in the text of a research manuscript or interpretation by human readers. So, we will consider how to condense these structures in turn, before demonstrating how to format the output.

condensing from wide data

We'll use bufferstrips (from this published systematic map), which can be loaded from sysrevdata, to investigate this. These data are in a r tippy("wide and sparse", "Each value for each variable is presented in a separate column: so the variable 'season' would have 4 columsn: spring, summer, autumn, winter, with some value (e.g. 'yes') indicating presence of that code for a particular study") format, useful for creating systematic maps. We'll restrict ourselves to the first 5 rows of studies with head(5) as we work through these examples, so this document doesn't run too long.

buffer_example <-
  bufferstrips %>% 
  head(5)

usethis::use_data(buffer_example, overwrite = TRUE)

To see how these data are sparse and wide, we will extract the abbreviated title, year, and the columns that contain observations relating to spatial scale.

buffer_example %>% 
  select(short_title, contains("spatial"))

When we format these data (which we'll provide toolchains for at the end), the wideness becomes an issue. And this is with only the spatial variables (columns) selected. The table is unwieldy, large, and filled with white space that together make it very difficult for a reader to comprehend.

buffer_example %>%
  mutate(across(where(is.character), replace_na, "")) %>%
  select(short_title, contains("spatial")) %>%
  kable()

To solve the wideness issue, we'll r tippy("condense", "i.e. compress multiple values of the same variable into a single column") the variables, specifying the character that should be used to separate the values where multiple values exist in one cell. Here's a way of doing this for the spatial variable.

buffer_example %>%
  # choose the desired columns
  select(short_title, contains("spatialscale")) %>%
  # condense variable
  unite(# name condensed column
    col = "spatialscale",
    # columns to condense
    contains("spatialscale"),
    # set separator
    sep = "; ",
    # remove NAs
    na.rm = TRUE)

You can see that spatial scale is now a single column, with some cells containing multiple values, separated with a semi-colon and a space for readability.

But, of course, we wish to do this for all of the variables. The table contains columns that don't need condensing, such as title, year, and so forth.

buffer_example %>% 
  names() %>% cat()

Our first task is to identify the columns that have common prefixes, that is, columns that begin with the same text string (e.g. 'spatialscale_plot', 'spacialscale_field', and 'spacialscale_farm').

(
  buffer_variables <-
    # extract the prefix of each column
    # i.e., the string before the first _
    buffer_example %>%
    # we can exclude the column names that don't contain _
    select(contains("_"), -study_country, -study_location) %>%
    # get column names
    names() %>%
    # extract the word before the first _
    str_extract("[a-z]*") %>% {
      # convert to a tibble so we can use count
      tibble(variables = .)
    } %>%
    # count how many instances
    count(variables) %>%
    # filter out the unique variables we won't condense
    filter(n > 1) %>%
    # convert to a vector
    pull(variables)
)

usethis::use_data(buffer_variables)

Now we adapt the code we used to condense the variables to a r tippy("function", "Functions are mini R-based 'programmes' that allow us to replicate an action easily if we're likely to perform it more than once.").

condense_variable <-
function(variable_to_condense, df) {
  buffer_example %>%
    # choose the desired columns
    select(contains(variable_to_condense)) %>%
    # condense variable
    unite(# name condensed column
      col = !!variable_to_condense,
      # columns to condense
      contains(variable_to_condense),
      # set separator
      sep = "; ",
      # remove NAs
      na.rm = TRUE)
}

condense_variable("spatialscale")

Next we apply our function to every variable we wish to condense and combine the results into a new table of condensed variables.

condensed_buffer_example <-
buffer_variables %>%
  # apply the function to each variable we wish to condense, producing a list of dataframes
  map(condense_variable) %>% 
  # bind the columns together, with the short title as the first column
  bind_cols(
    buffer_example %>% select(-contains(buffer_variables)), .
  )

# just showing the condensed variables
condensed_buffer_example %>% 
  select(short_title, contains(buffer_variables))

usethis::use_data(condensed_buffer_example, overwrite = TRUE)

The output is a condensed datatable suitable for inclusion in a manuscript, with a minimum number of columns, no unnecessary white space, designed specifically to be easily digestible by the reader. In the interests of r tippy("Open Data", "'Open data' is the concept that research findings should be freely available to everyone to use as they wish, without restriction or control: Auer, S. R.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z. 2007. 'DBpedia: A Nucleus for a Web of Open Data'. The Semantic Web. Lecture Notes in Computer Science. 4825: 722. https://doi.org/10.1007/978-3-540-76298-0_52."), review authors should also make the underlying data freely accessible in a widely usable and easily understandable format, not solely in text form.

condensing from long data

Suppose we start with long-format data in a r tippy("tidy", "'Tidy datasets provide a standardized way to link the structure of a dataset (its physical layout) with its semantics (its meaning)' From: Wickham, H. 2020. tidyr: Tidy Messy Data. https://cran.r-project.org/package=tidyr") format, where each row is an independent observation for a specific study: in the case of systematic maps, this could be each outcome measured using a different method, for example. In the interests of space, we'll show the title, category, and subcategory: the columns that don't require condensing remain the same.

For this section, we'll use the long format of the bufferstrips dataset, using the dataframe created in the vignette on creating long-form datasets.

buffer_example_long %>% 
  # in the interests of space, we'll just show the relevant columns
  select(short_title, category_type, subcategory_type)

Now we'll r tippy("pivot", "Pivoting refers to changing the orientation of a dataset from wide to long or vice versa") wide so we can condense.

# this function creates a string from the vector of multiple values

mash_subcategories <- function(x) {
  paste0(x, collapse = "; ")
}

# go from long to wide
buffer_example_long %>% 
  pivot_wider(
    names_from = category_type,
    values_from = subcategory_value,
    values_fn = mash_subcategories
  ) %>% 
  # we'll just show a few examples because space
  select(short_title, vegetated, farmingproductionsystem)

Now we have the long data in the same condensed format as the wide data in the section above, and we are ready to consider formatting functions for publication.

formatting for output

We can format the database into a nice, readable format by converting it into an html output designed for human readability.

condensed_buffer_example %>% 
  # converts dataframe to an html output table
  kable() %>% 
  # useful function for customising table formatting
  kable_styling(
    bootstrap_options = "striped", 
    font_size = 9)

The formatting still isn't perfect because the cell values are long, and the URLs that we've included to Google Scholar are unwieldy. We can present this with a separate table of metadata, using the short title as the reference. We'll combine the Google Scholar link with the title.

study_descriptions <-
buffer_example %>% 
  # combine the google scholar link in a way that will be rendered as a clickable link 
  mutate(title = str_c("[", title, "](",google_scholar_link,")")) %>% 
  select(-contains(buffer_variables),
         -google_scholar_link) 

study_descriptions %>% 
  kable() %>% 
  kable_styling("striped",
                font_size = 9)

And, if we are happy to have a wider table, we can combine both tables together to form one narrative synthesis table with all variables condensed. Although wide, it is not as wide or sparse as where we started, and is much easier to read.

full_join(study_descriptions, condensed_buffer_example,
          by = "short_title") %>% 
  kable() %>% 
   kable_styling("striped",
                font_size = 9)

softloud/sysrevdata documentation built on June 7, 2022, 1:21 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

softloud/sysrevdata
Methods for designing multifunctional, machine readable systematic review and map databases

In softloud/sysrevdata: Methods for designing multifunctional, machine readable systematic review and map databases

creating a narrative synthesis table

condensing from wide data

condensing from long data

formatting for output

R Package Documentation

Browse R Packages

We want your feedback!

softloud/sysrevdata Methods for designing multifunctional, machine readable systematic review and map databases

In softloud/sysrevdata: Methods for designing multifunctional, machine readable systematic review and map databases

creating a narrative synthesis table

condensing from wide data

condensing from long data

formatting for output

R Package Documentation

Browse R Packages

We want your feedback!

softloud/sysrevdata
Methods for designing multifunctional, machine readable systematic review and map databases