library(ggplot2)
library(tidyr)
library(dplyr)
library(forcats)
library(rsr)
knitr::opts_chunk$set(collapse = TRUE,comment = "#>",dev="svg",fig.ext = "svg")

Sysrev users assign millions of labels to documents. The rsr package lets you access and analyze that data. This demo shows how to access data from a review of prostate cancer biomarkers (sysrev.com/p/81395).

Piping get_answers(81395) into list_answers() organizes the data from the review (p/81395) into a named list of tables.

tbls <- get_answers(81395) |> list_answers()
#> list(basic=<tbl>,biomarker=<tbl>,...)

The basic table links articles (e.g. aid 11781738) to basic label data (logical, string, category). In this project, a basic label records each article's trial identifier (e.g. NCT01682772):

tbls$basic |> 
  mutate(nct=unlist(`NCT Trial ID`)) |> 
  select(aid,nct)
#>        aid nct        
#> 1 11781750 NCT01682772
#> 2 11781738 NCT02854436
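
The unlist() call above is needed because label answers arrive as list columns. As a minimal sketch of that step, the toy table below stands in for tbls$basic; `basic_toy` and its contents are hypothetical and only mirror the two rows printed above, so the example runs without contacting Sysrev.

```r
library(dplyr)

# Hypothetical stand-in for tbls$basic: each answer is stored as a list element.
basic_toy <- tibble(
  aid = c(11781750, 11781738),
  `NCT Trial ID` = list("NCT01682772", "NCT02854436")
)

# unlist() flattens the list column into a plain character vector.
# This assumes every row holds exactly one answer; rows with zero or
# several answers would change the vector length and make mutate() fail.
flat <- basic_toy |>
  mutate(nct = unlist(`NCT Trial ID`)) |>
  select(aid, nct)

flat
```

If a label can hold several answers per article, tidyr::unnest() is probably a safer choice than unlist(), since it adds rows instead of requiring one value per row.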

The other named values in the list are group labels, which are themselves tables. Here, the biomarker table describes which trials evaluated which genes.

tbls$biomarker |> 
  mutate(biomarker.name=unlist(biomarker.name)) |> 
  select(aid, biomarker.name)
#>        aid biomarker.name
#> 1 11781738 BRCA1 
#> 2 11781738 BRCA2 

Extracted tables can be joined by article aid. Here, joined basic/biomarker tables link trials and biomarkers. ggplot2::geom_tile can then quickly visualize which trials use which biomarkers.

join.tb = tbls$basic |> 
  inner_join(tbls$biomarker,by="aid")

# ggplot(join.tb,
#   aes(x=bmkr, y=study, fill=elig)) + 
#   geom_tile() + …
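
To make the join behaviour concrete, here is a small sketch on made-up tables. `basic_toy` and `biomarker_toy` are hypothetical stand-ins, not real rsr output; the point is only that inner_join() repeats an article's basic row once per matching group-label row and drops articles that have no group-label rows.

```r
library(dplyr)

# Hypothetical stand-ins for the flattened basic and biomarker tables.
basic_toy <- tibble(
  aid = c(11781750, 11781738),
  nct = c("NCT01682772", "NCT02854436")
)
biomarker_toy <- tibble(
  aid            = c(11781738, 11781738),
  biomarker.name = c("BRCA1", "BRCA2")
)

# One basic row per article, many biomarker rows per article:
# the join yields one row per (article, biomarker) pair.
joined_toy <- inner_join(basic_toy, biomarker_toy, by = "aid")

joined_toy
```

In this toy data, article 11781750 has no biomarker rows, so it disappears from the result; left_join() would keep it with a missing biomarker.name instead.
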
joint = tbls$basic |> 
  inner_join(tbls$biomarker,by="aid") |> 
  select(aid,
         study = short_name,
         bmkr  = biomarker.name,
         eli   = eligibility) |> 
  purrr::modify(~ unlist(.,recursive = TRUE))

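# Keep the 13 biomarkers evaluated in the most distinct studies
# (note: 13, despite the `top10` name).
top10 = joint |> group_by(bmkr) |> summarize(s = n_distinct(study)) |> 
  slice_max(n=13,order_by=s,with_ties = FALSE)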

ptb = joint |> 
  inner_join(top10,by="bmkr") |> 
  mutate(bmkr  = fct_rev(fct_infreq(bmkr)))  |> 
  mutate(study = fct_infreq(study)) |>
  complete(bmkr, study, fill=list(eli="none")) |> 
  mutate(eli = ifelse(eli=="sufficient","measured",eli)) |> 
  mutate(eli = factor(eli,levels=c("none","required_negative","measured"))) |> 
  mutate(eli = fct_recode(eli,
                          exclude          ="required_negative",
                          `measure/include`="measured"))

ggplot(ptb,aes(x=study, y=bmkr, fill=eli)) +
  geom_tile(col="white",linewidth=0.5) + 
  scale_fill_manual(values=c("#161616","#4C9605","#CC2C11")) + 
  theme(text = element_text(size=12),
        panel.background = element_blank(),
        axis.text.x = element_text(angle=90,hjust=1), 
        legend.title = element_blank(),
        legend.position = "top") + 
  ylab("") + xlab("")
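
The complete() step in the pipeline above is what gives the tile plot a full grid: every study/biomarker pair gets a row, and pairs that were never labelled are filled with "none". A minimal sketch on invented data (`obs` and its values are hypothetical, chosen only to show the fill):

```r
library(dplyr)
library(tidyr)

# Hypothetical observations: study B never mentions BRCA2.
obs <- tibble(
  study = c("A", "A", "B"),
  bmkr  = c("BRCA1", "BRCA2", "BRCA1"),
  eli   = c("measured", "measured", "required_negative")
)

# complete() adds the missing (bmkr, study) combinations and fills
# their eli value with "none" so geom_tile() has a cell to draw.
grid <- obs |>
  complete(bmkr, study, fill = list(eli = "none"))

grid
```

Without this step, unlabelled pairs would simply be absent from the data and show up as holes in the plot rather than as tiles with their own fill colour.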

Now you know how to get open access review data from Sysrev. Check out the rsr reference for more documentation.



sysrev/rsr documentation built on March 31, 2024, 6:47 a.m.