knitr::opts_chunk$set(collapse = FALSE)
library(dplyr)
library(impactr)

Search Publication Data

 

1. Search Pubmed

The search_pubmed() function provides a focussed method

At present the search can be refined in 2 ways:

search <- impactr::search_pubmed(search_list = c("mclean ka", "ots r", "drake tm", "harrison em"),
                        date_min = "2018/01/01", date_max = "2020/05/01")

The output from search_pubmed() is a list of PMIDs resulting from this search (length(search) gives the number of records returned in the above case).

 

2. Sift Pubmed Results

There may be thousands of records that meet the search criteria specified

Note: This function

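# Extract publication data for the PMIDs returned by the search
extract <- impactr::extract_pmid(pmid = search, get_authors = TRUE, get_altmetric = FALSE, get_impact = FALSE)
# Load a saved extract stored with the vignette (this overwrites the object above)
extract <- readr::read_rds(here::here("vignettes/extract_pmid.rds"))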
sifted <- extract %>%
  sift(authors = c("mclean ka", "ots r", "drake tm", "harrison em"),
                affiliations = c("edinburgh", "lothian"),
                keyword = c("surg"))
head(sifted$wheat, 10)
head(sifted$chaff, 10)
relevance <- readr::read_csv(here::here("vignettes/sifted_relevance.csv"), show_col_types = FALSE) %>%
  dplyr::mutate(var_id = as.character(var_id),
                relevance = factor(relevance) %>% forcats::fct_rev())

accuracy_check <- bind_rows(sifted$wheat %>% mutate(designation = "Wheat"),
                           sifted$chaff %>% mutate(designation = "Chaff")) %>%
  dplyr::left_join(relevance, by = "var_id") %>%
  dplyr::mutate(designation = factor(designation, levels = c("Wheat", "Chaff")))

Based on the above search, which identified length(search) records using search_pubmed(), there are:

However, it should be noted that all publications that met 2 or more criteria (n=nrow(sifted$wheat %>% dplyr::filter(criteria_met>1))) were relevant. The function has been designed with sensitivity in mind to maximise negative predictive value.
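The idea of counting how many criteria each record meets can be illustrated with a small sketch. This is not the implementation of sift(): the data frame, its flag columns, and the rule that a record counts as "wheat" when it meets at least one criterion are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical records: each flag marks whether one criterion is met
# (made-up data, not output from impactr)
records <- data.frame(
  var_id      = c("a", "b", "c", "d"),
  author      = c(TRUE, TRUE, FALSE, FALSE),
  affiliation = c(TRUE, FALSE, FALSE, TRUE),
  keyword     = c(TRUE, FALSE, FALSE, FALSE)
)

# Count the criteria met by each record
records <- records %>%
  mutate(criteria_met = author + affiliation + keyword)

# Assumed rule: at least one criterion met -> "wheat", otherwise "chaff"
toy_sifted <- list(
  wheat = filter(records, criteria_met >= 1),
  chaff = filter(records, criteria_met == 0)
)

# Records meeting 2 or more criteria, mirroring the filter used in the text
n_multi <- nrow(filter(toy_sifted$wheat, criteria_met > 1))
```

A looser rule (any single criterion) keeps more records in the wheat, which favours sensitivity at the cost of more manual review.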

accuracy_table <- accuracy_check %>%
  dplyr::select(designation, relevance) %>%
  table()

accuracy_table
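Sensitivity and negative predictive value can be read off a two-by-two table of the same shape as accuracy_table. The sketch below uses made-up counts (not results from this search) and assumes relevance is coded "Yes"/"No", as the filter on relevance later in this vignette suggests.

```r
# Hypothetical counts only, arranged like accuracy_table:
# rows = designation, columns = relevance
toy_table <- matrix(
  c(30, 20,   # Wheat: relevant, not relevant
     2, 48),  # Chaff: relevant, not relevant
  nrow = 2, byrow = TRUE,
  dimnames = list(designation = c("Wheat", "Chaff"),
                  relevance   = c("Yes", "No"))
)

true_pos  <- toy_table["Wheat", "Yes"]
false_neg <- toy_table["Chaff", "Yes"]
true_neg  <- toy_table["Chaff", "No"]

# Sensitivity: share of relevant records designated as wheat
sensitivity <- true_pos / (true_pos + false_neg)

# Negative predictive value: share of chaff that is truly not relevant
npv <- true_neg / (true_neg + false_neg)
```

Both measures are driven by the relevant records left in the chaff, so a sift that rarely discards a relevant publication scores well on each.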

The accuracy of the search depends entirely upon the parameters used - 75.9% (n=22/29) of publications would not have been highlighted had a keyword ("surg") not been used.
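The quoted percentage follows directly from the two counts given in the text. The variable names here are illustrative only.

```r
# Counts taken from the text: 22 of 29
n_keyword_dependent <- 22L
n_total <- 29L

# Percentage rounded to one decimal place
pct <- round(100 * n_keyword_dependent / n_total, 1)
sprintf("%.1f%% (n=%d/%d)", pct, n_keyword_dependent, n_total)
```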

venn_data <- accuracy_check %>%
  dplyr::filter(relevance=="Yes") %>%
  dplyr::filter(designation=="Wheat") %>%
  dplyr::mutate(`Multiple Specified Authors` = ifelse(author_multi_n>1, 1, 0),
                `Affiliation (Specified Author)` = ifelse(affiliations_author=="Yes", 1, 0),
                `Affiliation (Any Author)` = ifelse(affiliations_any=="Yes", 1, 0),
                `Keyword` = ifelse(keyword=="Yes", 1, 0)) %>%
  dplyr::select(`Multiple Specified Authors`:`Keyword`) %>%
  impactr::format_intersect() %>%
  dplyr::select(-degree) %>%
  dplyr::filter(combination!="") %>% # drop the empty combination (publications meeting none of these criteria)
  tidyr::pivot_wider(names_from = combination, values_from = n) %>%
  unlist()

grDevices::png(filename = here::here("vignettes/plot/venn_sift.png"),
               height = 3.6, width = 5.6, units = "in", res=300)

plot(eulerr::euler(venn_data),
     edges = list("black", alpha = 0.8),
     fill = list(alpha = 0.5),
     quantities = list(fontsize = 15), legend = list(alpha = 1))

dev.off()
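For reference, eulerr::euler() accepts a named numeric vector in which names joined by "&" denote intersections. The sketch below uses made-up counts and shortened set names; whether impactr::format_intersect() labels combinations in exactly this way is an assumption.

```r
# Hypothetical counts (not results from this search)
toy_venn <- c(
  "Keyword"             = 10,
  "Affiliation"         = 4,
  "Keyword&Affiliation" = 6
)

# Fit and plot only if eulerr is available
if (requireNamespace("eulerr", quietly = TRUE)) {
  fit <- eulerr::euler(toy_venn)
  plot(fit, quantities = TRUE)
}
```

Each value is the count for that exact combination, so the total number of records represented is the sum of the vector.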

Let's add some more parameters to try to improve sensitivity. These include a common coauthor ("wigmore sj") and another common topic of publications ("liver").

sifted2 <- extract %>%
  sift(authors = c("mclean ka", "ots r", "drake tm", "harrison em", "wigmore sj"),
                affiliations = c("edinburgh", "lothian"),
                keyword = c("surg", "liver"))

Based on the above criteria:

bind_rows(sifted2$wheat %>% mutate(designation = "Wheat"),
                           sifted2$chaff %>% mutate(designation = "Chaff")) %>%
  dplyr::left_join(relevance, by = "var_id") %>%
  dplyr::mutate(designation = factor(designation, levels = c("Wheat", "Chaff"))) %>%
  dplyr::select(designation, relevance) %>%
  table()


kamclean/impactr documentation built on Jan. 11, 2023, 2:51 p.m.