An R package for querying the PubMed database & parsing retrieved records. Toolkit facilitates batch API requests & the creation of custom corpora for NLP.
You can download the development version from GitHub with:
library(tidyverse)
devtools::install_github("jaytimm/PubmedMTK")
The pmtk_search_pubmed()
function is meant for record-matching searches typically performed using the PubMed online interface. The search_term
parameter specifies the query term; the fields
parameter can be used to specify which fields to query.
s0 <- PubmedMTK::pmtk_search_pubmed(search_term = 'medical marijuana', fields = c('TIAB','MH'))
Sample output:
head(s0)
ps <- PubmedMTK::pmtk_search_pubmed( search_term = c('political ideology', 'marijuana legalization', 'political theory', 'medical marijuana'), fields = c('TIAB','MH'))
The pmtk_crosstab_query
can be used to build a cross-tab of PubMed search results for multiple search terms.
ps0 <- PubmedMTK::pmtk_crosstab_query(x = ps) ps0 %>% knitr::kable()
For quicker abstract retrieval, be sure to get an API key.
# Set NCBI API key key <- '4f47f85a9cc03c4031b3dc274c2840b06108' #rentrez::set_entrez_key(key) ## LIST of xml elements -- # https://www.nlm.nih.gov/bsd/licensee/elements_alphabetical.html
sen_df <- PubmedMTK::pmtk_get_records2(pmids = unique(s0$pmid), with_annotations = T, cores = 5, ncbi_key = key)
Sample record from output:
sen_df <- data.table::rbindlist(sen_df) n <- 10 list(pmid = sen_df$pmid[n], year = sen_df$year[n], journal = sen_df$journal[n], articletitle = strwrap(sen_df$articletitle[n], width = 60), abstract = strwrap(sen_df$abstract[n], width = 60)[1:10])
Annotations are included as a list-column, and can be easily extracted:
annotations <- data.table::rbindlist(sen_df$annotations)
annotations %>% filter(!is.na(Form)) %>% slice(1:10) %>% knitr::kable()
The pmtk_get_icites
function can be used to obtain citation data per PMID using NIH's Open Citation Collection and iCite.
Hutchins BI, Baker KL, Davis MT, Diwersy MA, Haque E, Harriman RM, et al. (2019) The NIH Open Citation Collection: A public access, broad coverage resource. PLoS Biol 17(10): e3000385. https://doi.org/10.1371/journal.pbio.3000385
The iCite API returns a host of descriptive/derived citation details per record.
citations <- PubmedMTK::pmtk_get_icites(pmids = ps$pmid, cores = 6, ncbi_key = key) citations %>% select(-citation_net) %>% slice(4) %>% t() %>% data.frame() %>% knitr::kable()
Referenced and cited-by PMIDs are returned by the function as a column-list of network edges.
citations$citation_net[[4]]
The pmtk_get_affiliations
function extracts author and author affiliation information from PubMed records.
afffs <- PubmedMTK::pmtk_get_affiliations(pmids = s0$pmid) afffs %>% bind_rows() %>% slice(1:10) %>% knitr::kable()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.