An R package for querying the PubMed database and parsing retrieved records. The toolkit facilitates batch API requests and the creation of custom corpora for NLP.
You can install the development version from GitHub with:
devtools::install_github("jaytimm/PubmedMTK")
The `pmtk_search_pubmed()` function is meant for record-matching searches typically performed using the PubMed online interface. The `search_term` parameter specifies the query term; the `fields` parameter can be used to specify which fields to query.
s0 <- PubmedMTK::pmtk_search_pubmed(search_term = 'medical marijuana',
                                    fields = c('TIAB','MH'))
## [1] "medical marijuana[TIAB] OR medical marijuana[MH]: 2584 records"
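The printed query string suggests that each field tag is appended to the search term and the resulting clauses are joined with `OR`. A minimal base R sketch of that construction follows; `build_query()` is a hypothetical helper written for illustration, not a function exported by PubmedMTK, and the package may assemble its queries differently.

```r
# Hypothetical helper (not part of PubmedMTK): tag a term with each
# field code and join the clauses with OR, mirroring the message above.
build_query <- function(search_term, fields = c("TIAB", "MH")) {
  paste0(search_term, "[", fields, "]", collapse = " OR ")
}

build_query("medical marijuana")
```

Under this assumption, adding a field code to `fields` would simply add one more `OR` clause to the query.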
Sample output:
head(s0)
## search_term pmid
## 1: medical marijuana 35724355
## 2: medical marijuana 35723729
## 3: medical marijuana 35712276
## 4: medical marijuana 35711271
## 5: medical marijuana 35702401
## 6: medical marijuana 35667225
ps <- PubmedMTK::pmtk_search_pubmed(
  search_term = c('political ideology',
                  'marijuana legalization',
                  'political theory',
                  'medical marijuana'),
  fields = c('TIAB','MH'))
## [1] "political ideology[TIAB] OR political ideology[MH]: 567 records"
## [1] "marijuana legalization[TIAB] OR marijuana legalization[MH]: 238 records"
## [1] "political theory[TIAB] OR political theory[MH]: 119 records"
## [1] "medical marijuana[TIAB] OR medical marijuana[MH]: 2584 records"
The `pmtk_crosstab_query()` function can be used to build a cross-tab of PubMed search results for multiple search terms.
library(dplyr)  # provides %>% and the filter/slice/select verbs used in this README
ps0 <- PubmedMTK::pmtk_crosstab_query(x = ps)
ps0 %>% knitr::kable()
| term1                  | term2              |   n1 |   n2 | n1n2 |
|:-----------------------|:-------------------|-----:|-----:|-----:|
| marijuana legalization | medical marijuana  |  238 | 2584 |   89 |
| marijuana legalization | political ideology |  238 |  567 |    1 |
| marijuana legalization | political theory   |  238 |  119 |    0 |
| medical marijuana      | political ideology | 2584 |  567 |    2 |
| medical marijuana      | political theory   | 2584 |  119 |    1 |
| political ideology     | political theory   |  567 |  119 |    2 |
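To make the columns concrete: `n1` and `n2` read as the record counts for each term, and `n1n2` as the number of PMIDs the two terms share. The sketch below reproduces that logic in base R on invented toy data in the same long format (`search_term`, `pmid`). It is an illustration under those assumptions, not the package's implementation.

```r
# Toy search results (invented PMIDs), one row per term/record pair.
toy <- data.frame(
  search_term = c("a", "a", "a", "b", "b", "c"),
  pmid        = c("1", "2", "3", "2", "3", "9"),
  stringsAsFactors = FALSE
)

# Collect the PMIDs returned for each term.
by_term <- split(toy$pmid, toy$search_term)

# For every unordered pair of terms, count records and shared records.
pairs <- combn(sort(names(by_term)), 2, simplify = FALSE)
crosstab <- do.call(rbind, lapply(pairs, function(p) {
  data.frame(
    term1 = p[1],
    term2 = p[2],
    n1    = length(unique(by_term[[p[1]]])),
    n2    = length(unique(by_term[[p[2]]])),
    n1n2  = length(intersect(by_term[[p[1]]], by_term[[p[2]]])),
    stringsAsFactors = FALSE
  )
}))

crosstab
```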
For quicker abstract retrieval, be sure to get an API key.
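The call that follows passes a `key` object to `ncbi_key`, but the README does not show where it comes from. One common pattern is to keep the key out of scripts and read it from an environment variable. The sketch below assumes a variable named `NCBI_API_KEY`; both that name and the helper `get_ncbi_key()` are hypothetical, not something PubmedMTK defines.

```r
# Hypothetical helper: read an NCBI API key from an environment variable.
# The variable name NCBI_API_KEY is an assumption, not a package convention.
get_ncbi_key <- function(var = "NCBI_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Set ", var, " (for example in ~/.Renviron) before requesting records.")
  }
  key
}

# Usage, once the variable is set:
# key <- get_ncbi_key()
```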
# `key` holds your NCBI API key as a character string
sen_df <- PubmedMTK::pmtk_get_records2(pmids = unique(s0$pmid),
                                       with_annotations = TRUE,
                                       cores = 5,
                                       ncbi_key = key)
Sample record from output:
sen_df <- data.table::rbindlist(sen_df)
n <- 10
list(pmid = sen_df$pmid[n],
     year = sen_df$year[n],
     journal = sen_df$journal[n],
     articletitle = strwrap(sen_df$articletitle[n], width = 60),
     abstract = strwrap(sen_df$abstract[n], width = 60)[1:10])
## $pmid
## [1] "34888981"
##
## $year
## [1] "2022"
##
## $journal
## [1] "Addiction (Abingdon, England)"
##
## $articletitle
## [1] "Coordinating cannabis data collection globally: Policy"
## [2] "implications."
##
## $abstract
## [1] "NA" NA NA NA NA NA NA NA NA NA
Annotations are included as a list-column, and can be easily extracted:
annotations <- data.table::rbindlist(sen_df$annotations)
annotations %>%
filter(!is.na(Form)) %>%
slice(1:10) %>%
knitr::kable()
| pmid     | Type      | Form                |
|:---------|:----------|:--------------------|
| 35723729 | Keyword   | Cancer              |
| 35723729 | Keyword   | Medical marijuana   |
| 35723729 | Keyword   | Palliative medicine |
| 35723729 | Keyword   | Sex characteristics |
| 35723729 | Keyword   | Symptom burden      |
| 35712276 | MeSH      | Cannabis            |
| 35712276 | MeSH      | Medical Marijuana   |
| 35712276 | MeSH      | Policy              |
| 35712276 | Chemistry | Medical Marijuana   |
| 35712276 | Keyword   | France              |
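For readers unfamiliar with list-columns, the extraction step can be sketched without any package dependencies. The toy data frame below is invented and stands in for the retrieved records: each row carries its own small annotation table, and row-binding those tables gives one long table like the one above. Base R's `rbind` is used here in place of `data.table::rbindlist`.

```r
# Toy records (invented): the `annotations` column is a list with one
# data frame per record.
records <- data.frame(pmid = c("1", "2"), stringsAsFactors = FALSE)
records$annotations <- list(
  data.frame(pmid = "1", Type = c("MeSH", "Keyword"),
             Form = c("Cannabis", NA), stringsAsFactors = FALSE),
  data.frame(pmid = "2", Type = "Keyword",
             Form = "Policy", stringsAsFactors = FALSE)
)

# Stack the per-record tables, then drop rows with no annotation text.
annotations_long <- do.call(rbind, records$annotations)
annotations_long <- annotations_long[!is.na(annotations_long$Form), ]

annotations_long
```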
The `pmtk_get_icites()` function can be used to obtain citation data per PMID using NIH’s Open Citation Collection and iCite.
Hutchins BI, Baker KL, Davis MT, Diwersy MA, Haque E, Harriman RM, et al. (2019) The NIH Open Citation Collection: A public access, broad coverage resource. PLoS Biol 17(10): e3000385. https://doi.org/10.1371/journal.pbio.3000385
The iCite API returns a host of descriptive/derived citation details per record.
citations <- PubmedMTK::pmtk_get_icites(pmids = ps$pmid,
                                        cores = 6,
                                        ncbi_key = key)
citations %>% select(-citation_net) %>%
slice(4) %>%
t() %>% data.frame() %>%
knitr::kable()
|                             | .                                                                                          |
|:----------------------------|:-------------------------------------------------------------------------------------------|
| pmid                        | 32405082                                                                                   |
| year                        | 2020                                                                                       |
| title                       | Who Owns a Handgun? An Analysis of the Correlates of Handgun Ownership in Young Adulthood. |
| authors                     | Mitchell Gresham, Stephen Demuth                                                           |
| journal                     | Crime Delinq                                                                               |
| is_research_article         | Yes                                                                                        |
| relative_citation_ratio     | 0.46                                                                                       |
| nih_percentile              | 25.1                                                                                       |
| human                       | 1                                                                                          |
| animal                      | 0                                                                                          |
| molecular_cellular          | 0                                                                                          |
| apt                         | 0.5                                                                                        |
| is_clinical                 | No                                                                                         |
| citation_count              | 2                                                                                          |
| citations_per_year          | 1                                                                                          |
| expected_citations_per_year | 2.155849                                                                                   |
| field_citation_rate         | 4.778137                                                                                   |
| provisional                 | Yes                                                                                        |
| x_coord                     | 0                                                                                          |
| y_coord                     | 1                                                                                          |
| cited_by_clin               |                                                                                            |
| doi                         | 10.1177/0011128719847457                                                                   |
| ref_count                   | 5                                                                                          |
Referenced and cited-by PMIDs are returned by the function as a list-column of network edges.
citations$citation_net[[4]]
## from to
## 1: 32405082 25733742
## 2: 32405082 14713704
## 3: 32405082 21767021
## 4: 32405082 17296683
## 5: 32405082 28018135
## 6: 34256608 32405082
## 7: 33538822 32405082
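The edge list mixes two directions. Assuming each row runs from the citing record (`from`) to the cited record (`to`), rows where the focal PMID is in `from` are its references, and rows where it is in `to` are the records that cite it. The sketch below applies that reading to the edges printed above, rebuilt as a plain data frame so it runs without the package.

```r
# Edges copied from the output above; direction (citing -> cited) is an
# assumption based on how the rows line up with the focal record.
edges <- data.frame(
  from = c("32405082", "32405082", "32405082", "32405082", "32405082",
           "34256608", "33538822"),
  to   = c("25733742", "14713704", "21767021", "17296683", "28018135",
           "32405082", "32405082"),
  stringsAsFactors = FALSE
)

focal <- "32405082"

references <- edges$to[edges$from == focal]   # records the focal paper cites
cited_by   <- edges$from[edges$to == focal]   # records citing the focal paper

length(references)
length(cited_by)
```

Under this reading the two lengths are 5 and 2, which agrees with the `ref_count` and `citation_count` values reported for this record.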
The `pmtk_get_affiliations()` function extracts author and author affiliation information from PubMed records.
afffs <- PubmedMTK::pmtk_get_affiliations(pmids = s0$pmid)
afffs %>%
bind_rows() %>%
slice(1:10) %>%
knitr::kable()
| pmid     | Author                  | Affiliation |
|:---------|:------------------------|:------------|
| 35724355 | Daniel, Jeremy          | College of Pharmacy and Allied Health Professions, South Dakota State University, Brookings, South Dakota. |
| 35724355 | Daniel, Jeremy          | Avera Behavioral Health, Sioux Falls, South Dakota. |
| 35723729 | Kasvis, Popi            | McGill Nutrition and Performance Laboratory, 5252 de Maisonneuve Blvd West, Suite 105-B, Montreal, QC, H4A 3S5, Canada. |
| 35723729 | Kasvis, Popi            | Supportive and Palliative Care Division, McGill University Health Centre, 1001 Decarie Boulevard, Montreal, QC, H4A 3J1, Canada. |
| 35723729 | Kasvis, Popi            | Department of Health, Kinesiology and Applied Physiology, Concordia University, 7141 Sherbrooke Street West, Montreal, QC, H4B 1R6, Canada. |
| 35723729 | Canac-Marquis, Michelle | Research Institute of the McGill University Health Centre, 1001 Decarie Boulevard, Montreal, QC, H4A 3J1, Canada. |
| 35723729 | Aprikian, Saro          | School of Medicine, Royal College of Surgeons in Ireland, 123 St. Stephen’s Green Dublin 2, Dublin, Ireland. |
| 35723729 | Vigano, MariaLuisa      | Department of Science, McGill University, 845 Sherbrooke St W, Montreal, QC, H3A 0G4, Canada. |
| 35723729 | Vigano, Antonio         | McGill Nutrition and Performance Laboratory, 5252 de Maisonneuve Blvd West, Suite 105-B, Montreal, QC, H4A 3S5, Canada. antonio. |
| 35723729 | Vigano, Antonio         | Supportive and Palliative Care Division, McGill University Health Centre, 1001 Decarie Boulevard, Montreal, QC, H4A 3J1, Canada. antonio. |