README.md

PubmedMTK

An R package for querying the PubMed database and parsing retrieved records. The toolkit facilitates batch API requests and the creation of custom corpora for NLP.

Installation

You can install the development version from GitHub with:

devtools::install_github("jaytimm/PubmedMTK")

Usage

PubMed search

The pmtk_search_pubmed() function runs the kind of record-matching search typically performed in the PubMed online interface. The search_term parameter specifies the query term; the fields parameter specifies which PubMed fields to query (in the example below, TIAB for title/abstract and MH for MeSH terms).

s0 <- PubmedMTK::pmtk_search_pubmed(search_term = 'medical marijuana', 
                                    fields = c('TIAB','MH'))
## [1] "medical marijuana[TIAB] OR medical marijuana[MH]: 2584 records"

Sample output:

head(s0)
##          search_term     pmid
## 1: medical marijuana 35724355
## 2: medical marijuana 35723729
## 3: medical marijuana 35712276
## 4: medical marijuana 35711271
## 5: medical marijuana 35702401
## 6: medical marijuana 35667225
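
The query echoed in the message above pairs the search term with each field tag and joins the pieces with OR. A minimal sketch of that string construction follows; `build_query()` is a hypothetical helper written for illustration, not a function exported by PubmedMTK, and the package may assemble its queries differently.

```r
# Hypothetical helper (not part of PubmedMTK): combine one search term with
# several PubMed field tags into a single OR query string.
build_query <- function(search_term, fields = c("TIAB", "MH")) {
  # paste0() recycles the single term across all field tags;
  # collapse then joins the tagged pieces with " OR ".
  paste0(search_term, "[", fields, "]", collapse = " OR ")
}

build_query("medical marijuana")
## [1] "medical marijuana[TIAB] OR medical marijuana[MH]"
```

Applying the helper to several terms with `vapply(terms, build_query, character(1))` would yield one query string per term, mirroring the one-message-per-term output shown in the next section.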

Multiple search terms

ps <- PubmedMTK::pmtk_search_pubmed(
  search_term = c('political ideology',
                  'marijuana legalization',
                  'political theory',
                  'medical marijuana'),
  fields = c('TIAB','MH'))
## [1] "political ideology[TIAB] OR political ideology[MH]: 567 records"
## [1] "marijuana legalization[TIAB] OR marijuana legalization[MH]: 238 records"
## [1] "political theory[TIAB] OR political theory[MH]: 119 records"
## [1] "medical marijuana[TIAB] OR medical marijuana[MH]: 2584 records"

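The pmtk_crosstab_query() function builds a cross-tab of PubMed search results for multiple search terms: for each pair of terms it reports the number of records matching each term (n1, n2) and the number matching both (n1n2).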

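# dplyr provides %>%, filter(), slice(), select() and bind_rows(),
# which are used in the examples throughout this README
library(dplyr)

ps0 <- PubmedMTK::pmtk_crosstab_query(x = ps)

ps0 %>% knitr::kable()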

| term1                  | term2              |   n1 |   n2 | n1n2 |
|:-----------------------|:-------------------|-----:|-----:|-----:|
| marijuana legalization | medical marijuana  |  238 | 2584 |   89 |
| marijuana legalization | political ideology |  238 |  567 |    1 |
| marijuana legalization | political theory   |  238 |  119 |    0 |
| medical marijuana      | political ideology | 2584 |  567 |    2 |
| medical marijuana      | political theory   | 2584 |  119 |    1 |
| political ideology     | political theory   |  567 |  119 |    2 |
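
To make the cross-tab columns concrete, here is a base-R sketch of how pairwise overlap counts can be computed from a table of (search_term, pmid) matches. It illustrates the idea only and is not the package's implementation; the search terms and PMIDs are toy values made up for the example.

```r
# Toy search results in the same shape as the search output shown earlier:
# one row per (search_term, pmid) match. Terms and PMIDs here are made up.
hits <- data.frame(
  search_term = c("a", "a", "a", "b", "b", "c"),
  pmid        = c("1", "2", "3", "2", "3", "9"),
  stringsAsFactors = FALSE
)

# Split PMIDs by term, then enumerate every unordered pair of terms.
by_term <- split(hits$pmid, hits$search_term)
pairs   <- combn(names(by_term), 2, simplify = FALSE)

# For each pair, count records per term and records shared by both terms.
crosstab <- do.call(rbind, lapply(pairs, function(p) {
  data.frame(
    term1 = p[1],
    term2 = p[2],
    n1    = length(unique(by_term[[p[1]]])),
    n2    = length(unique(by_term[[p[2]]])),
    n1n2  = length(intersect(by_term[[p[1]]], by_term[[p[2]]])),
    stringsAsFactors = FALSE
  )
}))

crosstab
```

In this toy data, terms "a" and "b" share two PMIDs, while "c" overlaps with neither.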

Retrieve and parse abstract data

For quicker abstract retrieval, obtain an NCBI API key and pass it via the ncbi_key parameter (stored in the variable key in the example below).

sen_df <- PubmedMTK::pmtk_get_records2(pmids = unique(s0$pmid), 
                                       with_annotations = TRUE,
                                       cores = 5, 
                                       ncbi_key = key) 
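
The call above assumes that `key` already holds an NCBI API key. One common way to supply it without hard-coding it in a script is to read it from an environment variable. The sketch below does that; `get_ncbi_key()` and the variable name `NCBI_API_KEY` are assumptions made for illustration, not part of PubmedMTK.

```r
# Hypothetical helper (not part of PubmedMTK): read the NCBI API key from an
# environment variable. The variable name "NCBI_API_KEY" is an assumption.
get_ncbi_key <- function(var = "NCBI_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    # Fail early with a clear message instead of sending keyless requests.
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}

# Usage, once the variable is set (e.g. in ~/.Renviron):
# key <- get_ncbi_key()
```

Keeping the key outside the script also avoids committing it to version control.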

Sample record from output:

sen_df <- data.table::rbindlist(sen_df)

n <- 10
list(pmid = sen_df$pmid[n],
     year = sen_df$year[n],
     journal = sen_df$journal[n],
     articletitle = strwrap(sen_df$articletitle[n], width = 60),
     abstract = strwrap(sen_df$abstract[n], width = 60)[1:10])
## $pmid
## [1] "34888981"
## 
## $year
## [1] "2022"
## 
## $journal
## [1] "Addiction (Abingdon, England)"
## 
## $articletitle
## [1] "Coordinating cannabis data collection globally: Policy"
## [2] "implications."                                         
## 
## $abstract
##  [1] "NA" NA   NA   NA   NA   NA   NA   NA   NA   NA

Annotations

Annotations are included as a list-column, and can be easily extracted:

annotations <- data.table::rbindlist(sen_df$annotations)
annotations %>%
  filter(!is.na(Form)) %>%
  slice(1:10) %>%
  knitr::kable()

| pmid     | Type      | Form                |
|:---------|:----------|:--------------------|
| 35723729 | Keyword   | Cancer              |
| 35723729 | Keyword   | Medical marijuana   |
| 35723729 | Keyword   | Palliative medicine |
| 35723729 | Keyword   | Sex characteristics |
| 35723729 | Keyword   | Symptom burden      |
| 35712276 | MeSH      | Cannabis            |
| 35712276 | MeSH      | Medical Marijuana   |
| 35712276 | MeSH      | Policy              |
| 35712276 | Chemistry | Medical Marijuana   |
| 35712276 | Keyword   | France              |
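
Once extracted, the annotation table can be summarised with ordinary data-frame operations. The base-R sketch below rebuilds the ten rows shown above as a plain data frame (so it runs without a PubMed request), then counts annotations by type and collects the MeSH headings per record.

```r
# The ten annotation rows from the table above, rebuilt as a data frame.
annotations <- data.frame(
  pmid = c(rep("35723729", 5), rep("35712276", 5)),
  Type = c(rep("Keyword", 5), "MeSH", "MeSH", "MeSH", "Chemistry", "Keyword"),
  Form = c("Cancer", "Medical marijuana", "Palliative medicine",
           "Sex characteristics", "Symptom burden",
           "Cannabis", "Medical Marijuana", "Policy",
           "Medical Marijuana", "France"),
  stringsAsFactors = FALSE
)

# Number of annotations of each type.
type_counts <- table(annotations$Type)

# MeSH headings grouped by record.
mesh         <- subset(annotations, Type == "MeSH")
mesh_by_pmid <- split(mesh$Form, mesh$pmid)

type_counts
mesh_by_pmid
```

The same pattern works for the other annotation types by changing the value compared against `Type`.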

Citation data

The pmtk_get_icites function can be used to obtain citation data per PMID using NIH’s Open Citation Collection and iCite.

Hutchins BI, Baker KL, Davis MT, Diwersy MA, Haque E, Harriman RM, et al. (2019) The NIH Open Citation Collection: A public access, broad coverage resource. PLoS Biol 17(10): e3000385. https://doi.org/10.1371/journal.pbio.3000385

The iCite API returns a host of descriptive/derived citation details per record.

citations <- PubmedMTK::pmtk_get_icites(pmids = ps$pmid, 
                                        cores = 6,
                                        ncbi_key = key)

citations %>% select(-citation_net) %>%
  slice(4) %>%
  t() %>% data.frame() %>%
  knitr::kable()

|                             | .                                                                                          |
|:----------------------------|:-------------------------------------------------------------------------------------------|
| pmid                        | 32405082                                                                                   |
| year                        | 2020                                                                                       |
| title                       | Who Owns a Handgun? An Analysis of the Correlates of Handgun Ownership in Young Adulthood. |
| authors                     | Mitchell Gresham, Stephen Demuth                                                           |
| journal                     | Crime Delinq                                                                               |
| is_research_article         | Yes                                                                                        |
| relative_citation_ratio     | 0.46                                                                                       |
| nih_percentile              | 25.1                                                                                       |
| human                       | 1                                                                                          |
| animal                      | 0                                                                                          |
| molecular_cellular          | 0                                                                                          |
| apt                         | 0.5                                                                                        |
| is_clinical                 | No                                                                                         |
| citation_count              | 2                                                                                          |
| citations_per_year          | 1                                                                                          |
| expected_citations_per_year | 2.155849                                                                                   |
| field_citation_rate         | 4.778137                                                                                   |
| provisional                 | Yes                                                                                        |
| x_coord                     | 0                                                                                          |
| y_coord                     | 1                                                                                          |
| cited_by_clin               |                                                                                            |
| doi                         | 10.1177/0011128719847457                                                                   |
| ref_count                   | 5                                                                                          |

Referenced and cited-by PMIDs are returned by the function as a list-column of network edges.

citations$citation_net[[4]]
##        from       to
## 1: 32405082 25733742
## 2: 32405082 14713704
## 3: 32405082 21767021
## 4: 32405082 17296683
## 5: 32405082 28018135
## 6: 34256608 32405082
## 7: 33538822 32405082
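
The edge list above can be split into outgoing references and incoming citations for the focal record. The sketch below rebuilds the seven edges as a plain data frame and assumes that each row means "`from` cites `to`". That reading is inferred from the sample, not from package documentation: it gives five references and two citing records, which matches ref_count and citation_count in the table above.

```r
# The seven edges printed above, rebuilt as a data frame.
edges <- data.frame(
  from = c("32405082", "32405082", "32405082", "32405082", "32405082",
           "34256608", "33538822"),
  to   = c("25733742", "14713704", "21767021", "17296683", "28018135",
           "32405082", "32405082"),
  stringsAsFactors = FALSE
)

focal <- "32405082"

# Assumption: a row means `from` cites `to`.
references <- edges$to[edges$from == focal]   # records the focal paper cites
cited_by   <- edges$from[edges$to == focal]   # records citing the focal paper

length(references)
length(cited_by)
```

Stacking the edge lists of all records (for example with `data.table::rbindlist(citations$citation_net)`) would give a single edge table suitable for a graph library.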

Affiliations

The pmtk_get_affiliations function extracts author and author affiliation information from PubMed records.

afffs <- PubmedMTK::pmtk_get_affiliations(pmids = s0$pmid)

afffs %>%
  bind_rows() %>%
  slice(1:10) %>%
  knitr::kable()

| pmid | Author | Affiliation |
|:---------|:------------------------|:-----------------------------------------------------------|
| 35724355 | Daniel, Jeremy | College of Pharmacy and Allied Health Professions, South Dakota State University, Brookings, South Dakota. |
| 35724355 | Daniel, Jeremy | Avera Behavioral Health, Sioux Falls, South Dakota. |
| 35723729 | Kasvis, Popi | McGill Nutrition and Performance Laboratory, 5252 de Maisonneuve Blvd West, Suite 105-B, Montreal, QC, H4A 3S5, Canada. |
| 35723729 | Kasvis, Popi | Supportive and Palliative Care Division, McGill University Health Centre, 1001 Decarie Boulevard, Montreal, QC, H4A 3J1, Canada. |
| 35723729 | Kasvis, Popi | Department of Health, Kinesiology and Applied Physiology, Concordia University, 7141 Sherbrooke Street West, Montreal, QC, H4B 1R6, Canada. |
| 35723729 | Canac-Marquis, Michelle | Research Institute of the McGill University Health Centre, 1001 Decarie Boulevard, Montreal, QC, H4A 3J1, Canada. |
| 35723729 | Aprikian, Saro | School of Medicine, Royal College of Surgeons in Ireland, 123 St. Stephen’s Green Dublin 2, Dublin, Ireland. |
| 35723729 | Vigano, MariaLuisa | Department of Science, McGill University, 845 Sherbrooke St W, Montreal, QC, H3A 0G4, Canada. |
| 35723729 | Vigano, Antonio | McGill Nutrition and Performance Laboratory, 5252 de Maisonneuve Blvd West, Suite 105-B, Montreal, QC, H4A 3S5, Canada. antonio. |
| 35723729 | Vigano, Antonio | Supportive and Palliative Care Division, McGill University Health Centre, 1001 Decarie Boulevard, Montreal, QC, H4A 3J1, Canada. antonio. |
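
Because authors with several affiliations appear on several rows, it can be convenient to collapse the table to one row per author and record. The base-R sketch below does this for three of the rows shown above, rebuilt as a plain data frame; joining the affiliations with "; " is an arbitrary choice made for this illustration.

```r
# Three rows from the table above, rebuilt as a data frame.
affs <- data.frame(
  pmid   = c("35724355", "35724355", "35723729"),
  Author = c("Daniel, Jeremy", "Daniel, Jeremy", "Canac-Marquis, Michelle"),
  Affiliation = c(
    "College of Pharmacy and Allied Health Professions, South Dakota State University, Brookings, South Dakota.",
    "Avera Behavioral Health, Sioux Falls, South Dakota.",
    "Research Institute of the McGill University Health Centre, 1001 Decarie Boulevard, Montreal, QC, H4A 3J1, Canada."
  ),
  stringsAsFactors = FALSE
)

# One row per (pmid, Author); multiple affiliations are joined with "; ".
per_author <- aggregate(Affiliation ~ pmid + Author, data = affs,
                        FUN = paste, collapse = "; ")

per_author
```

Here the two rows for Daniel, Jeremy collapse into a single row, so `per_author` has two rows in total.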



jaytimm/PubmedMTK documentation built on Sept. 25, 2022, 10:49 p.m.