In cheneypinata/dslr: Converting Dimensions DSL API outputs to dataframes

knitr::opts_chunk$set(echo = TRUE)
source('/Users/sscribner/R/creds.R')

Basic functionality

dslr provides two main tools for accessing the Dimensions API:

dim_login() - a function that generates an API authorization token.
dim_request() - a function that iteratively queries the API and returns the data as a nested list object.

dim_login()

dim_login() generates an API authorization token and automatically stores it in the global environment. It takes three arguments: username, password, and authep (the authorization endpoint). By default, authep points to the main Dimensions API address; if you are running a private instance, set authep to that instance's address instead.

library(dslr)
# Log in (i.e., obtain a token)
dim_login(username = creds$api_username, password = creds$api_password, authep = 'https://app.dimensions.ai/api/auth.json')

A successful login is confirmed by a message in the console and by a token object appearing in your global environment.
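If a script should stop early when no token is present, a small check along the following lines may help. This is only a sketch: has_dim_token() is a hypothetical helper, not part of dslr, and the object name dim_token is taken from the dim_request() calls later in this document rather than from the package documentation.

```r
# Hypothetical helper (not part of dslr): TRUE if an object with the given
# name exists in the given environment. The default name "dim_token" is an
# assumption based on how the token is used later in this document.
has_dim_token <- function(name = "dim_token", envir = globalenv()) {
  exists(name, envir = envir, inherits = FALSE)
}

# Typical use at the top of a script:
if (!has_dim_token()) {
  message("No token found; call dim_login() first.")
}
```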

dim_request()

API queries are run with dim_request(), which has several arguments that support iterative querying within the API's limit of 50k records. Queries must not contain "skip" or "limit" themselves; specify these through the corresponding arguments of the function call so that the query can iterate dynamically.
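Because a stray paging clause is easy to leave in a query copied from elsewhere, a pre-flight check can be useful. The sketch below is a hypothetical helper, not a dslr function. It looks for "skip" or "limit" followed by a number, so it can also flag a quoted search phrase such as "limit 5"; treat a positive result as a prompt to inspect the query, not as proof.

```r
# Hypothetical pre-flight check (not part of dslr): flags DSL paging clauses
# such as "limit 100" or "skip 200" anywhere in the query string.
has_paging_clause <- function(query) {
  grepl("\\b(skip|limit)\\s+[0-9]+", query, ignore.case = TRUE)
}

q_ok  <- 'search publications for "microplastic*" return publications[basics]'
q_bad <- paste(q_ok, "limit 100 skip 200")

has_paging_clause(q_ok)   # FALSE
has_paging_clause(q_bad)  # TRUE
```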

Each request is split into iterations at the outset. An iteration that contains too much data is automatically broken into smaller chunks, which can lengthen the time an API pull takes to complete. Console output reports how many iterations the request was initially split into and notes whenever an iteration is too large and is being chunked.

q <- 'search publications in title_abstract_only for "(microplastic* AND ocean*)" return publications[basics]'

pubs_result <- dim_request(dim_token, query = q)
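To picture the iteration and chunking described above, the sketch below computes a skip/limit plan for a given result count and splits one oversized iteration in half. It is not dslr's actual implementation: plan_iterations() and split_iteration() are hypothetical names, and the page size of 1000 is an assumption. Only the 50k ceiling comes from the text above.

```r
# Hypothetical sketch (not dslr internals): plan skip/limit pairs that cover
# `total` records, capped at `max_records`. The default page size is assumed.
plan_iterations <- function(total, limit = 1000L, max_records = 50000L) {
  n <- min(total, max_records)
  if (n <= 0) return(data.frame(skip = integer(0), limit = integer(0)))
  skip <- seq(0L, n - 1L, by = limit)
  data.frame(skip = skip, limit = pmin(limit, n - skip))
}

# Split one iteration into two smaller chunks covering the same records.
split_iteration <- function(skip, limit) {
  half <- ceiling(limit / 2)
  data.frame(skip = c(skip, skip + half), limit = c(half, limit - half))
}

plan_iterations(2500)        # three iterations: 1000, 1000 and 500 records
split_iteration(1000, 1000)  # two chunks of 500 starting at 1000 and 1500
```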

Unpacking your JSON-Style Nested List Object

The tidyr and data.table packages have useful tools for unpacking nested data. The small example below unpacks publication and author data into separate dataframes that can be combined later.

library(tidyverse)
library(data.table)

# Create dataframe of publication data
pubs_df <- rbindlist(pubs_result$data, use.names = TRUE) %>% select(-c('author_affiliations'))

# Unpack authors data
auths <- rbindlist(pubs_result$data, use.names = TRUE) %>% 
  select(c('id', 'author_affiliations'))

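# Build one row per author, carrying the publication id along
auths_df <- data.frame()

for (a in seq_len(nrow(auths))) {

  if (is.null(auths[a, 2][[1]][[1]])) {

    # No author data: keep the publication id and fill the rest with NA
    null_row <- data.frame(raw_affiliation = NA, 
                           first_name = NA, 
                           last_name = NA, 
                           corresponding = NA, 
                           orcid = NA, 
                           current_organization_id = NA, 
                           researcher_id = NA, 
                           affiliations = NA, 
                           id = auths[a, 1][[1]])
    auths_df <- rbind(auths_df, null_row)

  } else {

    for (b in auths[a, 2][[1]]) {

      # Named `author` rather than `c` to avoid masking base::c()
      author <- b[[1]]
      author$id <- auths[a, 1][[1]]
      auths_df <- rbind(auths_df, author)

    }
  }
}

# Remove loop helpers; some may not exist (e.g. null_row when every record has authors)
rm(list = intersect(c("a", "b", "author", "null_row"), ls()))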
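Growing a dataframe with rbind() inside a loop can get slow on large result sets. A common alternative is to build the rows in a list and bind them once. The sketch below uses base R only and a small made-up nested list; it is not real Dimensions output, its shape is simplified, and its field names are assumptions. It then joins the author rows back to the publication rows on id.

```r
# Made-up stand-in for a nested API result (illustrative records only).
toy_pubs <- list(
  list(id = "pub.1", title = "Paper A",
       authors = list(list(first_name = "Ada",  last_name = "Lovelace"),
                      list(first_name = "Alan", last_name = "Turing"))),
  list(id = "pub.2", title = "Paper B", authors = list())
)

# One row per author; publications without authors get a single NA row.
flatten_authors <- function(pubs) {
  rows <- lapply(pubs, function(p) {
    if (length(p$authors) == 0) {
      return(data.frame(id = p$id, first_name = NA_character_,
                        last_name = NA_character_, stringsAsFactors = FALSE))
    }
    do.call(rbind, lapply(p$authors, function(a) {
      data.frame(id = p$id, first_name = a$first_name,
                 last_name = a$last_name, stringsAsFactors = FALSE)
    }))
  })
  do.call(rbind, rows)  # bind once instead of growing inside the loop
}

toy_auths_df <- flatten_authors(toy_pubs)
toy_pubs_df  <- data.frame(
  id    = vapply(toy_pubs, function(p) p$id, character(1)),
  title = vapply(toy_pubs, function(p) p$title, character(1)),
  stringsAsFactors = FALSE
)

# Combine publications and authors again, one row per publication-author pair.
combined <- merge(toy_pubs_df, toy_auths_df, by = "id")
```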

Dimensions API Functions

dim_request also supports Dimensions API functions. Some examples follow.

classify()

When the query is a classify function, dim_request will return a dataframe with the classification system tags.

q <- 'classify(title = \"Sustained Exposure to High Carbohydrate Availability Does Not Influence Iron-Regulatory Responses in Elite Endurance Athletes\",
abstract = \"This study implemented a 2-week high carbohydrate (CHO) diet intended to maximize CHO oxidation rates
and examined the iron-regulatory response to a 26-km race walking effort. Twenty international-level, male race
walkers were assigned to either a novel high CHO diet (MAX = 10 g/kg body mass CHO daily) inclusive of gut-training
strategies, or a moderate CHO control diet (CON = 6 g/kg body mass CHO daily) for a 2-week training period.
The athletes completed a 26-km race walking test protocol before and after the dietary intervention. Venous blood
samples were collected pre-, post-, and 3 hr postexercise and measured for serum ferritin, interleukin-6, and
hepcidin-25 concentrations. Similar decreases in serum ferritin (17-23%) occurred postintervention in MAX and CON.
At the baseline, CON had a greater postexercise increase in interleukin-6 levels after 26 km of walking
(20.1-fold, 95% CI [9.2, 35.7]) compared with MAX (10.2-fold, 95% CI [3.7, 18.7]). A similar finding was evident for
hepcidin levels 3 hr postexercise (CON = 10.8-fold, 95% CI [4.8, 21.2]; MAX = 8.8-fold, 95% CI [3.9, 16.4]).
Postintervention, there were no substantial differences in the interleukin-6 response (CON = 13.6-fold, 95% CI [9.2, 20.5]; MAX = 11.2-fold, 95% CI [6.5, 21.3]) or
hepcidin levels (CON = 7.1-fold, 95% CI [2.1, 15.4]; MAX = 6.3-fold, 95% CI [1.8, 14.6]) between the dietary groups.
Higher resting serum ferritin (p = .004) and hotter trial ambient temperatures (p = .014) were associated with greater
hepcidin levels 3 hr postexercise. Very high CHO diets employed by endurance athletes to increase CHO oxidation have
little impact on iron regulation in elite athletes. It appears that variations in serum ferritin concentration and
ambient temperature, rather than dietary CHO, are associated with increased hepcidin concentrations 3 hr postexercise.\",
system = \"RCDC\")'

classified <- dim_request(dim_token, query = q, logs = TRUE)

print(classified)
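Typing a long title or abstract directly into a query string is error-prone, especially when the text itself contains double quotes. The sketch below shows one way to assemble a classify query from plain R strings. escape_dsl() and build_classify_query() are hypothetical helpers, not part of dslr, the input text is a placeholder, and the sketch assumes the DSL accepts a backslash-escaped double quote inside a quoted string.

```r
# Hypothetical helpers (not part of dslr).
# Assumption: the DSL accepts \" for a double quote inside a quoted string.
escape_dsl <- function(x) gsub('"', '\\"', x, fixed = TRUE)

build_classify_query <- function(title, abstract, system) {
  sprintf('classify(title = "%s", abstract = "%s", system = "%s")',
          escape_dsl(title), escape_dsl(abstract), system)
}

# Placeholder text, used only to show the escaping.
q_classify <- build_classify_query(
  title    = 'A placeholder "quoted" title',
  abstract = "A placeholder abstract with a 50% sign.",
  system   = "RCDC"
)
cat(q_classify, "\n")
```

The resulting string could then be passed as the query argument of dim_request in the same way as the hand-written query above.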

extract_affiliations()

For extract_affiliations, dim_request returns a dataframe with basic information about the matched organizations.

q <- 'extract_affiliations(affiliation = \"university of oxford\")'
affils <- dim_request(dim_token, query = q, logs = TRUE)

print(affils)

You can also match multiple affiliations in a single call:

q <- 'extract_affiliations(json = [{\"affiliation\":\"Temple University\"}, {\"affiliation\":\"University of Pennsylvania\"}])'
affils <- dim_request(dim_token, query = q, logs = TRUE)

print(affils)
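When the affiliations already sit in a character vector, the json argument can be generated instead of typed by hand. The following is a sketch under that assumption; build_affiliation_query() is a hypothetical helper, not part of dslr.

```r
# Hypothetical helper (not part of dslr): builds an extract_affiliations
# query with one {"affiliation": ...} entry per element of the input vector.
build_affiliation_query <- function(affiliations) {
  # Backslash-escape embedded double quotes (assumed to be valid in the DSL).
  safe  <- gsub('"', '\\"', affiliations, fixed = TRUE)
  items <- sprintf('{"affiliation":"%s"}', safe)
  sprintf("extract_affiliations(json = [%s])", paste(items, collapse = ", "))
}

q_affils <- build_affiliation_query(c("Temple University",
                                      "University of Pennsylvania"))
cat(q_affils, "\n")
```

For these two inputs the generated string is the same as the hand-written multi-affiliation query above.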

extract_concepts()

For extract_concepts, dim_request returns a character vector of concepts unless return_scores is set to "true", in which case it returns a dataframe of concepts and their scores.

q <- 'extract_concepts(\"This study implemented a 2-week high carbohydrate (CHO) diet intended to maximize CHO oxidation rates
and examined the iron-regulatory response to a 26-km race walking effort. Twenty international-level, male race
walkers were assigned to either a novel high CHO diet (MAX = 10 g/kg body mass CHO daily) inclusive of gut-training
strategies, or a moderate CHO control diet (CON = 6 g/kg body mass CHO daily) for a 2-week training period.
The athletes completed a 26-km race walking test protocol before and after the dietary intervention. Venous blood
samples were collected pre-, post-, and 3 hr postexercise and measured for serum ferritin, interleukin-6, and
hepcidin-25 concentrations. Similar decreases in serum ferritin (17-23%) occurred postintervention in MAX and CON.
At the baseline, CON had a greater postexercise increase in interleukin-6 levels after 26 km of walking
(20.1-fold, 95% CI [9.2, 35.7]) compared with MAX (10.2-fold, 95% CI [3.7, 18.7]). A similar finding was evident for
hepcidin levels 3 hr postexercise (CON = 10.8-fold, 95% CI [4.8, 21.2]; MAX = 8.8-fold, 95% CI [3.9, 16.4]).
Postintervention, there were no substantial differences in the interleukin-6 response (CON = 13.6-fold, 95% CI [9.2, 20.5]; MAX = 11.2-fold, 95% CI [6.5, 21.3]) or
hepcidin levels (CON = 7.1-fold, 95% CI [2.1, 15.4]; MAX = 6.3-fold, 95% CI [1.8, 14.6]) between the dietary groups.
Higher resting serum ferritin (p = .004) and hotter trial ambient temperatures (p = .014) were associated with greater
hepcidin levels 3 hr postexercise. Very high CHO diets employed by endurance athletes to increase CHO oxidation have
little impact on iron regulation in elite athletes. It appears that variations in serum ferritin concentration and
ambient temperature, rather than dietary CHO, are associated with increased hepcidin concentrations 3 hr postexercise.\",
return_scores = true)'
concepts <- dim_request(dim_token, query = q, logs = TRUE)

print(concepts)
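Since the return type depends on return_scores, downstream code may want a single shape to work with. The sketch below wraps a character vector in a one-column dataframe and passes a dataframe through unchanged. as_concepts_df() is a hypothetical helper, and the column name concept is an assumption; the column names dslr actually returns are not specified here.

```r
# Hypothetical helper (not part of dslr): always hand back a dataframe.
# The column name "concept" is an assumption made for this sketch.
as_concepts_df <- function(x) {
  if (is.data.frame(x)) return(x)
  data.frame(concept = as.character(x), stringsAsFactors = FALSE)
}

# Illustrative inputs standing in for the two possible return shapes.
as_concepts_df(c("serum ferritin", "hepcidin levels"))
as_concepts_df(data.frame(concept = "serum ferritin", score = 0.5))
```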

extract_grants()

extract_grants can also be called through dim_request, one grant at a time; the matching Dimensions grant ID is returned as a character object.

q <- 'extract_grants(grant_number = \"R01HL117329\", funder_name = \"NIH\")'
ext_grants <- dim_request(dim_token, query = q, logs = TRUE)

print(ext_grants)
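Because each call handles a single grant, looking up several grants means building one query per grant. The sketch below does that with vectorised sprintf(). The second grant number is a placeholder, not a real award, and the final lapply() call is left commented out because it needs a valid token.

```r
# Grant numbers to look up; "PLACEHOLDER-0001" is a made-up value.
grants <- data.frame(
  grant_number = c("R01HL117329", "PLACEHOLDER-0001"),
  funder_name  = c("NIH", "NIH"),
  stringsAsFactors = FALSE
)

# sprintf() is vectorised, so this yields one query string per row.
grant_queries <- sprintf(
  'extract_grants(grant_number = "%s", funder_name = "%s")',
  grants$grant_number, grants$funder_name
)
print(grant_queries)

# With a valid token, each query could be sent in turn:
# grant_ids <- lapply(grant_queries, function(q) dim_request(dim_token, query = q))
```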