knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Most users of ChemSpider's API services will have to apply rate limiting conditions. Such a rate limit could be, for example, "no more than 15 queries per minute". The following two paragraphs describe how to rate-limit any chemspiderapi function; both in base R (using ratelimitr) and tidyverse coding styles.^[Technically, using the purrr package would be sufficient, but in practice most users will likely use the entire tidyverse.]

Installing necessary packages

To run both examples, the ratelimitr package and the tidyverse package collection need to be installed. The necessary API key is retrieved from a stored keyring using the keyring package; see separate vignette "Storing and Accessing API Keys" for instructions and recomendations on how to handle system-wide keys.

install.packages(c("ratelimitr", "tidyverse"))

Loading demo data

To start working with rate-limiting, the miniature demo data from chemspiderapi will be used. The data set is immediately available once chemspiderapi is loaded.

## Loading chemspiderapi
library(chemspiderapi)

## Loading the demo data
data("demo_chemicals")

## Looking at the demo data
demo_chemicals

A base R example

In base R, chemspiderapi functions can be re-wrapped into functions that introduce a waiting time using Sys.sleep(). These rate-limited functions can then be used in, for example, the apply() family . Before diving into the rate-limiting, the API key needs to be retrieved.

## Retrieving the API key from the keyring package.
## Note that the keyring package is not loaded, but only the keyring::key_get() function is called.
apikey <- keyring::key_get(service = "ChemSpider API key", username = Sys.getenv("USERNAME"))

We will now create a rate-limited version of the post_inchikey() function and use it to obtain a query ID.

## The sleep_post_inchickey() function will automatically "sleep" for four seconds before returning its result.
## This means it will run each query with a four second sleep in between, resulting in 15 queries per minute.
sleepy_post_inchikey <- function(...) {
  result <- post_inchikey(...)
  Sys.sleep(4)
  result
}

## sapply() can be used in this context because the result of each iteration is a single vector.
demo_chemicals$queryId <- sapply(X = demo_chemicals$InChIKey, FUN = function(x) sleepy_post_inchikey(inchikey = x, apikey = apikey))

demo_chemicals

We can now check the status of query with a rate-limited get_queryId_status() function.

sleepy_get_queryId_status <- function(...) {
  result <- get_queryId_status(...)
  Sys.sleep(4)
  result
}

## As the result of the query is an array, lapply() is used.
demo_chemicals$status <- lapply(X = demo_chemicals$queryId, FUN = function(x) sleepy_get_queryId_status(queryId = x, apikey = apikey))

demo_chemicals

All queries need to report the status Complete before proceeding. If this is the case, the ChemSpider IDs for the queries can be retrieved using a "sleepy" get_queryId_results() function.

sleepy_get_queryId_results <- function(...) {
  result <- get_queryId_results(...)
  Sys.sleep(4)
  result
}

## mapply() is used because the input covers two columns
demo_chemicals$id <- mapply(FUN = function(x, y) sleepy_get_queryId_results(queryId = x, status = unlist(y)[1], apikey = apikey), x = demo_chemicals$queryId, y = demo_chemicals$status)

demo_chemicals

Finally, the details for the chemicals can be retrieved using a rate-limited get_recordId_details() function.

sleepy_get_recordId_details <- function(...) {
  result <- get_recordId_details(...)
  Sys.sleep(4)
  result
}

demo_chemicals$details <- lapply(X = demo_chemicals$id, FUN = function(x) sleepy_get_recordId_details(recordId = x, apikey = apikey, id = FALSE))

demo_chemicals

A ratelimitr example

In this example, the ratelimitr package will be used to rate-limit chemspiderapi functions. To use the rate-limiting functionalities, we need to load ratelimitr

## Loading the ratelimitr package
library(ratelimitr)

## Re-setting the demo data
data("demo_chemicals")

## Note that the API key is already available from the previous example!

We will now create a rate-limited version of the post_inchikey() function and use it to obtain a query ID.

## In ratelimitr::limit_rate(), the ratelimitr::rate() function specifies the rate limit as calls per second. 
limit_rate_post_inchikey <- limit_rate(post_inchikey, rate(n = 15, period = 60))

## sapply() can be used in this context because the result of each iteration is a single vector.
demo_chemicals$queryId <- sapply(X = demo_chemicals$InChIKey, FUN = function(x) limit_rate_post_inchikey(inchikey = x, apikey = apikey))

demo_chemicals

We can now check the status of query with a rate-limited get_queryId_status() function.

limit_rate_get_queryId_status <- limit_rate(get_queryId_status, rate(n = 15, period = 60))

## As the result of the query is an array, lapply() is used.
demo_chemicals$status <- lapply(X = demo_chemicals$queryId, FUN = function(x) limit_rate_get_queryId_status(queryId = x, apikey = apikey))

demo_chemicals

All queries need to report the status Complete before proceeding. If this is the case, the ChemSpider IDs for the queries can be retrieved using a rate-limited get_queryId_results() function.

limit_rate_get_queryId_results <- limit_rate(get_queryId_results, rate(n = 15, period = 60))

## mapply() is used because the input covers two columns
demo_chemicals$id <- mapply(FUN = function(x, y) limit_rate_get_queryId_results(queryId = x, status = unlist(y)[1], apikey = apikey), x = demo_chemicals$queryId, y = demo_chemicals$status)

demo_chemicals

Finally, the details for the chemicals can be retrieved using a rate-limited get_recordId_details() function.

limit_rate_get_recordId_details <- limit_rate(get_recordId_details, rate(n = 15, period = 60))

demo_chemicals$details <- lapply(X = demo_chemicals$id, FUN = function(x) limit_rate_get_recordId_details(recordId = x, apikey = apikey, id = FALSE))

demo_chemicals

A tidyverse example

In the tidyverse ecosystem, all relevant chemspiderapi functions can be used in functional programming approaches using the purrr::map() family. purrr::slowly() can be used to limit the number of calls of chemspiderapi functions in a given time interval.

## Loading the tidyverse package family
library(tidyverse)

## Re-setting the demo data
data("demo_chemicals")

## Note that the API key is already available from the previous example!

## Using purrr::slowly() and purrr::rate_delay() to rate-limit the post_inchikey() function.
## In this example, we introduce a pause of four seconds and use the verbose output (`quiet = FALSE`).
slowly_post_inchikey <- slowly(post_inchikey, rate = rate_delay(pause = 4), quiet = FALSE)

The post_inchikey() function is now rate-limited as slowly_post_inchikey(). We can now use it to POST the five InChIKeys of the demo data.

tidyverse_demo <- demo_chemicals %>% 
  as_tibble() %>% 
  mutate(queryId = map_chr(InChIKey, ~ slowly_post_inchikey(inchikey = .x, apikey = apikey)))

tidyverse_demo

In the next step, the queryId status needs to be checked using the get_queryId_status() function. This function as well needs to be rate-limited into slowly_get_queryId_status() before use.

slowly_get_queryId_status <- slowly(get_queryId_status, rate = rate_delay(pause = 4), quiet = FALSE)

tidyverse_demo <- tidyverse_demo %>% 
  mutate(status = map(queryId, ~ slowly_get_queryId_status(queryId = .x, apikey = apikey))) %>% 
  unnest(status)

tidyverse_demo

Ideally, the status column only contains Complete. If this is the case, the results can be retrieved using a rate-limited slowly_get_queryId_results() function.

slowly_get_queryId_results <- slowly(get_queryId_results, rate = rate_delay(pause = 4), quiet = FALSE)

tidyverse_demo <- tidyverse_demo %>% 
  mutate(id = map2_int(queryId, status, ~ slowly_get_queryId_results(queryId = .x, status = .y, apikey = apikey)))

tidyverse_demo

The id column contains the ChemSpider IDs for the five chemicals; or rather their InChIKeys. We can now obtain details for the chemicals using the rate-limited version of get_recordId_details().

slowly_get_recordId_details <- slowly(get_recordId_details, rate = rate_delay(pause = 4), quiet = FALSE)

tidyverse_demo <- tidyverse_demo %>% 
  mutate(details = map(id, ~ slowly_get_recordId_details(recordId = .x, apikey = apikey, id = FALSE))) %>% 
  unnest(details)

tidyverse_demo


NIVANorge/chemspiderapi documentation built on Jan. 10, 2021, 10:12 a.m.