Most users of ChemSpider's API services will need to respect a rate limit. Such a rate limit could be, for example, "no more than 15 queries per minute". The following three sections describe how to rate-limit any chemspiderapi function: in base R (using Sys.sleep()), with the ratelimitr package, and in tidyverse coding style.^[Technically, using the purrr package would be sufficient, but in practice most users will likely use the entire tidyverse.]
To run the ratelimitr and tidyverse examples, the ratelimitr package and the tidyverse package collection need to be installed. The necessary API key is retrieved from a stored keyring using the keyring package; see the separate vignette "Storing and Accessing API Keys" for instructions and recommendations on how to handle system-wide keys.
install.packages(c("ratelimitr", "tidyverse"))
To start working with rate-limiting, the miniature demo data from chemspiderapi will be used. The data set is immediately available once chemspiderapi is loaded.

## Loading chemspiderapi
library(chemspiderapi)

## Loading the demo data
data("demo_chemicals")

## Looking at the demo data
demo_chemicals
base R example

In base R, chemspiderapi functions can be re-wrapped into functions that introduce a waiting time using Sys.sleep(). These rate-limited functions can then be used in, for example, the apply() family. Before diving into the rate-limiting, the API key needs to be retrieved.

## Retrieving the API key from the keyring package.
## Note that the keyring package is not loaded; only the keyring::key_get() function is called.
apikey <- keyring::key_get(service = "ChemSpider API key", username = Sys.getenv("USERNAME"))
We will now create a rate-limited version of the post_inchikey() function and use it to obtain a query ID.

## The sleepy_post_inchikey() function automatically "sleeps" for four seconds before returning its result.
## Consecutive queries are therefore at least four seconds apart, i.e. at most 15 queries per minute.
sleepy_post_inchikey <- function(...) {
  result <- post_inchikey(...)
  Sys.sleep(4)
  result
}

## sapply() can be used in this context because each iteration returns a single value.
demo_chemicals$queryId <- sapply(X = demo_chemicals$InChIKey, FUN = function(x) sleepy_post_inchikey(inchikey = x, apikey = apikey))

demo_chemicals
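The four-second pause follows directly from the limit: 60 seconds divided by 15 queries. A minimal sketch of that arithmetic is shown below; pause_for_limit() is a hypothetical helper for illustration and not part of chemspiderapi.

```r
# Hypothetical helper (not part of chemspiderapi): seconds to wait between
# calls so that at most `n` calls are made per `period` seconds.
pause_for_limit <- function(n, period = 60) {
  stopifnot(n > 0, period > 0)
  period / n
}

pause_for_limit(n = 15)                 # 4 seconds for 15 queries per minute
pause_for_limit(n = 100, period = 3600) # made-up limit of 100 queries per hour
```

Because each request also takes some time to complete, the effective rate ends up slightly below the limit, which errs on the safe side.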
We can now check the status of each query with a rate-limited get_queryId_status() function.

sleepy_get_queryId_status <- function(...) {
  result <- get_queryId_status(...)
  Sys.sleep(4)
  result
}

## As the result of each query is not a single value, lapply() is used to store it in a list column.
demo_chemicals$status <- lapply(X = demo_chemicals$queryId, FUN = function(x) sleepy_get_queryId_status(queryId = x, apikey = apikey))

demo_chemicals
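Before retrieving results, it can help to confirm programmatically that every query has finished. The sketch below uses stand-in data rather than a live API response. It assumes, as the unlist(y)[1] call further down in this example does, that the first element of each status entry is the status string. The helper all_complete(), the field names, and the non-complete value are placeholders for illustration.

```r
# Stand-in for demo_chemicals$status (assumed shape: first element is the
# status string; field names and "Pending" are made up for this sketch)
status <- list(
  list(status = "Complete", count = 1L),
  list(status = "Complete", count = 1L),
  list(status = "Pending", count = 0L)
)

# Hypothetical helper: TRUE if every status entry reports "Complete"
all_complete <- function(status) {
  first <- vapply(status, function(s) as.character(unlist(s)[1]), character(1))
  all(first == "Complete")
}

all_complete(status)       # FALSE: the third stand-in query has not finished
all_complete(status[1:2])  # TRUE
```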
All queries need to report the status Complete before proceeding. If this is the case, the ChemSpider IDs for the queries can be retrieved using a "sleepy" get_queryId_results() function.

sleepy_get_queryId_results <- function(...) {
  result <- get_queryId_results(...)
  Sys.sleep(4)
  result
}

## mapply() is used because the input covers two columns
demo_chemicals$id <- mapply(FUN = function(x, y) sleepy_get_queryId_results(queryId = x, status = unlist(y)[1], apikey = apikey), x = demo_chemicals$queryId, y = demo_chemicals$status)

demo_chemicals
Finally, the details for the chemicals can be retrieved using a rate-limited get_recordId_details() function.

sleepy_get_recordId_details <- function(...) {
  result <- get_recordId_details(...)
  Sys.sleep(4)
  result
}

demo_chemicals$details <- lapply(X = demo_chemicals$id, FUN = function(x) sleepy_get_recordId_details(recordId = x, apikey = apikey, id = FALSE))

demo_chemicals
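The four "sleepy" wrappers above all follow the same pattern, so they could also be generated by one small function factory. This is only a sketch: make_sleepy() is a hypothetical helper, not part of chemspiderapi, and it is demonstrated here on toupper() with a short pause so that it runs without an API key.

```r
# Hypothetical helper: wrap any function so that it pauses after each call.
make_sleepy <- function(f, pause = 4) {
  force(f)
  force(pause)
  function(...) {
    result <- f(...)
    Sys.sleep(pause)
    result
  }
}

# Stand-in for an API function, with a short pause for demonstration
sleepy_toupper <- make_sleepy(toupper, pause = 0.1)
out <- vapply(c("a", "b", "c"), sleepy_toupper, character(1), USE.NAMES = FALSE)
out  # "A" "B" "C"
```

With chemspiderapi loaded, the same factory would give, for example, sleepy_post_inchikey <- make_sleepy(post_inchikey).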
ratelimitr example

In this example, the ratelimitr package will be used to rate-limit chemspiderapi functions. To use the rate-limiting functionalities, we need to load ratelimitr.

## Loading the ratelimitr package
library(ratelimitr)

## Re-setting the demo data
data("demo_chemicals")

## Note that the API key is already available from the previous example!
We will now create a rate-limited version of the post_inchikey() function and use it to obtain a query ID.

## In ratelimitr::limit_rate(), the ratelimitr::rate() function specifies the rate limit as n calls per period (in seconds).
limit_rate_post_inchikey <- limit_rate(post_inchikey, rate(n = 15, period = 60))

## sapply() can be used in this context because each iteration returns a single value.
demo_chemicals$queryId <- sapply(X = demo_chemicals$InChIKey, FUN = function(x) limit_rate_post_inchikey(inchikey = x, apikey = apikey))

demo_chemicals
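A windowed limit such as rate(n = 15, period = 60) behaves differently from a fixed pause: up to n calls may run back to back, and only further calls within the same window have to wait. With the five demo chemicals, a limit of 15 calls per 60 seconds is therefore never reached. The base R sketch below illustrates the idea with a hypothetical limit_window() helper. It is a simplified stand-in for illustration, not ratelimitr's actual implementation.

```r
# Hypothetical helper: allow at most `n` calls to `f` in any `period` seconds.
# Illustrative only; this is not how ratelimitr is implemented.
limit_window <- function(f, n, period) {
  call_times <- numeric(0)  # timestamps of calls still inside the window
  function(...) {
    repeat {
      now <- as.numeric(Sys.time())
      # Forget calls that have left the window
      call_times <<- call_times[now - call_times < period]
      if (length(call_times) < n) break
      # Window is full: wait until the oldest call leaves it
      Sys.sleep(period - (now - call_times[1]))
    }
    call_times <<- c(call_times, now)
    f(...)
  }
}

# Stand-in for an API function: two calls allowed per half second
limited_toupper <- limit_window(toupper, n = 2, period = 0.5)
timing <- system.time(
  out <- vapply(c("a", "b", "c"), limited_toupper, character(1), USE.NAMES = FALSE)
)
out                  # "A" "B" "C"
timing[["elapsed"]]  # the third call had to wait for the window to clear
```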
We can now check the status of each query with a rate-limited get_queryId_status() function.

limit_rate_get_queryId_status <- limit_rate(get_queryId_status, rate(n = 15, period = 60))

## As the result of each query is not a single value, lapply() is used to store it in a list column.
demo_chemicals$status <- lapply(X = demo_chemicals$queryId, FUN = function(x) limit_rate_get_queryId_status(queryId = x, apikey = apikey))

demo_chemicals
All queries need to report the status Complete before proceeding. If this is the case, the ChemSpider IDs for the queries can be retrieved using a rate-limited get_queryId_results() function.

limit_rate_get_queryId_results <- limit_rate(get_queryId_results, rate(n = 15, period = 60))

## mapply() is used because the input covers two columns
demo_chemicals$id <- mapply(FUN = function(x, y) limit_rate_get_queryId_results(queryId = x, status = unlist(y)[1], apikey = apikey), x = demo_chemicals$queryId, y = demo_chemicals$status)

demo_chemicals
Finally, the details for the chemicals can be retrieved using a rate-limited get_recordId_details() function.

limit_rate_get_recordId_details <- limit_rate(get_recordId_details, rate(n = 15, period = 60))

demo_chemicals$details <- lapply(X = demo_chemicals$id, FUN = function(x) limit_rate_get_recordId_details(recordId = x, apikey = apikey, id = FALSE))

demo_chemicals
tidyverse example

In the tidyverse ecosystem, all relevant chemspiderapi functions can be used in functional programming approaches using the purrr::map() family. purrr::slowly() can be used to limit the number of calls of chemspiderapi functions in a given time interval.

## Loading the tidyverse package family
library(tidyverse)

## Re-setting the demo data
data("demo_chemicals")

## Note that the API key is already available from the previous example!

## Using purrr::slowly() and purrr::rate_delay() to rate-limit the post_inchikey() function.
## In this example, we introduce a pause of four seconds and use the verbose output (`quiet = FALSE`).
slowly_post_inchikey <- slowly(post_inchikey, rate = rate_delay(pause = 4), quiet = FALSE)
The post_inchikey() function is now rate-limited as slowly_post_inchikey(). We can now use it to POST the five InChIKeys of the demo data.

tidyverse_demo <- demo_chemicals %>%
  as_tibble() %>%
  mutate(queryId = map_chr(InChIKey, ~ slowly_post_inchikey(inchikey = .x, apikey = apikey)))

tidyverse_demo
In the next step, the status of each queryId needs to be checked using the get_queryId_status() function. This function also needs to be rate-limited, as slowly_get_queryId_status(), before use.

slowly_get_queryId_status <- slowly(get_queryId_status, rate = rate_delay(pause = 4), quiet = FALSE)

tidyverse_demo <- tidyverse_demo %>%
  mutate(status = map(queryId, ~ slowly_get_queryId_status(queryId = .x, apikey = apikey))) %>%
  unnest(status)

tidyverse_demo
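If a query has not finished yet, its status can simply be checked again after a pause. The sketch below shows one way to do that in base R. wait_until_complete() is a hypothetical helper, the check function is a stand-in that returns a plain status string, and the non-complete value it returns is a placeholder rather than a documented ChemSpider status.

```r
# Hypothetical helper: call `check()` until it returns "Complete",
# pausing between attempts and giving up after `max_tries` attempts.
wait_until_complete <- function(check, pause = 4, max_tries = 5) {
  for (i in seq_len(max_tries)) {
    if (identical(check(), "Complete")) {
      return(TRUE)
    }
    # No need to pause after the final attempt
    if (i < max_tries) Sys.sleep(pause)
  }
  FALSE
}

# Stand-in for a status call: reports "Complete" from attempt `ready_after` on
make_fake_check <- function(ready_after = 3) {
  calls <- 0
  function() {
    calls <<- calls + 1
    if (calls >= ready_after) "Complete" else "Pending"
  }
}

wait_until_complete(make_fake_check(3), pause = 0.1)                  # TRUE
wait_until_complete(make_fake_check(10), pause = 0.1, max_tries = 2)  # FALSE
```

In practice, check would wrap the rate-limited status call for one queryId and extract its status string; the pause keeps the repeated checks within the rate limit.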
Ideally, the status column only contains Complete. If this is the case, the results can be retrieved using a rate-limited slowly_get_queryId_results() function.

slowly_get_queryId_results <- slowly(get_queryId_results, rate = rate_delay(pause = 4), quiet = FALSE)

tidyverse_demo <- tidyverse_demo %>%
  mutate(id = map2_int(queryId, status, ~ slowly_get_queryId_results(queryId = .x, status = .y, apikey = apikey)))

tidyverse_demo
The id column contains the ChemSpider IDs for the five chemicals or, more precisely, the IDs that match their InChIKeys. We can now obtain details for the chemicals using the rate-limited version of get_recordId_details().

slowly_get_recordId_details <- slowly(get_recordId_details, rate = rate_delay(pause = 4), quiet = FALSE)

tidyverse_demo <- tidyverse_demo %>%
  mutate(details = map(id, ~ slowly_get_recordId_details(recordId = .x, apikey = apikey, id = FALSE))) %>%
  unnest(details)

tidyverse_demo