knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
sched helps send SOAP or regular requests to web servers while respecting the maximum request frequency that web sites set for the use of their web services.
sched uses the fscache package to store the contents returned by requests, and reuses them automatically when the same request is run again.
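The caching principle can be illustrated with a small base R sketch. It is not how fscache works internally (the sched cache lives in a folder on disk, set by cache_dir); the helper make_cached_fetcher() and the fake fetch function below are hypothetical and only show the idea: a result is stored under a key built from the request and returned directly the next time the same key is requested.

```r
# Hypothetical helper, for illustration only (not part of sched or fscache):
# wraps a `fetch` function so that each key is fetched at most once.
make_cached_fetcher <- function(fetch) {
  # In-memory store; sched/fscache use a folder on disk instead
  store <- new.env(parent = emptyenv())
  function(key) {
    if (exists(key, envir = store, inherits = FALSE)) {
      # Same request seen before: reuse the stored content
      return(get(key, envir = store, inherits = FALSE))
    }
    # New request: fetch the content and store it for later calls
    value <- fetch(key)
    assign(key, value, envir = store)
    value
  }
}

# Usage with a fake fetch function that counts how often it is really called
n_calls <- 0
fake_fetch <- function(url) {
  n_calls <<- n_calls + 1
  paste("content of", url)
}
cached_fetch <- make_cached_fetcher(fake_fetch)
a1 <- cached_fetch("https://example.org/a")
a2 <- cached_fetch("https://example.org/a")  # served from the store
b1 <- cached_fetch("https://example.org/b")
n_calls
#> [1] 2
```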
Requests are sent through an instance of the Scheduler class.
To get an instance of a scheduler, we use the Scheduler class as follows:
scheduler <- sched::Scheduler$new(cache_dir = NULL, user_agent = "sched ; pierrick.roger@cea.fr")
Be sure to set a user agent, since it is what identifies your application to the web site. Some web sites may reject requests that have an empty user agent.
For this vignette, we disable the cache folder by setting cache_dir to NULL. By default, it is set to a sched folder inside the system's default user cache folder. It is however strongly recommended to set it to a folder named after your application, for example:
sched::Scheduler$new(cache_dir = tools::R_user_dir("my.app", which = "cache"))
To send a request to a web service and retrieve the content of the response, we use the sendRequest() method.
Inside sendRequest(), the scheduler automatically limits the access frequency to the domain name. This means that a call to sendRequest() may sometimes block and wait, doing nothing, before the request is sent. This is perfectly normal.
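The blocking behaviour can be pictured with a simplified base R sketch. It enforces a fixed minimum interval between two calls, which is simpler than sched's rules of n requests per time lap; make_throttled() is a hypothetical helper, not part of sched.

```r
# Hypothetical helper, for illustration only (not part of sched):
# returns a version of `f` that waits, if needed, so that two successive
# calls are never less than `min_interval` seconds apart.
make_throttled <- function(f, min_interval) {
  last_call <- NULL  # time of the previous call, NULL before the first one
  function(...) {
    if (!is.null(last_call)) {
      elapsed <- as.numeric(difftime(Sys.time(), last_call, units = "secs"))
      if (elapsed < min_interval) {
        # Block, doing nothing, until the interval has passed
        Sys.sleep(min_interval - elapsed)
      }
    }
    last_call <<- Sys.time()
    f(...)
  }
}

# Usage: three calls spaced by at least 0.2 s
slow_double <- make_throttled(function(x) 2 * x, min_interval = 0.2)
t0 <- Sys.time()
results <- vapply(1:3, slow_double, numeric(1))
elapsed <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
results
#> [1] 2 4 6
```

With three calls, the wrapper sleeps twice, so elapsed ends up close to 0.4 seconds even though the wrapped function itself is instantaneous.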
Before sending a request, we must build a Request object that we then pass to sendRequest().
Using classes like Request and URL may be cumbersome for basic requests, but it is very handy for more complex ones, such as POST requests.
Let us build a URL object and a simple Request object that takes only a URL:
my_url <- sched::URL$new(
  url = "https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity",
  params = c(chebiId = 15440)
)
my_request <- sched::Request$new(my_url)
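For a GET request like this one, the params vector ends up in the query string of the URL. The base R sketch below builds such a string by hand, only to show what the parameters represent; build_query_url() is hypothetical, and sched's own encoding of parameters may differ in its details.

```r
# Hypothetical helper, for illustration only (not part of sched):
# appends named parameters to a base URL as a "?name=value&..." query string.
build_query_url <- function(base, params) {
  if (length(params) == 0L) {
    return(base)
  }
  # Percent-encode each value, including reserved characters such as "&"
  values <- vapply(as.character(params), utils::URLencode, character(1),
                   reserved = TRUE, USE.NAMES = FALSE)
  paste0(base, "?", paste(names(params), values, sep = "=", collapse = "&"))
}

build_query_url(
  "https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity",
  c(chebiId = 15440)
)
#> [1] "https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity?chebiId=15440"
```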
To send the request, pass the Request object to the sendRequest() method:
content <- scheduler$sendRequest(my_request)
Here is the XML content returned by the ChEBI web service:
content
To build a POST request, see the documentation of the Request class.
If no scheduling rule exists for a host name, sched uses a default rule of three requests per second (this default frequency may be changed when creating the Scheduler instance).
To define a custom rule for a host name, use the setRule() method:
scheduler$setRule("www.ebi.ac.uk", n = 7, lap = 2)
This call defines a new rule for the domain www.ebi.ac.uk that limits the number of requests to 7 every 2 seconds. Note that the time lap is a sliding window: sched records the time of each request. So if 7 requests have already been sent during the last 2 seconds, the 8th request is blocked, but only until the oldest of those 7 requests becomes 2 seconds old.
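The sliding-window rule can be made concrete with a small base R sketch. wait_time() is a hypothetical helper that only mimics the behaviour described above; it is not sched's implementation. Given the times (in seconds) at which past requests were sent, it returns how long a new request must wait to respect a rule of n requests per lap seconds.

```r
# Hypothetical helper, for illustration only (not sched's code):
# how long must a new request wait, at time `now`, so that no more than
# `n` requests fall inside any window of `lap` seconds?
wait_time <- function(sent_at, now, n, lap) {
  # Requests still inside the sliding window (younger than `lap` seconds)
  recent <- sent_at[sent_at > now - lap]
  if (length(recent) < n) {
    return(0)  # a slot is free: send immediately
  }
  # Once the n-th most recent request leaves the window, only n - 1
  # requests remain inside it, so the new one can be sent.
  leaving <- sort(recent, decreasing = TRUE)[n]
  leaving + lap - now
}

# Rule from the example above: 7 requests every 2 seconds.
# Seven requests were sent at t = 0.0, 0.1, ..., 0.6 s.
sent_at <- seq(0, 0.6, by = 0.1)
wait_time(sent_at, now = 1, n = 7, lap = 2)  # 8th request at t = 1 s
#> [1] 1
wait_time(sent_at, now = 2, n = 7, lap = 2)  # the first request is now 2 s old
#> [1] 0

# Default rule of three requests per second
wait_time(c(0, 0.2, 0.4), now = 0.5, n = 3, lap = 1)
#> [1] 0.5
```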
To delete all defined rules, including the ones created automatically by sched, run:
scheduler$deleteRules()
With sched, it is also possible to download files directly from URLs and write them to disk.
For this demonstration, we use a destination folder inside the R session's temporary directory:
my_temp_dir <- file.path(tempdir(), "my_temp_folder_for_sched_vignette")
To download a file from a URL and write it directly to disk, use the downloadFile() method:
my_url <- sched::URL$new(
  "https://gitlab.com/cnrgh/databases/r-sched/-/raw/main/README.md"
)
dst <- file.path(my_temp_dir, "readme.md")
scheduler$downloadFile(my_url, dest_file = dst)
As with the sendRequest() method, the scheduler uses rules to limit the access frequency to the domain name.
Finally, we remove the temporary folder:
unlink(my_temp_dir, recursive = TRUE)