Scheduler | R Documentation
Class for scheduling web requests.
The Scheduler class controls the frequency of access to web sites through the definition of access rules (Rule class).
It handles GET and POST requests, as well as file downloads.
It can use a cache system to store request results and avoid resending identical requests.
new()
New instance initializer.
There should be only one Scheduler instance in an application. Separate instances do not know about each other, so two or more of them contacting the same sites would break the access frequency rules.
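One way to follow this advice in a script is to create the scheduler lazily and return the same object on every later call. The sketch below is a hypothetical helper: `get_shared_scheduler` is not part of sched. It takes a factory function so it can run without contacting any site. In real code the factory would be something like `function() sched::Scheduler$new()`.

```r
# Hypothetical helper, not part of sched: returns the same object on every call.
# The object is built on first use by calling `factory()`; later factories are ignored.
get_shared_scheduler <- local({
  instance <- NULL                      # cached instance, private to this closure
  function(factory) {
    if (is.null(instance)) {
      instance <<- factory()            # first call: build and remember it
    }
    instance
  }
})

# Usage with a stand-in factory that counts how often it is called.
n_created <- 0
fake_factory <- function() {
  n_created <<- n_created + 1
  new.env()                             # stand-in for sched::Scheduler$new()
}
a <- get_shared_scheduler(fake_factory)
b <- get_shared_scheduler(fake_factory)
```

Because the instance lives in a closure rather than in a global variable, every part of the application that calls the helper shares one set of access rules.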
Scheduler$new( default_rule = Rule$new(), ssl_verifypeer = TRUE, nb_max_tries = 10L, cache_dir = tools::R_user_dir("sched", which = "cache"), user_agent = NULL, dwnld_timeout = 3600 )
default_rule
The default_rule to use when none has been defined for a site.
ssl_verifypeer
If set to TRUE (default), the SSL certificate of the server is verified; otherwise, certificates are not checked.
nb_max_tries
Maximum number of tries when running a request.
cache_dir
Set the path to the file system cache. Set to NULL to disable the cache system. The cache system will save downloaded content and reuse it later for identical requests.
user_agent
The application name and contact address to send to the contacted web server.
dwnld_timeout
The timeout, in seconds, used by the downloadFile() method.
Returns nothing (Scheduler$new() itself returns the new instance).
# Create a scheduler instance with a custom default_rule
scheduler <- sched::Scheduler$new(default_rule=sched::Rule$new(10, 1),
  cache_dir = NULL)
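The nb_max_tries argument bounds how many times a request is attempted. The following is a minimal sketch of such a bounded retry loop, assuming that a failed attempt signals an R error. `with_retries` and `send_once` are hypothetical names, and sched's own retry logic may differ, for example in which failures it retries or how long it waits between tries.

```r
# Hypothetical sketch of a bounded retry loop; not sched's implementation.
# `send_once` performs one attempt and signals an error on failure.
with_retries <- function(send_once, nb_max_tries = 10L, wait = 0) {
  stopifnot(nb_max_tries >= 1)
  last_error <- NULL
  for (attempt in seq_len(nb_max_tries)) {
    result <- tryCatch(send_once(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)                    # success: stop retrying
    }
    last_error <- result
    if (attempt < nb_max_tries && wait > 0) Sys.sleep(wait)
  }
  stop("All ", nb_max_tries, " tries failed. Last error: ",
       conditionMessage(last_error), call. = FALSE)
}

# Usage: a stand-in request that fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("temporary failure")
  "content"
}
res <- with_retries(flaky, nb_max_tries = 5L)
```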
setRule()
Defines a rule for a site.
Defines a rule for a site. The site is identified by its hostname. Each time a request is made to this host (i.e., the URL contains the defined hostname), the scheduling rule is applied: the scheduler waits (sleeps) if needed before sending the request.
If a rule already exists for this hostname, it will be replaced.
Scheduler$setRule(host, n = 3L, lap = 1)
host
The hostname of the site.
n
Maximum number of requests allowed during one time lap.
lap
Duration of a time lap, in seconds.
Returns nothing.
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Define a rule with default values
scheduler$setRule('www.ebi.ac.uk')
# Define a rule with custom values
scheduler$setRule('my.other.site', n=10, lap=3)
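A rule of n requests per lap seconds can be read as a sliding window: before a new request is sent, the oldest of the last n requests must be at least lap seconds old. The sketch below is a simplified, hypothetical model of that idea. `make_rule`, `wait_time`, `record`, and `wait` are illustrative names and are not sched's Rule API. The treatment of the window boundary is also an assumption.

```r
# Hypothetical sketch of an "n requests per lap seconds" rule; not sched's code.
make_rule <- function(n = 3L, lap = 1) {
  times <- numeric(0)   # send times (seconds) of up to the last n requests
  # Seconds to wait, at time `now`, before another request may be sent.
  wait_time <- function(now) {
    if (length(times) < n) return(0)
    max(0, times[1] + lap - now)        # oldest request must leave the window
  }
  # Remember that a request was sent at time `t`, keeping only the last n.
  record <- function(t) {
    times <<- utils::tail(c(times, t), n)
    invisible(NULL)
  }
  # Sleep if needed, then record the request; returns the delay applied.
  wait <- function() {
    delay <- wait_time(as.numeric(Sys.time()))
    if (delay > 0) Sys.sleep(delay)
    record(as.numeric(Sys.time()))
    invisible(delay)
  }
  list(wait_time = wait_time, record = record, wait = wait)
}

# Usage: at most 7 requests every 2 seconds, as in the sendRequest() example.
ebi_rule <- make_rule(n = 7L, lap = 2)
ebi_rule$wait()   # no request sent yet, so this returns without sleeping
```

Separating `wait_time()` from `record()` keeps the timing arithmetic testable without actually sleeping.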
sendRequest()
Sends a request and retrieves the content of the result.
Scheduler$sendRequest(request, cache_read = TRUE)
request
A sched::Request instance.
cache_read
If set to TRUE and the cache system is enabled, the cache is searched for the request and, if found, the cached result is returned. In any case, if the cache system is enabled and the request is actually sent, the retrieved content is stored in the cache.
Returns the content sent back by the contacted server, as a single string.
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Define a scheduling rule of 7 requests every 2 seconds
scheduler$setRule('www.ebi.ac.uk', n=7, lap=2)
# Create a request object
u <- 'https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity'
url <- sched::URL$new(url=u, params=c(chebiId=15440))
request <- sched::Request$new(url)
# Send the request and get the content result
content <- scheduler$sendRequest(request)
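The cache_read behaviour described above can be modelled with a small function. sched stores its cache on disk under cache_dir. In the sketch below, an in-memory environment stands in for that cache, and the URL string is used as the cache key. Both are assumptions made for illustration, and `send_with_cache` is a hypothetical name.

```r
# Hypothetical sketch of the cache_read semantics; not sched's code.
# `cache` is an environment (or NULL when caching is disabled) and
# `fetch` performs the real network call.
send_with_cache <- function(key, fetch, cache = NULL, cache_read = TRUE) {
  if (!is.null(cache) && cache_read &&
      exists(key, envir = cache, inherits = FALSE)) {
    return(get(key, envir = cache, inherits = FALSE))   # cache hit
  }
  content <- fetch()                                     # request actually sent
  if (!is.null(cache)) {
    assign(key, content, envir = cache)                  # always stored when enabled
  }
  content
}

# Usage with a stand-in fetch function that counts network calls.
cache <- new.env()
n_sent <- 0
fetch <- function() { n_sent <<- n_sent + 1; paste("response", n_sent) }
key <- "https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity?chebiId=15440"
first  <- send_with_cache(key, fetch, cache)                      # sent and cached
second <- send_with_cache(key, fetch, cache)                      # served from cache
third  <- send_with_cache(key, fetch, cache, cache_read = FALSE)  # resent, cache refreshed
```

With cache_read = FALSE the cache is bypassed for reading but still refreshed, matching the "in any case" clause of the description.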
downloadFile()
Downloads the content of a URL and saves it into the specified destination file.
This method works for any URL, although it was written with large files in mind.
Since it uses utils::download.file(), which saves the content directly to disk, the cache system is not used.
Scheduler$downloadFile(url, dest_file, quiet = FALSE, timeout = NULL)
url
The URL to access, as a sched::URL object.
dest_file
A path to a destination file.
quiet
The quiet parameter for utils::download.file().
timeout
The timeout in seconds. If NULL (default), the dwnld_timeout value provided to the initializer is used.
Returns nothing.
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Create a new temporary directory (tempdir() is the session's own
# temporary directory and should not be deleted)
tmp_dir <- tempfile()
dir.create(tmp_dir)
# Download a file
u <- sched::URL$new(
  'https://gitlab.com/cnrgh/databases/r-sched/-/raw/main/README.md',
  c(ref_type='heads'))
scheduler$downloadFile(u, file.path(tmp_dir, 'README.md'))
# Remove the temporary directory
unlink(tmp_dir, recursive = TRUE)
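utils::download.file() takes its timeout from the R option timeout, which defaults to 60 seconds. The sketch below shows one plausible way to resolve a NULL timeout to the initializer's value and apply it only for the duration of a download. It is not necessarily how sched does it. `with_download_timeout`, `url_string`, and `dest_file` are hypothetical names.

```r
# Hypothetical sketch, not sched's code: resolve the timeout argument and apply
# it through options(timeout), which utils::download.file() honours.
with_download_timeout <- function(timeout, default_timeout, expr) {
  if (is.null(timeout)) timeout <- default_timeout   # fall back to initializer value
  old <- options(timeout = timeout)                  # set, remembering the old value
  on.exit(options(old), add = TRUE)                  # restore on exit, even on error
  expr                                               # lazily evaluated here
}

# Usage (not run here, as it needs network access):
# with_download_timeout(NULL, 3600,
#   utils::download.file(url_string, dest_file, quiet = FALSE))
```

Because R evaluates function arguments lazily, the download expression only runs after the option is set, and on.exit() restores the previous value afterwards.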
getUrlString()
Builds a URL string, using a base URL and parameters to be passed.
The provided base URL and parameters are combined into a full URL string.
DEPRECATED. Use the sched::URL class and its toString() method instead.
Scheduler$getUrlString(url, params = list())
url
A URL string.
params
A list of URL parameters.
Returns the full URL string as a single character value.
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Create a URL string
url.str <- scheduler$getUrlString(
  'https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity',
  params=c(chebiId=15440))
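To illustrate how a base URL and parameters combine into a query string, here is a minimal sketch in base R. `build_url` is a hypothetical helper. It percent-encodes each value with utils::URLencode(). Whether sched encodes parameter values in exactly the same way is not stated in this documentation. New code should still use sched::URL, as recommended in the deprecation note.

```r
# Hypothetical helper illustrating the kind of string getUrlString() builds;
# not sched's implementation.
build_url <- function(base, params = character(0)) {
  if (length(params) == 0) return(base)
  # Percent-encode each value, including reserved characters such as & and =.
  values <- vapply(as.character(params), utils::URLencode, character(1),
                   reserved = TRUE)
  paste0(base, "?", paste(names(params), values, sep = "=", collapse = "&"))
}

build_url("https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity",
          c(chebiId = 15440))
```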
getUrl()
Sends a request and gets the result.
DEPRECATED. Use the sendRequest() method instead.
Scheduler$getUrl( url, params = list(), method = c("get", "post"), header = NULL, body = NULL, encoding = NULL )
url
A URL string.
params
A list of URL parameters.
method
The method to use. Either 'get' or 'post'.
header
The header to send.
body
The body to send.
encoding
The encoding to use.
Returns the results of the request.
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Send request
content <- scheduler$getUrl(
  'https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity',
  params=c(chebiId=15440))
deleteRules()
Removes all defined rules, including the ones automatically defined using default_rule.
Scheduler$deleteRules()
Returns nothing.
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Define a rule with custom values
scheduler$setRule('my.other.site', n=10, lap=3)
# Delete all defined rules
scheduler$deleteRules()
getNbRules()
Gets the number of defined rules, including the ones automatically defined using default_rule.
Scheduler$getNbRules()
Returns the number of defined rules.
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Get the number of defined rules
print(scheduler$getNbRules())
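The note that rule counts include "the ones automatically defined using default_rule" suggests a registry in which contacting an unknown host creates a rule from the default. The sketch below models that idea. `make_registry` and its functions are hypothetical names, not sched's API. Matching on the exact hostname extracted from the URL is a simplifying assumption, since setRule() describes matching as the URL containing the hostname.

```r
# Hypothetical model of a rule registry; names are illustrative, not sched's API.
make_registry <- function(default_n = 3L, default_lap = 1) {
  rules <- new.env()
  list(
    # Define (or replace) the rule for a host.
    set_rule = function(host, n = 3L, lap = 1) {
      assign(host, list(n = n, lap = lap), envir = rules)
    },
    # Look up the rule for a URL's host, creating one from the default if absent.
    rule_for = function(url) {
      host <- sub("^[A-Za-z][A-Za-z0-9+.-]*://([^/:?#]+).*$", "\\1", url)
      if (!exists(host, envir = rules, inherits = FALSE)) {
        assign(host, list(n = default_n, lap = default_lap), envir = rules)
      }
      get(host, envir = rules, inherits = FALSE)
    },
    nb_rules = function() length(ls(rules, all.names = TRUE)),
    delete_rules = function() rm(list = ls(rules, all.names = TRUE), envir = rules)
  )
}

# Usage
reg <- make_registry()
reg$set_rule("my.other.site", n = 10, lap = 3)
reg$rule_for("https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity")
reg$nb_rules()    # 2: one explicit rule, one created from the default
reg$delete_rules()
reg$nb_rules()    # 0
```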
setOffline()
Enables or disables offline mode.
If offline mode is enabled, an error is raised whenever the class attempts to send a request. This mode is mainly useful for debugging the use of the cache system.
Scheduler$setOffline(offline)
offline
Set to TRUE to enable offline mode, and FALSE otherwise.
Returns nothing.
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Enable offline mode
scheduler$setOffline(TRUE)
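Offline mode helps debug caching because any request missing from the cache fails loudly instead of silently reaching the network. The sketch below models that behaviour with an in-memory cache. It assumes that cached results are still served while offline. `fetch_offline_aware` is a hypothetical name, and sched's actual error message and cache storage differ.

```r
# Hypothetical sketch, not sched's code: in offline mode a cache hit is still
# served, but any request that would reach the network raises an error.
fetch_offline_aware <- function(key, fetch, cache, offline = FALSE) {
  if (exists(key, envir = cache, inherits = FALSE)) {
    return(get(key, envir = cache, inherits = FALSE))
  }
  if (offline) {
    stop("Offline mode: refusing to send request for ", key)
  }
  content <- fetch()
  assign(key, content, envir = cache)
  content
}

# Usage: warm the cache online, then check that offline runs only use the cache.
cache <- new.env()
fetch <- function() "fresh content"
fetch_offline_aware("url-1", fetch, cache)                  # sent, then cached
fetch_offline_aware("url-1", fetch, cache, offline = TRUE)  # served from cache
res <- tryCatch(fetch_offline_aware("url-2", fetch, cache, offline = TRUE),
                error = function(e) "error: not cached")
```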
isOffline()
Tests if offline mode is enabled.
Scheduler$isOffline()
Returns TRUE if offline mode is enabled, FALSE otherwise.
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Test if offline mode is enabled
if (scheduler$isOffline())
  print("Scheduler is offline.")
clone()
The objects of this class are cloneable with this method.
Scheduler$clone(deep = FALSE)
deep
Whether to make a deep clone.
# Create a scheduler instance without cache
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Define a rule with default values
scheduler$setRule('www.ebi.ac.uk')
# Create a request object
u <- 'https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity'
url <- sched::URL$new(url=u, params=c(chebiId=15440))
request <- sched::Request$new(url)
# Send the request and get the content result
content <- scheduler$sendRequest(request)