Description

Launch a crawl.
Usage

sparkler.crawl(vm, url, topUrls, topGroups, maxIter, debug = FALSE,
  mode = "default")
Arguments

vm: The instance object returned by sparkler.create().
url: URL of the website to crawl.
topUrls: Number of URLs to fetch from each website (host).
topGroups: Number of hosts to fetch in parallel.
maxIter: Number of crawl iterations to run.
debug: If TRUE, debug messages are printed.
mode: Delay between two fetch requests to the same host: "default" (1000 ms), "fast" (500 ms) or "turbo" (100 ms).
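The mode argument selects one of three fixed delays between two requests to the same host. The mapping can be sketched as a small lookup; `mode_delay_ms()` is a hypothetical helper written for illustration and is not part of RsparkleR.

```r
# Hypothetical helper (not part of RsparkleR): returns the delay, in
# milliseconds, between two fetch requests to the same host for a given mode.
mode_delay_ms <- function(mode = c("default", "fast", "turbo")) {
  # match.arg() falls back to "default" when no mode is given
  # and raises an error for any value outside the three modes
  mode <- match.arg(mode)
  switch(mode,
         default = 1000L,
         fast    = 500L,
         turbo   = 100L)
}

mode_delay_ms("fast")
```

Assuming the delay is the only limit, "turbo" allows at most about 10 requests per second to a single host (1000 ms / 100 ms), against 1 request per second in "default" mode.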
Details

The function first checks whether the Sparkler Docker container exists and is running:
- If it does not exist, the container is created with Sparkler using the "docker run" command.
- If it exists, it is restarted.
It then uses "sparkler crawl" to inject the URL parameters into Sparkler and launch the crawl.
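The container check described above reduces to a two-way decision. The sketch below only builds the Docker command string for each case; the function name `docker_command()`, the container name "sparkler" and the image name "sparkler-image" are hypothetical placeholders, not the values RsparkleR actually uses.

```r
# Hypothetical sketch of the container decision (not RsparkleR's code).
# `exists` says whether the Sparkler container is already present;
# container and image names are placeholders.
docker_command <- function(exists, container = "sparkler",
                           image = "sparkler-image") {
  if (!exists) {
    # No container yet: create and start it in the background
    paste("docker run -d --name", container, image)
  } else {
    # Container already present: restart it
    paste("docker restart", container)
  }
}

docker_command(FALSE)
docker_command(TRUE)
```

Keeping the decision in a pure function like this makes it easy to test without a Docker daemon; the returned string would then be executed on the host that runs Docker.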
Important: the Sparkler developers deliberately slow the crawl down to avoid getting blocked by websites. topGroups (Sparkler's "top groups") is the number of hosts fetched in parallel, and topUrls (Sparkler's "top N") is the number of URLs fetched from each website. By default, Sparkler tries 256 groups with 1000 URLs in each group.
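Because topGroups caps the number of hosts and topUrls caps the URLs per host, their product gives an upper bound on the URLs fetched in one iteration. The sketch below computes that bound, assuming each iteration selects at most topGroups hosts and at most topUrls URLs per host; `max_urls()` is a hypothetical helper, not part of RsparkleR, and the real number of fetched pages is usually lower.

```r
# Hypothetical helper (not part of RsparkleR): upper bound on the number of
# URLs fetched, assuming each iteration selects at most `topGroups` hosts
# and at most `topUrls` URLs per host.
max_urls <- function(topGroups, topUrls, maxIter = 1) {
  topGroups * topUrls * maxIter
}

# Sparkler defaults, one iteration: 256 * 1000
max_urls(256, 1000)

# Values from the Examples section: 2 hosts, 1000 URLs, 100 iterations
max_urls(2, 1000, 100)
```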
Value

The crawl ID.
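Examples

## Not run:
library(RsparkleR)

# Connect to the OVH API: endpoint, application_key, application_secret
# and consumer_key must hold your own OVH API credentials
ovh <- import_ovh()
client <- load_client(ovh, endpoint, application_key, application_secret, consumer_key)

# SSH key pair used to access the instance
sshPubKeyPath <- path.expand("~/.ssh/id_rsa.pub")
sshPrivKeyPath <- path.expand("~/.ssh/id_rsa")

# Create and start the Sparkler instance
debug <- FALSE
vm <- sparkler.create(client, regionVM = "UK1", typeVM = "s1-4", sshPubKeyPath, sshPrivKeyPath)
sparkler.start(vm, debug)

# Crawl parameters (replace the example domain with your own website)
url <- "https://www.example.com"
pattern <- "www.example.com"
topN <- 1000      # URLs fetched from each website (topUrls)
maxIter <- 100    # number of crawl iterations
topGroups <- 2    # hosts fetched in parallel

crawlid <- sparkler.crawl(vm, url, topN, topGroups, maxIter, debug = FALSE, mode = "fast")
## End(Not run)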