generateR | R Documentation |
Queries the crawlDB for URLs matching the given parameters.
generateR(
out_dir = NULL,
work_dir = NULL,
regExOut = NULL,
regExIn = NULL,
max_depth = NULL,
topN = NULL,
external_site = FALSE,
max_urls_per_host = 10,
crawl_delay = NULL,
log_file = NULL,
seeds_only = FALSE,
min_score = 0
)
out_dir |
(Required) Output directory for this crawl. |
work_dir |
(Required) Working directory for this crawl. |
regExOut |
RegEx URL filter: omit links matching these keywords. |
regExIn |
RegEx URL filter: keep only links matching these keywords. |
max_depth |
Maximum crawl depth of the selected URLs. |
topN |
Number of top-ranked links to select. |
external_site |
Logical. If FALSE, hosts outside the seed list will not be crawled. |
max_urls_per_host |
Maximum number of URLs to generate per host. |
crawl_delay |
Crawl delay between requests to the same host. |
log_file |
Name of the log file. If NULL, writes to stdout(). |
seeds_only |
Logical. If TRUE, generate only the seed URLs. |
min_score |
Minimum score a URL must have to be selected. |
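To make the interplay of the selection arguments concrete, here is a hypothetical base-R sketch of how such filters might combine over an in-memory table. It is not the actual generateR() implementation: the data-frame layout (columns `url`, `host`, `depth`, `score`), the `select_urls()` name, the `seed_hosts` argument, and the order in which filters are applied are all assumptions. `out_dir`, `work_dir`, `crawl_delay`, `log_file`, and `seeds_only` are left out because they do not fit this toy model.

```r
# Hypothetical sketch: NOT the actual generateR() implementation.
# Assumes crawlDB rows are held in a data.frame with columns
# url, host, depth and score; `seed_hosts` is a hypothetical argument.
select_urls <- function(db, seed_hosts = NULL, regExOut = NULL, regExIn = NULL,
                        max_depth = NULL, topN = NULL, external_site = FALSE,
                        max_urls_per_host = 10, min_score = 0) {
  # Drop URLs scoring below min_score.
  keep <- db$score >= min_score
  # Keep only URLs matching any regExIn pattern.
  if (!is.null(regExIn)) {
    keep <- keep & grepl(paste(regExIn, collapse = "|"), db$url)
  }
  # Drop URLs matching any regExOut pattern.
  if (!is.null(regExOut)) {
    keep <- keep & !grepl(paste(regExOut, collapse = "|"), db$url)
  }
  # Drop URLs deeper than max_depth.
  if (!is.null(max_depth)) {
    keep <- keep & db$depth <= max_depth
  }
  # Unless external sites are allowed, stay on the seed hosts.
  if (!external_site && !is.null(seed_hosts)) {
    keep <- keep & db$host %in% seed_hosts
  }
  sel <- db[keep, , drop = FALSE]
  # Highest score first, so the caps below keep the best URLs.
  sel <- sel[order(-sel$score), , drop = FALSE]
  # Rank rows within each host and cap at max_urls_per_host.
  rank_in_host <- ave(seq_len(nrow(sel)), sel$host, FUN = seq_along)
  sel <- sel[rank_in_host <= max_urls_per_host, , drop = FALSE]
  # Keep at most topN URLs overall.
  if (!is.null(topN)) sel <- head(sel, topN)
  rownames(sel) <- NULL
  sel
}

# Example with a made-up table.
db <- data.frame(
  url   = c("http://a.com/news/1", "http://a.com/news/2", "http://a.com/login",
            "http://b.com/news/1", "http://c.com/news/1"),
  host  = c("a.com", "a.com", "a.com", "b.com", "c.com"),
  depth = c(1, 2, 1, 3, 1),
  score = c(0.9, 0.5, 0.8, 0.7, 0.6),
  stringsAsFactors = FALSE
)
res <- select_urls(db, seed_hosts = c("a.com", "b.com"), regExOut = "login",
                   max_depth = 2, max_urls_per_host = 1)
res$url  # "http://a.com/news/1"
```

Sorting by score before applying the per-host cap and `topN` means both limits retain the highest-scoring URLs; whether generateR() orders its selection this way is not stated on this page.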