cas_update | R Documentation |
Currently supports updates only when re-downloading index URLs is expected to bring new articles. It takes the first URLs for each index group and keeps downloading further index pages as long as new links are found on each page. When a page yields no new links, it stops downloading and moves on to the next index group.
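The stopping rule described above can be sketched in base R. This is a minimal illustration and not the package's implementation: `index_pages`, `index_groups`, `known_links` and `update_sketch()` are hypothetical names, and the "download" step is replaced by a lookup in a small in-memory list.

```r
# Hypothetical stand-in for downloading an index page and extracting its
# links: each index URL maps to the links that page would contain.
index_pages <- list(
  "https://example.com/news/page/1" = c("/news/a5", "/news/a4"),
  "https://example.com/news/page/2" = c("/news/a3", "/news/a2"),
  "https://example.com/news/page/3" = c("/news/a1"),
  "https://example.com/events/page/1" = c("/events/e1")
)

# Index URLs by group, in the order in which they should be visited.
index_groups <- list(
  news = c(
    "https://example.com/news/page/1",
    "https://example.com/news/page/2",
    "https://example.com/news/page/3"
  ),
  events = c("https://example.com/events/page/1")
)

# Links already stored by a previous run.
known_links <- c("/news/a3", "/news/a2", "/news/a1", "/events/e1")

update_sketch <- function(index_groups, index_pages, known_links) {
  new_links <- character(0)
  for (group in names(index_groups)) {
    for (url in index_groups[[group]]) {
      links <- index_pages[[url]]
      fresh <- setdiff(links, c(known_links, new_links))
      if (length(fresh) == 0) {
        # No new link on this page: stop here and move to the next group.
        break
      }
      new_links <- c(new_links, fresh)
    }
  }
  new_links
}

new_links <- update_sketch(index_groups, index_pages, known_links)
```

In this sketch the third news page is never visited, because the second page already contains no new links.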
cas_update(
  extract_links_partial,
  extractors,
  post_processing = NULL,
  wait = 3,
  user_agent = NULL,
  ...
)
extract_links_partial |
A partial function, typically created with
|
extractors |
A named list of functions. See examples for details. |
post_processing |
Defaults to NULL. If given, it must be a function that takes a data frame as input (logically, a row of the dataset) and returns it with additional or modified columns. |
wait |
Defaults to 3. Number of seconds to wait between downloading one page and the next. Can be increased to reduce server load, or set to 0 when this is not an issue. |
user_agent |
Defaults to NULL. If given, it is passed to the download method. |
... |
Passed to |
# Example of extract_links_partial:
extract_links_partial <- purrr::partial(
  .f = cas_extract_links,
  reverse_order = TRUE,
  container = "div",
  container_class = "hentry h-entry hentry_event",
  exclude_when = c("/photos", "/videos"),
  domain = "http://en.kremlin.ru/"
)
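The `extractors` and `post_processing` arguments can be illustrated in a similar spirit. The sketch below is an assumption-laden illustration, not the package's documented usage: it assumes each extractor is called with the page content as its single argument (here a plain character string, whereas the real input type may differ), and `extract_between()` is a hypothetical helper written in base R. The `post_processing` function follows the description above: it takes a one-row data frame and returns it with an additional column.

```r
# Hypothetical helper: return the text found between two markers, or NA.
extract_between <- function(html, open, close) {
  pattern <- paste0(open, "(.*?)", close)
  hit <- regmatches(html, regexec(pattern, html, perl = TRUE))[[1]]
  if (length(hit) < 2) NA_character_ else hit[2]
}

# Example of extractors: a named list of functions, one per output column.
# Assumption: each function receives the page content as its only argument.
extractors <- list(
  title = function(html) extract_between(html, "<h1>", "</h1>"),
  date = function(html) extract_between(html, "<time>", "</time>")
)

# Example of post_processing: takes a one-row data frame and returns it
# with an additional column derived from the existing ones.
post_processing <- function(df) {
  df$year <- as.integer(substr(df$date, 1, 4))
  df
}

# Apply both to a made-up page to see the resulting row.
page <- "<h1>Meeting notes</h1><time>2022-05-17</time>"
row <- as.data.frame(
  lapply(extractors, function(f) f(page)),
  stringsAsFactors = FALSE
)
row <- post_processing(row)
```

The names of the list become the column names of the resulting row, which is why `extractors` needs to be a named list.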