View source: R/cas_build_urls.R
cas_build_urls | R Documentation |
Convenience function typically used to generate urls to index pages listing articles.
cas_build_urls(
  url,
  url_ending = "",
  glue = FALSE,
  start_page = NULL,
  end_page = NULL,
  increase_by = 1,
  date_format = "Ymd",
  start_date = NULL,
  end_date = Sys.Date() - 1,
  date_separator = NULL,
  increase_date_by = "day",
  reversed_order = FALSE,
  index_group = "index",
  index = TRUE,
  write_to_db = FALSE,
  ...
)
url |
First part of index link that does not change in other index pages. |
url_ending |
Part of index link appended after the part of the link that varies. If not relevant, may be left empty. |
glue |
Logical, defaults to FALSE. If TRUE, the url is parsed with
|
start_page |
If the urls include a numerical component, define first number of the sequence. Defaults to NULL. If given, coerced to numeric, expected to be an integer. |
end_page |
If the urls include a numerical component, define last number of the sequence. Defaults to NULL. If given, coerced to numeric, expected to be an integer. |
increase_by |
Defines by how much the number in the link should be increased in the numerical sequence. Defaults to 1. |
date_format |
A character string, defaults to "Ymd". Check
|
start_date |
Defaults to NULL. If given, a date, or a character vector
of length one coercible to date with |
end_date |
Defaults to |
increase_date_by |
Defaults to "day". See |
reversed_order |
Logical, defaults to FALSE. If TRUE, the order of urls in the output is reversed. |
index_group |
A character vector, defaults to "index". Used for differentiating among different types of index or links in local databases. |
index |
Defaults to TRUE. Relevant only if |
write_to_db |
Defaults to FALSE. If set to TRUE, stores the newly created URLs to the local database. |
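The way url, url_ending, start_page, end_page, and increase_by combine can be sketched in base R. This is only an illustration of the idea, not the package's implementation; build_numbered_urls is a hypothetical helper introduced here.

```r
# Hypothetical helper (not part of the package): paste a numeric sequence
# between the fixed start of the link and its optional ending.
build_numbered_urls <- function(url,
                                start_page,
                                end_page,
                                increase_by = 1,
                                url_ending = "") {
  page_numbers <- seq(
    from = as.numeric(start_page),
    to = as.numeric(end_page),
    by = increase_by
  )
  # format() avoids scientific notation for large numbers (e.g. "1e+05")
  page_labels <- format(page_numbers, scientific = FALSE, trim = TRUE)
  paste0(url, page_labels, url_ending)
}

numbered_urls <- build_numbered_urls(
  url = "https://example.com/news/?skip=",
  start_page = 0,
  end_page = 100,
  increase_by = 10
)
head(numbered_urls, 3)
```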
A data frame with three columns: id, url, and index_group. Typically, url corresponds to a vector of unique urls.
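For orientation, a data frame with this shape can be mocked up in base R. The sketch assumes that id is a plain running integer, which the text above does not state; the actual output may differ in column types.

```r
# Mock-up of the documented output shape; column types are assumptions.
example_urls <- paste0("https://www.example.com/news/", 1:3)

index_df <- data.frame(
  id = seq_along(example_urls), # assumption: sequential integer identifier
  url = example_urls,
  index_group = "index",
  stringsAsFactors = FALSE
)

# The urls are expected to be unique
urls_are_unique <- anyDuplicated(index_df$url) == 0
index_df
```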
It is not uncommon, in particular for index pages, to include dates in the URL, along the lines of example.com/archive/2022-01-01, example.com/archive/2022-01-02, etc. To build such urls, cas_build_urls needs a start_date and end_date.
The formatting of the date can be defined either by providing to the parameter date_format a string that strptime is able to interpret directly, or a simplified string (such as "Ymd", without the "%"), adding a date_separator such as "-" as needed.
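The relationship between the simplified string and a strptime-style format can be sketched as follows. This is a base R illustration under stated assumptions, not the package's code: expand_date_format is a hypothetical helper, and treating any string containing "%" as already strptime-ready is an assumption.

```r
# Hypothetical helper: turn a simplified format such as "Ymd" plus a
# separator into a strptime-style format such as "%Y-%m-%d".
expand_date_format <- function(date_format = "Ymd", date_separator = NULL) {
  # Assumption: a string that already contains "%" is used as given
  if (grepl("%", date_format, fixed = TRUE)) {
    return(date_format)
  }
  parts <- strsplit(date_format, split = "")[[1]]
  separator <- if (is.null(date_separator)) "" else date_separator
  paste0("%", parts, collapse = separator)
}

full_format <- expand_date_format("Ymd", date_separator = "-")

# One date per day between start and end, formatted and appended to the url
dates <- seq(as.Date("2022-01-01"), as.Date("2022-01-03"), by = "day")
date_urls <- paste0("https://example.com/archive/", format(dates, full_format))
date_urls
```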
# Index pages numbered from 1 to 10
cas_build_urls(
  url = "https://www.example.com/news/",
  start_page = 1,
  end_page = 10
)

# Offset-based pagination: 0, 10, 20, ..., 100
cas_build_urls(
  url = "https://example.com/news/?skip=",
  start_page = 0,
  end_page = 100,
  increase_by = 10
)

# One url per day, with "-" between date components; show only the first rows
cas_build_urls(
  url = "https://example.com/archive/",
  start_date = "2022-01-01",
  end_date = "2022-12-31",
  date_separator = "-"
) %>%
  head()

# glue = TRUE with {here} placeholders; dates in dmY order separated by "."
cas_build_urls(
  url = "https://example.com/archive/?from={here}&to={here}",
  glue = TRUE,
  start_date = "2011-01-01",
  end_date = "2022-12-31",
  date_separator = ".",
  date_format = "dmY",
  index_group = "news"
)
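To make the last example more concrete, the sketch below shows what filling each {here} placeholder with a formatted date might look like. It uses only base R and a hypothetical helper, fill_placeholder; it assumes that every occurrence of {here} receives the same date, and it does not reproduce how the package itself processes the url when glue = TRUE.

```r
# Hypothetical helper: replace every "{here}" in a template with one value,
# returning one filled url per value.
fill_placeholder <- function(template, values, placeholder = "{here}") {
  vapply(
    values,
    function(value) gsub(placeholder, value, template, fixed = TRUE),
    character(1),
    USE.NAMES = FALSE
  )
}

dates <- seq(as.Date("2011-01-01"), as.Date("2011-01-03"), by = "day")
formatted_dates <- format(dates, "%d.%m.%Y") # dmY order with "." separator

glued_urls <- fill_placeholder(
  template = "https://example.com/archive/?from={here}&to={here}",
  values = formatted_dates
)
glued_urls
```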