cas_get_sitemap: Checks for availability of a sitemap in xml format.

View source: R/cas_get_sitemap.R

cas_get_sitemap R Documentation

Checks for availability of a sitemap in xml format.

Description

Searches common locations (namely example.com/sitemap.xml and example.com/sitemap_index.xml), then robots.txt, and returns the URL of the sitemap along with the contents of the sitemap itself, if found.
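The lookup order described above can be sketched in base R. This is only an illustration under stated assumptions: candidate_sitemap_urls() and sitemaps_from_robots() are hypothetical helper names, not castarter functions, and the package's actual implementation may differ. The sketch makes no network requests; it only builds the candidate URLs and extracts "Sitemap:" entries from robots.txt lines that have already been read.

```r
# Hypothetical helper (not part of castarter): build the common sitemap
# locations for a given domain.
candidate_sitemap_urls <- function(domain) {
  base <- sub("/+$", "", domain) # drop any trailing slashes
  paste0(base, c("/sitemap.xml", "/sitemap_index.xml"))
}

# Hypothetical helper (not part of castarter): extract sitemap URLs from the
# lines of a robots.txt file, matching "Sitemap:" case-insensitively.
sitemaps_from_robots <- function(robots_lines) {
  pattern <- "^[[:space:]]*sitemap[[:space:]]*:[[:space:]]*"
  hits <- grep(pattern, robots_lines, ignore.case = TRUE, value = TRUE)
  trimws(sub(pattern, "", hits, ignore.case = TRUE))
}

# Example input: the lines of a made-up robots.txt file.
robots <- c(
  "User-agent: *",
  "Disallow: /admin",
  "Sitemap: https://example.com/sitemap.xml"
)

candidate_sitemap_urls("https://example.com/")
sitemaps_from_robots(robots)
```

A caller would typically try each candidate URL in turn and fall back to the robots.txt entries only if none of them responds with a sitemap.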

Usage

cas_get_sitemap(
  domain = NULL,
  sitemap_url = NULL,
  check_robots = TRUE,
  check_common = TRUE,
  read_from_db = TRUE,
  write_to_db = FALSE,
  db_connection = NULL,
  disconnect_db = FALSE,
  ...
)

Arguments

domain

Defaults to NULL, but required unless sitemap_url is given. Expected to be a full domain name. If the input does not start with http, then https:// is prepended automatically.

sitemap_url

Defaults to NULL. If given, domain is ignored.

db_connection

Defaults to NULL. If NULL, a local SQLite database is used. If given, must be a connection object or a list with relevant connection settings (see example).

...

Passed to cas_get_db_file().
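The handling of the domain argument described above (prepending https:// when the input does not start with http) can be sketched in base R. normalise_domain() is a hypothetical name used for illustration, not a castarter function, and the package may implement this step differently.

```r
# Hypothetical helper (not part of castarter): prepend "https://" when the
# input does not already start with "http".
normalise_domain <- function(domain) {
  if (!startsWith(domain, "http")) {
    domain <- paste0("https://", domain)
  }
  domain
}

normalise_domain("www.europeandatajournalism.eu")
normalise_domain("http://example.com")
```

Note that a prefix check on "http" leaves both http:// and https:// inputs unchanged, so an explicit http:// scheme is not upgraded.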

Value

A data frame, including a sitemap_url column, the response as an httr2 object, and the body of the xml.

Examples

if (interactive()) {
  cas_get_sitemap(domain = "https://www.europeandatajournalism.eu/")
}

giocomai/castarter documentation built on June 12, 2025, 8:49 p.m.