cas_download_chromote: Downloads one file at a time with 'chromote'

View source: R/cas_download_chromote.R

cas_download_chromote    R Documentation

Description

Downloads one file at a time with chromote

Usage

cas_download_chromote(
  download_df = NULL,
  index = FALSE,
  index_group = NULL,
  overwrite_file = FALSE,
  ignore_id = TRUE,
  wait = 1,
  delay = 0,
  timeout = 20,
  db_connection = NULL,
  sample = FALSE,
  file_format = "html",
  download_again = FALSE,
  download_again_if_status_is_not = NULL,
  disconnect_db = FALSE,
  ...
)

Arguments

download_df

A data frame with four columns: id, url, path, type.
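As a minimal sketch of what such a data frame might look like, the snippet below builds one in base R. The URLs, the file paths and the value "contents" in the type column are illustrative assumptions, not values confirmed by this page; only the four column names come from the description above.

```r
# Hypothetical input for `download_df`: two pages to download.
# Only the column names (id, url, path, type) come from the documentation;
# every value below is a placeholder chosen for illustration.
download_df <- data.frame(
  id = c(1L, 2L),
  url = c("https://example.com/page-1", "https://example.com/page-2"),
  path = file.path(tempdir(), "html", c("1.html", "2.html")),
  type = c("contents", "contents"), # assumed label for contents files
  stringsAsFactors = FALSE
)

# Inspect the structure before passing it on.
str(download_df)
```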

index

Logical, defaults to FALSE. If TRUE, downloaded files will be considered index files. If not, they will be considered contents files. See Readme for a more extensive explanation.

overwrite_file

Logical, defaults to FALSE.

wait

Defaults to 1. Number of seconds to wait between downloading one page and the next. Can be increased to reduce server load, or can be set to 0 when this is not an issue.

delay

Defaults to 0. Passed to chromote's internal method go_to. Number of seconds to wait after the page load event fires.

timeout

Defaults to 20. Passed to chromote's internal method go_to. Maximum time in seconds to wait for the page load event.

db_connection

Defaults to NULL. If NULL, uses local SQLite database. If given, must be a connection object or a list with relevant connection settings (see example).

sample

Defaults to FALSE. If TRUE, the download order is randomised. If a numeric is given, the download order is randomised and at most the given number of items is downloaded.
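The behaviour described for sample can be illustrated with a small base R sketch. The helper pick_downloads() is hypothetical and is not part of the package; it only mimics the documented semantics and does not claim to reproduce the internal implementation.

```r
# Illustration only: a hypothetical helper that mimics the documented
# behaviour of the `sample` argument. Not the package's own code.
pick_downloads <- function(ids, sample = FALSE) {
  if (isFALSE(sample)) {
    return(ids) # FALSE: keep the original order
  }
  # TRUE or numeric: randomise the order
  shuffled <- ids[sample.int(length(ids))]
  if (is.numeric(sample)) {
    # numeric: keep at most `sample` items
    shuffled <- utils::head(shuffled, sample)
  }
  shuffled
}

ids <- 1:10
pick_downloads(ids)       # original order, all ten ids
pick_downloads(ids, TRUE) # all ten ids, shuffled
pick_downloads(ids, 3)    # three ids, drawn at random
```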

file_format

Defaults to html. Used for storing files in dedicated folders, but also for determining processing options. For example, if a sitemap is downloaded as an index with file_format set to xml, it will be processed accordingly. If it is stored as xml.gz, it will be automatically decompressed for correct processing.

download_again_if_status_is_not

Defaults to NULL. If given, it must be a status code as an integer, typically 200L, or a vector of status codes such as c(200L, 404L).
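One plausible reading of this argument, based on its name, is that files whose previously recorded status is not among the given codes are downloaded again. The sketch below illustrates that selection logic in base R; the previous_downloads data frame and its columns are hypothetical, and the package may store and filter this information differently.

```r
# Hypothetical log of earlier download attempts (placeholder values).
previous_downloads <- data.frame(
  id = 1:4,
  status = c(200L, 404L, 500L, 403L)
)

# Status codes considered final; anything else would be retried.
download_again_if_status_is_not <- c(200L, 404L)

# Ids whose recorded status is NOT among the accepted codes.
retry_ids <- previous_downloads$id[
  !previous_downloads$status %in% download_again_if_status_is_not
]
retry_ids
```

With these placeholder values, the ids with status 500 and 403 are the ones selected for a new attempt.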

disconnect_db

Defaults to FALSE. If FALSE, leaves the connection to the database open.

...

Passed to cas_get_db_file().


giocomai/castarter documentation built on June 12, 2025, 8:49 p.m.