cas_get_files_to_extract: Get path to (locally available) files to be extracted

View source: R/cas_get_files_to_extract.R

cas_get_files_to_extract    R Documentation

Get path to (locally available) files to be extracted

Description

Gets the paths to locally available files that are to be extracted. Mostly used internally by cas_extract() or for troubleshooting.

Usage

cas_get_files_to_extract(
  id = NULL,
  ignore_id = TRUE,
  custom_path = NULL,
  index = FALSE,
  store_as_character = TRUE,
  check_previous = TRUE,
  db_connection = NULL,
  file_format = "html",
  sample = FALSE,
  keep_if_status = 200,
  ...
)

Arguments

id

Defaults to NULL. Identifiers to process when extracting. If given, must be a numeric vector corresponding to the identifiers in the id column, e.g. as returned by cas_read_db_contents_id().

ignore_id

Defaults to TRUE. If TRUE, it checks if identifiers have been added to the local ignore list, typically with cas_ignore_id(), and as retrieved with cas_read_db_ignore_id(). It can also be a numeric vector of identifiers: the given identifiers will not be processed. If FALSE, items will be processed normally.
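The three accepted forms of ignore_id can be illustrated with a minimal base R sketch. The helper drop_ignored_ids() and its stored_ignore_id argument are hypothetical names used only for this illustration; they are not part of castarter, and the package's own implementation may differ.

```r
# Hypothetical helper, not part of castarter: mimics the documented
# behaviour of `ignore_id` on a plain vector of identifiers.
drop_ignored_ids <- function(id, ignore_id = TRUE, stored_ignore_id = numeric(0)) {
  if (isTRUE(ignore_id)) {
    # TRUE: drop identifiers found in the locally stored ignore list
    # (here passed in by hand as `stored_ignore_id`)
    id[!id %in% stored_ignore_id]
  } else if (is.numeric(ignore_id)) {
    # numeric vector: drop exactly the given identifiers
    id[!id %in% ignore_id]
  } else {
    # FALSE: process all identifiers normally
    id
  }
}

drop_ignored_ids(1:5, ignore_id = TRUE, stored_ignore_id = c(2, 4))
drop_ignored_ids(1:5, ignore_id = c(1, 5))
drop_ignored_ids(1:5, ignore_id = FALSE, stored_ignore_id = c(2, 4))
```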

index

Logical, defaults to FALSE. If TRUE, downloaded files will be considered index files. If not, they will be considered contents files. See Readme for a more extensive explanation.

store_as_character

Logical, defaults to TRUE. If TRUE, all extracted contents are converted to character before they are written to the database. This reduces type conversion issues with the default database backend (for example, SQLite automatically converts dates to numeric) and when switching between backends. It means you will need to set data types when you read from the database, but also that you can consistently expect all columns to be character vectors, which are implemented in one form or another across database backends. Set to FALSE if you want to remain in control of column types.
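The trade-off described for store_as_character can be sketched in base R without a database. The data frame and its column names below are made up for illustration; the point is only that everything is stored as character and types are set again after reading.

```r
# Made-up example data: an integer id and a Date column
extracted <- data.frame(
  id = 1:2,
  date = as.Date(c("2024-01-15", "2024-02-20")),
  stringsAsFactors = FALSE
)

# What store_as_character = TRUE implies: every column becomes character
stored <- as.data.frame(
  lapply(extracted, as.character),
  stringsAsFactors = FALSE
)

# After reading the data back, column types are set explicitly
restored <- transform(
  stored,
  id = as.integer(id),
  date = as.Date(date)
)

str(stored)
str(restored)
```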

check_previous

Logical, defaults to TRUE. If FALSE, no check is made to verify whether the same content has already been extracted. If FALSE, write_to_db must be set (or will be set) to FALSE, to prevent duplication of data.

sample

Defaults to FALSE. If TRUE, the download order is randomised. If a numeric is given, the download order is randomised and at most the given number of items is downloaded.
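The behaviour described for sample can be approximated with a short base R sketch. The helper apply_sample() is a hypothetical name introduced here for illustration and is not a castarter function.

```r
# Hypothetical helper, not part of castarter: mimics the documented
# behaviour of `sample` on a plain vector of identifiers.
apply_sample <- function(id, sample = FALSE) {
  if (isFALSE(sample)) {
    # FALSE: keep the original order and all items
    return(id)
  }
  # TRUE or numeric: randomise the order
  shuffled <- id[sample.int(length(id))]
  if (is.numeric(sample)) {
    # numeric: keep at most the given number of items
    shuffled <- utils::head(shuffled, sample)
  }
  shuffled
}

set.seed(1)
apply_sample(1:10, sample = TRUE) # all ten identifiers, shuffled
apply_sample(1:10, sample = 3)    # three identifiers, in random order
```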

keep_if_status

Defaults to 200. Keep only files whose recorded download status matches the given status.

...

Passed to cas_get_db_file().

Examples

## Not run: 
if (interactive()) {
  cas_get_files_to_extract()
}

## End(Not run)

giocomai/castarter documentation built on Oct. 17, 2024, 7:25 a.m.