bb_rget {bowerbird}    R Documentation
Description

This function provides similar functionality to the command-line wget utility.

Usage

bb_rget(
  url,
  level = 0,
  wait = 0,
  accept_follow = c("(/|\\.html?)$"),
  reject_follow = character(),
  accept_download = bb_rget_default_downloads(),
  accept_download_extra = character(),
  reject_download = character(),
  user,
  password,
  clobber = 1,
  no_parent = TRUE,
  no_parent_download = no_parent,
  no_check_certificate = FALSE,
  relative = FALSE,
  remote_time = TRUE,
  verbose = FALSE,
  show_progress = verbose,
  debug = FALSE,
  dry_run = FALSE,
  stop_on_download_error = FALSE,
  retries = 0,
  force_local_filename,
  use_url_directory = TRUE,
  no_host = FALSE,
  cut_dirs = 0L,
  link_css = "a",
  link_href = "href",
  curl_opts,
  target_s3_args,
  download_link_rewrite
)
bb_rget_default_downloads()
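
As a usage sketch (the server URL and file pattern here are hypothetical), the following call spiders one level deep, downloading only zip files and pausing between requests:

library(bowerbird)

## hypothetical data server, for illustration only
res <- bb_rget(url = "https://data.example.org/archive/",
               level = 1,                    ## follow links one level deep
               accept_download = "\\.zip$",  ## download only zip files
               wait = 1,                     ## pause 1 second between requests
               verbose = TRUE)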
Arguments

url
    string: the URL to retrieve

level
    integer >=0: recursively download to this maximum depth level. Specify 0 for no recursion

wait
    numeric >=0: wait this number of seconds between successive retrievals. This option may help with servers that block users making too many requests in a short period of time

accept_follow
    character: character vector with one or more entries. Each entry specifies a regular expression that is applied to the complete URL. URLs matching all entries will be followed during the spidering process. Note that the first URL (provided via the url parameter) is always visited, regardless of these criteria

reject_follow
    character: as for accept_follow, but specifying URL regular expressions to reject

accept_download
    character: character vector with one or more entries. Each entry specifies a regular expression that is applied to the complete URL. URLs that match all entries will be accepted for download. By default the patterns returned by bb_rget_default_downloads() are used

accept_download_extra
    character: character vector with one or more entries. If provided, URLs will be accepted for download if they match all entries in accept_download OR all entries in accept_download_extra. This is a convenient way to add extra download criteria without re-specifying the defaults in accept_download

reject_download
    character: as for accept_download, but specifying URL regular expressions to reject

user
    string: username used to authenticate to the remote server

password
    string: password used to authenticate to the remote server

clobber
    numeric: 0 = do not overwrite existing files, 1 = overwrite if the remote file is newer than the local copy, 2 = always overwrite existing files

no_parent
    logical: if TRUE, do not ascend to the parent directory when spidering recursively. This guarantees that only files at or below the starting URL's hierarchy will be visited

no_parent_download
    logical: as for no_parent, but applied only to URLs matching the download criteria (defaults to the value of no_parent)

no_check_certificate
    logical: if TRUE, don't check the server certificate against the available certificate authorities, and don't require the URL host name to match the common name presented by the certificate. This can help with servers that have expired or otherwise invalid certificates, but it is a security risk and should be used with caution

relative
    logical: if TRUE, only follow links that are relative URLs. This can be useful for restricting the spidering process to a single site

remote_time
    logical: if TRUE, attempt to set each downloaded file's modification time to that of the remote file

verbose
    logical: print trace output?

show_progress
    logical: if TRUE, show download progress

debug
    logical: if TRUE, print additional debugging information

dry_run
    logical: if TRUE, spider the remote site and report which files would be downloaded, but don't actually download anything

stop_on_download_error
    logical: if TRUE, stop if any file fails to download; if FALSE, issue a warning and move on to the next file

retries
    integer: number of times to retry a request if it fails with a transient error (similar to curl, a transient error means a timeout, an FTP 4xx response code, or an HTTP 5xx response code)

force_local_filename
    character: if provided, then each url is saved to the corresponding entry of force_local_filename, rather than to a file name derived from the URL itself

use_url_directory
    logical: if TRUE, files are saved into a local directory structure that follows the URL structure (e.g. files from http://some.where/place are saved into the directory some.where/place). If FALSE, files are saved into the working directory

no_host
    logical: if use_url_directory is TRUE, setting no_host to TRUE omits the host name from the local directory (e.g. files from http://some.where/place are saved into the directory place)

cut_dirs
    integer: if use_url_directory is TRUE, remove this many leading directory levels from the local path (e.g. with cut_dirs = 1, files from http://some.where/place/stuff are saved into the directory some.where/stuff)

link_css
    string: css selector that identifies links (passed as the css parameter to rvest::html_elements)

link_href
    string: the attribute of a link that gives the destination (i.e. the URL to follow)

curl_opts
    named list: additional curl options to apply to download requests (see curl::curl_options() for the available options)

target_s3_args
    list: named list of arguments to provide to the S3 client calls, if the download target is an Amazon S3 (or S3-compatible) bucket rather than the local filesystem

download_link_rewrite
    function: if supplied, this function will be applied to each download link after it is scraped from the source page and expanded to an absolute URL, but before it is checked against the accept and reject download criteria. It takes a single URL as input and should return a copy of that URL, modified as required (see the sketch after this list)

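As an illustration of download_link_rewrite (the server and its link scheme are invented for this sketch), the function receives each scraped, absolute download URL and returns a modified copy:

## hypothetical: the page links point at a redirector script, but the
## files can be fetched directly from the /files/ path
fix_link <- function(u) sub("/redirect\\?file=", "/files/", u)

res <- bb_rget(url = "https://data.example.org/archive/",
               level = 1,
               download_link_rewrite = fix_link)
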
Value

a list with components 'ok' (TRUE/FALSE), 'files', and 'message' (error or other messages)
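
A minimal sketch of handling this return value (res here being the result of an earlier bb_rget call):

if (isTRUE(res$ok)) {
    print(res$files)                        ## details of the files downloaded
} else {
    warning("bb_rget failed: ", res$message)
}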