bb_wget: Make a wget call
In AustralianAntarcticDivision/bowerbird: Keep a Collection of Sparkly Data Resources

bb_wget

R Documentation

Make a wget call

Description

This function is an R wrapper to the command-line wget utility, which is called using either the exec_wait or the exec_internal function from the sys package. Almost all of the parameters to bb_wget are translated into command-line flags to wget. Call bb_wget("help") to get more information about wget's command line flags. If required, command-line flags without equivalent bb_wget function parameters can be passed via the extra_flags parameter.
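To illustrate the idea of translating function parameters into command-line flags, here is a minimal sketch. It is not bowerbird's actual implementation: the helper name `sketch_wget_flags` is hypothetical and it covers only a handful of parameters. The flags it emits (`--recursive`, `--level`, `--wait`, `--accept`, `--timestamping`, `--no-parent`) are standard wget long options, and the URL is a placeholder.

```r
## Hypothetical helper, for illustration only: builds a character vector of
## wget command-line flags from a few bb_wget-style parameters
sketch_wget_flags <- function(url, recursive = TRUE, level = 1, wait = 0, accept,
                              timestamping = FALSE, no_parent = TRUE,
                              extra_flags = character()) {
  flags <- character()
  if (recursive) {
    ## level only applies to recursive retrieval
    flags <- c(flags, "--recursive", paste0("--level=", level))
  }
  if (wait > 0) flags <- c(flags, paste0("--wait=", wait))
  if (!missing(accept)) {
    flags <- c(flags, paste0("--accept=", paste(accept, collapse = ",")))
  }
  if (timestamping) flags <- c(flags, "--timestamping")
  if (no_parent) flags <- c(flags, "--no-parent")
  ## anything without an equivalent parameter is passed through unchanged
  c(flags, extra_flags, url)
}

flags <- sketch_wget_flags("http://somewhere.org/data/", level = 2, wait = 1,
                           accept = c("csv", "txt"), extra_flags = "--no-verbose")
print(flags)
```

In this sketch, `extra_flags` is appended verbatim, mirroring the role the real `extra_flags` parameter plays for flags that have no dedicated parameter.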

Usage

bb_wget(
  url,
  recursive = TRUE,
  level = 1,
  wait = 0,
  accept,
  reject,
  accept_regex,
  reject_regex,
  exclude_directories,
  restrict_file_names,
  progress,
  user,
  password,
  output_file,
  robots_off = FALSE,
  timestamping = FALSE,
  no_if_modified_since = FALSE,
  no_clobber = FALSE,
  no_parent = TRUE,
  no_check_certificate = FALSE,
  relative = FALSE,
  adjust_extension = FALSE,
  retr_symlinks = FALSE,
  extra_flags = character(),
  verbose = FALSE,
  capture_stdout = FALSE,
  quiet = FALSE,
  debug = FALSE
)

Arguments

`url`	string: the URL to retrieve
`recursive`	logical: if true, turn on recursive retrieving
`level`	integer >=0: recursively download to this maximum depth level. Only applicable if `recursive=TRUE`. Specify 0 for infinite recursion. See https://www.gnu.org/software/wget/manual/wget.html#Recursive-Download for more information about wget's recursive downloading
`wait`	numeric >=0: wait this number of seconds between successive retrievals. This option may help with servers that block multiple successive requests, by introducing a delay between requests
`accept`	character: character vector with one or more entries. Each entry specifies a comma-separated list of filename suffixes or patterns to accept. Note that if any of the wildcard characters '*', '?', '[', or ']' appear in an element of accept, it will be treated as a filename pattern, rather than a filename suffix. In this case, you have to enclose the pattern in quotes, for example `accept="\"*.csv\""`
`reject`	character: as for `accept`, but specifying filename suffixes or patterns to reject
`accept_regex`	character: character vector with one or more entries. Each entry provides a regular expression that is applied to the complete URL. Matching URLs will be accepted for download
`reject_regex`	character: as for `accept_regex`, but specifying regular expressions to reject
`exclude_directories`	character: character vector with one or more entries. Each entry specifies a comma-separated list of directories you wish to exclude from download. Elements may contain wildcards
`restrict_file_names`	character: vector of one or more strings from the set "unix", "windows", "nocontrol", "ascii", "lowercase", and "uppercase". See https://www.gnu.org/software/wget/manual/wget.html#index-Windows-file-names for more information on this parameter. `bb_config` sets this to "windows" by default: if you are downloading files from a server with a port (http://somewhere.org:1234/) Unix will allow the ":" as part of directory/file names, but Windows will not (the ":" will be replaced by "+"). Specifying `restrict_file_names="windows"` causes Windows-style file naming to be used
`progress`	string: the type of progress indicator you wish to use. Legal indicators are "dot" and "bar". "dot" prints progress with dots, with each dot representing a fixed amount of downloaded data. The style can be adjusted: "dot:mega" will show 64K per dot and 3M per line; "dot:giga" shows 1M per dot and 32M per line. See https://www.gnu.org/software/wget/manual/wget.html#index-dot-style for more information
`user`	string: username used to authenticate to the remote server
`password`	string: password used to authenticate to the remote server
`output_file`	string: save wget's output messages to this file
`robots_off`	logical: by default wget considers itself to be a robot, and therefore won't recurse into areas of a site that are excluded to robots. This can cause problems with servers that exclude robots (accidentally or deliberately) from parts of their sites containing data that we want to retrieve. Setting `robots_off=TRUE` will add a "-e robots=off" flag, which instructs wget to behave as a human user, not a robot. See https://www.gnu.org/software/wget/manual/wget.html#Robot-Exclusion for more information about robot exclusion
`timestamping`	logical: if `TRUE`, don't re-retrieve a remote file unless it is newer than the local copy (or there is no local copy)
`no_if_modified_since`	logical: applies when retrieving recursively with timestamping (i.e. only downloading files that have changed since last download, which is achieved using `bb_config(...,clobber=1)`). The default method for timestamping is to issue an "If-Modified-Since" header on the request, which instructs the remote server not to return the file if it has not changed since the specified date. Some servers do not support this header. In these cases, try using `no_if_modified_since=TRUE`, which will instead send a preliminary HEAD request to ascertain the date of the remote file
`no_clobber`	logical: if `TRUE`, skip downloads that would overwrite existing local files
`no_parent`	logical: if `TRUE`, do not ever ascend to the parent directory when retrieving recursively. This is `TRUE` by default, because it guarantees that only the files below a certain hierarchy will be downloaded
`no_check_certificate`	logical: if `TRUE`, don't check the server certificate against the available certificate authorities. Also don't require the URL host name to match the common name presented by the certificate. This option might be useful if trying to download files from a server with an expired certificate, but it is clearly a security risk and so should be used with caution
`relative`	logical: if `TRUE`, only follow relative links. This can sometimes be useful for restricting what is downloaded in recursive mode
`adjust_extension`	logical: if a file of type 'application/xhtml+xml' or 'text/html' is downloaded and the URL does not end with .htm or .html, this option will cause the suffix '.html' to be appended to the local filename. This can be useful when mirroring a remote site that has file URLs that conflict with directories (e.g. http://somewhere.org/this/page which has further content below it, say at http://somewhere.org/this/page/more). If "somewhere.org/this/page" is saved as a file with that name, that name can't also be used as the local directory name in which to store the lower-level content. Setting `adjust_extension=TRUE` will cause the page to be saved as "somewhere.org/this/page.html", thus resolving the conflict
`retr_symlinks`	logical: if `TRUE`, follow symbolic links during recursive download. Note that this will only follow symlinks to files, NOT to directories
`extra_flags`	character: character vector of additional command-line flags to pass to wget
`verbose`	logical: print trace output?
`capture_stdout`	logical: if `TRUE`, return 'stdout' and 'stderr' output in the returned object (see exec_internal from the sys package). Otherwise send these outputs to the console
`quiet`	logical: if `TRUE`, suppress wget's output
`debug`	logical: if `TRUE`, wget will print lots of debugging information. If wget is not behaving as expected, try setting this to `TRUE`
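
As a sketch of how several of these arguments might be combined, the following builds an argument list for a polite, recursive download of CSV files. The URL is a placeholder and the particular values are assumptions chosen for illustration, not recommendations. The call itself is left commented out in the "Not run" style, since it needs the bowerbird package, a wget executable, and network access.

```r
## Placeholder URL and illustrative settings (assumptions, not recommendations)
wget_args <- list(
  url = "http://somewhere.org/data/",
  recursive = TRUE,
  level = 2,                        # follow links at most two levels deep
  wait = 1,                         # pause one second between retrievals
  accept = "\"*.csv\"",             # wildcard pattern, so enclosed in quotes
  reject_regex = "/archive/",       # skip any URL containing /archive/
  timestamping = TRUE,              # only re-download files that have changed
  restrict_file_names = "windows",  # Windows-safe local file names
  no_parent = TRUE                  # never ascend above the starting directory
)
str(wget_args)

## Not run:
##   res <- do.call(bowerbird::bb_wget, wget_args)
## End(Not run)
```

Keeping the arguments in a named list makes it easy to inspect or adjust them before the call is made.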

Value

The result of the system call (or, if bb_wget("help") was called, a message will be issued). The returned object will have components 'status' and (if capture_stdout was TRUE) 'stdout' and 'stderr'
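
A brief sketch of inspecting such a result follows. Because a real call needs wget and network access, a mock object stands in for the return value here. It assumes the layout produced by exec_internal from the sys package, where 'stdout' and 'stderr' are raw vectors; whether bb_wget passes these through unchanged is an assumption, and the helper name `output_text` is hypothetical.

```r
## Mock result with the same layout as sys::exec_internal output (an assumption
## about what bb_wget returns when capture_stdout = TRUE)
res <- list(status = 0L,
            stdout = charToRaw("GNU Wget help text would appear here\n"),
            stderr = raw(0))

## Hypothetical helper: convert captured output to text if it is a raw vector
output_text <- function(x) if (is.raw(x)) rawToChar(x) else paste(x, collapse = "\n")

if (res$status == 0L) {
  ## wget exits with status 0 on success
  cat(output_text(res$stdout))
} else {
  warning("wget exited with status ", res$status, ": ", output_text(res$stderr))
}
```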

Examples

## Not run: 
  ## get help about wget command line parameters
  bb_wget("help")

## End(Not run)

AustralianAntarcticDivision/bowerbird documentation built on March 8, 2024, 8:33 a.m.
