bb_sync: Run a bowerbird data repository synchronization

View source: R/sync.R


Description

This function takes a bowerbird configuration object and synchronizes each of the data sources defined within it. Data files will be downloaded if they are not present on the local machine, or if the configuration has been set to update local files.
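bb_sync works through the configured data sources one at a time. The minimal base-R sketch below illustrates that sequential pattern and how the catch_errors argument changes it. It is not bowerbird's implementation: sync_all() and the sync_one() callback are hypothetical names that stand in for the real per-source synchronization.

```r
## Hypothetical sketch, not bowerbird's code: synchronize sources in order.
## sync_one() is a placeholder for whatever syncs a single data source.
sync_all <- function(sources, sync_one, catch_errors = TRUE) {
  ok <- logical(length(sources))
  for (i in seq_along(sources)) {
    if (catch_errors) {
      ## record the failure and carry on with the next source
      ok[i] <- tryCatch({
        sync_one(sources[[i]])
        TRUE
      }, error = function(e) {
        message("sync failed for ", sources[[i]], ": ", conditionMessage(e))
        FALSE
      })
    } else {
      ## without catching, the error propagates and ends the loop,
      ## so later sources are never attempted
      sync_one(sources[[i]])
      ok[i] <- TRUE
    }
  }
  ok
}

## Usage with a made-up callback that fails on the second source
flaky <- function(s) if (s == "b") stop("server unavailable") else invisible(s)
sync_all(c("a", "b", "c"), flaky)  # TRUE FALSE TRUE
## sync_all(c("a", "b", "c"), flaky, catch_errors = FALSE) stops with an error at "b"
```

With catch_errors = TRUE the failure of one source is reported and the remaining sources still run; with FALSE the first error aborts the whole run.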

Usage

bb_sync(
  config,
  create_root = FALSE,
  verbose = FALSE,
  catch_errors = TRUE,
  confirm_downloads_larger_than = 0.1,
  dry_run = FALSE
)

Arguments

config

bb_config: configuration as returned by bb_config

create_root

logical: should the data root directory be created if it does not exist? If this is FALSE (default) and the data root directory does not exist, an error will be generated

verbose

logical: if TRUE, provide additional progress output

catch_errors

logical: if TRUE, catch errors and continue the synchronization process. Data sources are synchronized sequentially, so if catch_errors is FALSE, an error while synchronizing one data source will stop the process and prevent all subsequent data sources from being synchronized

confirm_downloads_larger_than

numeric or NULL: if non-negative, bb_sync will ask the user to confirm the download of any data source whose size is greater than this number (in GB). A value of zero will therefore trigger confirmation for every data source. A negative or NULL value disables the prompt. Confirmation is only requested when R is being used interactively. The expected download size is taken from the collection_size parameter of the data source, so its accuracy depends on the accuracy of the data source definition

dry_run

logical: if TRUE, bb_sync will do a dry run of the synchronization process without actually downloading files
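The confirm_downloads_larger_than rule described above can be summarized as a small predicate. needs_confirmation() is a hypothetical helper written for illustration, not a bowerbird function. It assumes collection_size is a known (non-NA) number in GB, and takes an is_interactive argument (defaulting to interactive()) so that the rule can be checked outside an interactive session.

```r
## Hypothetical helper, not part of bowerbird: should the user be asked to
## confirm downloading a source of this size? Sizes are in GB.
needs_confirmation <- function(collection_size, threshold,
                               is_interactive = interactive()) {
  ## a NULL or negative threshold disables the prompt
  if (is.null(threshold) || threshold < 0) return(FALSE)
  ## prompts are only issued when R is used interactively
  if (!is_interactive) return(FALSE)
  ## assumes collection_size is a known (non-NA) number
  collection_size > threshold
}

needs_confirmation(2, 0.1, is_interactive = TRUE)     # TRUE
needs_confirmation(0.05, 0.1, is_interactive = TRUE)  # FALSE
needs_confirmation(2, NULL, is_interactive = TRUE)    # FALSE
```

With the default threshold of 0.1, only sources whose stated collection_size exceeds 0.1 GB would prompt.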

Details

Note that when bb_sync is run, the local_file_root directory must already exist, or create_root = TRUE must be specified (e.g. bb_sync(..., create_root = TRUE)). If create_root = FALSE and the directory does not exist, bb_sync will fail with an error.
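The root-directory rule amounts to a simple check before any syncing starts. The base-R sketch below is a hypothetical illustration only (check_root() is not a bowerbird function); in practice you would simply pass create_root = TRUE to bb_sync.

```r
## Hypothetical sketch of the documented local_file_root rule
check_root <- function(local_file_root, create_root = FALSE) {
  if (!dir.exists(local_file_root)) {
    if (!create_root) {
      stop("local_file_root does not exist: ", local_file_root,
           " (use create_root = TRUE to create it)")
    }
    ## create the directory, including any missing parent directories
    dir.create(local_file_root, recursive = TRUE)
  }
  invisible(local_file_root)
}

root <- file.path(tempdir(), "bb_root_demo")
## check_root(root) would fail if the directory did not exist yet
check_root(root, create_root = TRUE)  # creates it if needed
dir.exists(root)                      # TRUE
```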

Value

a tibble with the name, id, source_url, sync success status, and files of each data source. Data sources that contain multiple source URLs will appear as multiple rows in the returned tibble, one per source_url. files is itself a tibble with columns url (the URL the file was downloaded from), file (the path to the file), and note, which is one of: "downloaded" for a file that was downloaded; "local copy" for a file that was not downloaded because a local copy already existed; or "decompressed" for a file that was extracted from a downloaded (or already locally present) compressed file. url will be NA for "decompressed" files
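To make the structure of the returned object concrete, the snippet below builds a small mock with the same shape in base R (a data frame with a list column standing in for the tibble) and then queries it. The row values and example.org URLs are invented for illustration; only the files columns url, file, and note come from the description above.

```r
## Mock of a one-row bb_sync() result (values invented for illustration)
files <- data.frame(
  url  = c("https://example.org/data.zip", NA, "https://example.org/readme.txt"),
  file = c("data/data.zip", "data/data.csv", "data/readme.txt"),
  note = c("downloaded", "decompressed", "local copy"),
  stringsAsFactors = FALSE
)
status <- data.frame(name = "Example source", stringsAsFactors = FALSE)
status$files <- list(files)  # list column: one files table per row

f <- status$files[[1]]
## how many files were downloaded, reused locally, or decompressed
table(f$note)
## paths of files actually downloaded in this run
downloaded <- f$file[f$note == "downloaded"]
## decompressed files have no source URL
all(is.na(f$url[f$note == "decompressed"]))  # TRUE
```

On a real result the same indexing applies, e.g. status$files[[1]] as in the Examples section.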

See Also

bb_config, bb_source

Examples

## Not run: 
  ## Choose a location to store files on the local file system.
  ## Normally this would be an explicit choice by the user, but here
  ## we just use a temporary directory for example purposes.

  td <- tempdir()
  cf <- bb_config(local_file_root = td)

  ## Bowerbird must then be told which data sources to synchronize.
  ## Let's use data from the Australian 2016 federal election, which is provided as one
  ## of the example data sources:

  my_source <- bb_example_sources("Australian Election 2016 House of Representatives data")

  ## Add this data source to the configuration:

  cf <- bb_add(cf, my_source)

  ## Once the configuration has been defined and the data source added to it,
  ## we can run the sync process.
  ## We set verbose = TRUE so that we see additional progress output:

  status <- bb_sync(cf, verbose = TRUE)

  ## The files in this data set have been stored in a data-source specific
  ## subdirectory of our local file root:

  status$files[[1]]

  ## We can run this at any later time and our repository will update if the source has changed:

  status2 <- bb_sync(cf, verbose = TRUE)

## End(Not run)


ropensci/bowerbird documentation built on March 10, 2024, 8:10 a.m.