bb_fingerprint: Fingerprint the files associated with a data source

View source: R/provenance.R

bb_fingerprint    R Documentation

Fingerprint the files associated with a data source

Description

Given a data repository configuration, bb_fingerprint returns the download timestamp and hash of every file associated with the configuration's data sources. It is intended as a general helper for tracking data provenance: for each file, it records where the file came from (the data source ID), when it was downloaded, and a hash, so that later versions of the file can be compared to detect changes. See also vignette("data_provenance").
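The change detection described above could be sketched roughly as follows. This is a minimal base-R illustration, not part of bowerbird: compare_fingerprints is a hypothetical helper, and the two tables are made-up stand-ins for fingerprints taken at different times, assumed only to carry the filename and hash columns listed under Value.

```r
## Hypothetical helper (not a bowerbird function): compare two fingerprint
## tables and label each file as unchanged, changed, added, or removed.
compare_fingerprints <- function(old, new) {
  ## join on filename; all = TRUE keeps files present in only one snapshot
  both <- merge(old[, c("filename", "hash")], new[, c("filename", "hash")],
                by = "filename", all = TRUE, suffixes = c("_old", "_new"))
  both$status <- ifelse(is.na(both$hash_old), "added",
                 ifelse(is.na(both$hash_new), "removed",
                 ifelse(both$hash_old != both$hash_new, "changed", "unchanged")))
  both
}

## made-up snapshots, for illustration only
old <- data.frame(filename = c("/data/a.nc", "/data/b.nc", "/data/c.nc"),
                  hash = c("aaa", "bbb", "ccc"), stringsAsFactors = FALSE)
new <- data.frame(filename = c("/data/a.nc", "/data/b.nc", "/data/d.nc"),
                  hash = c("aaa", "xxx", "ddd"), stringsAsFactors = FALSE)

res <- compare_fingerprints(old, new)
print(res[, c("filename", "status")])
```

In practice the two inputs would be saved results of bb_fingerprint from different dates. A file whose status is "changed" has the same path but a different hash.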

Usage

bb_fingerprint(config, hash = "sha1")

Arguments

config

bb_config: configuration as returned by bb_config

hash

string: algorithm used to calculate file hashes: "md5", "sha1", or "none". Note that hashing can be slow for large file collections.

Value

A tibble with columns:

  • filename - the full path and filename of the file

  • data_source_id - the identifier of the associated data source (as per the id argument to bb_source)

  • size - the file size

  • last_modified - last modified date of the file

  • hash - the hash of the file (unless hash="none" was specified)
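To give a feel for the shape of this table, the sketch below builds a similar one for a single directory using only base R and the tools package. It is an approximation under stated assumptions, not bowerbird's implementation: fingerprint_dir is a hypothetical helper, it uses MD5 only, it returns a plain data frame rather than a tibble, and it omits data_source_id because that comes from the bowerbird configuration.

```r
## Hypothetical helper (not bowerbird's implementation): list the files under
## a directory with their size, modification time, and MD5 hash.
fingerprint_dir <- function(root) {
  files <- list.files(root, recursive = TRUE, full.names = TRUE)
  info <- file.info(files)
  data.frame(filename = files,
             size = info$size,            # size in bytes, as reported by file.info
             last_modified = info$mtime,  # POSIXct modification time
             hash = unname(tools::md5sum(files)),
             stringsAsFactors = FALSE)
}

## demonstrate on a throwaway directory containing one small file
root <- tempfile("fp_demo")
dir.create(root)
writeLines("hello", file.path(root, "a.txt"))

fp <- fingerprint_dir(root)
print(fp)
```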

See Also

vignette("data_provenance")

Examples

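## Not run: 
  library(bowerbird)

  ## define a configuration rooted at a local directory, then add the example data sources
  cf <- bb_config("/my/file/root") %>%
    bb_add(bb_example_sources())

  ## fingerprint the files of those sources, using the default sha1 hash
  bb_fingerprint(cf)

## End(Not run)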


AustralianAntarcticDivision/bowerbird documentation built on March 8, 2024, 8:33 a.m.