bb_source | R Documentation |
This function is used to define a data source, which can then be added to a bowerbird data repository configuration. Passing the configuration object to bb_sync will trigger a download of all of the data sources in that configuration.
bb_source(
id,
name,
description = NA_character_,
doc_url,
source_url,
citation,
license,
comment = NA_character_,
method,
postprocess,
authentication_note = NA_character_,
user = NA_character_,
password = NA_character_,
access_function = NA_character_,
data_group = NA_character_,
collection_size = NA,
warn_empty_auth = TRUE
)
id | string: (required) a unique identifier of the data source. If the data source has a DOI, use that. Otherwise, if the original data provider has an identifier for this dataset, that is probably a good choice here (include the data version number if there is one). The ID should be something that changes when the data set changes (is updated). A DOI is ideal for this
name | string: (required) a unique name for the data source. This should be a human-readable but still concise name
description | string: a plain-language description of the data source, provided so that users can get an idea of what the data source contains (for full details they can consult the
doc_url | string: (required) URL to the metadata record or other documentation of the data source
source_url | character vector: one or more source URLs. Required for
citation | string: (required) details of the citation for the data source
license | string: (required) description of the license. For standard licenses (e.g. creative commons) include the license descriptor ("CC-BY", etc)
comment | string: comments about the data source. If only part of the original data collection is mirrored, mention that here
method | list (required): a list object that defines the function used to synchronize this data source. The first element of the list is the function name (as a string or function). Additional list elements can be used to specify additional parameters to pass to that function. Note that
postprocess | list: each element of
authentication_note | string: if authentication is required in order to access this data source, make a note of the process (include a URL to the registration page, if possible)
user | string: username, if required
password | string: password, if required
access_function | string: can be used to suggest to users an appropriate function to read these data files. Provide the name of an R function or even a code snippet
data_group | string: the name of the group to which this data source belongs. Useful for arranging sources in terms of thematic areas
collection_size | numeric: approximate disk space (in GB) used by the data collection, if known. If the data are supplied as compressed files, this size should reflect the disk space used after decompression. If the data_source definition contains multiple source_url entries, this size should reflect the overall disk space used by all combined
warn_empty_auth | logical: if
The method parameter defines the handler function used to synchronize this data source, and any extra parameters that need to be passed to it.
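The shape of a method list (handler first, extra parameters after it) can be illustrated with a small base R sketch. This is not bowerbird's internal code, and the real synchronization step is more involved; demo_handler and run_method are hypothetical names introduced only for illustration.

```r
## hypothetical handler, standing in for a real one such as bb_handler_rget
demo_handler <- function(level = 0, accept_download = NULL) {
  paste0("level=", level, "; accept_download=", accept_download)
}

## hypothetical helper: call the first list element as a function,
## passing the remaining (named) elements as its arguments
run_method <- function(method) {
  stopifnot(is.list(method), length(method) >= 1)
  fun <- match.fun(method[[1]])  # accepts a function or a function name string
  do.call(fun, method[-1])
}

## the handler can be named by a string ...
m1 <- list("demo_handler", level = 1, accept_download = "README|bin.*\\.zip$")
## ... or supplied as a function object
m2 <- list(demo_handler, level = 1, accept_download = "README|bin.*\\.zip$")

res1 <- run_method(m1)
res2 <- run_method(m2)
```

Both forms resolve to the same call here, which is why a method list can hold either a function name or the function itself as its first element.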
Parameters marked as "required" are the minimal set needed to define a data source. Other parameters are either not relevant to all data sources (e.g. postprocess, user, password) or provide metadata to users that is not strictly necessary to allow the data source to be synchronized (e.g. description, access_function, data_group). Note that three of the "required" parameters (namely citation, license, and doc_url) are not strictly needed by the synchronization code, but are treated as "required" because of their fundamental importance to reproducible science.
See vignette("bowerbird") for more examples and discussion of defining data sources.
Returns a tibble with columns as per the function arguments (excluding warn_empty_auth).
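A quick way to see this is to inspect the object that bb_source returns. The sketch below assumes the bowerbird package is installed and reuses the GSHHG definition from this page; the checks are illustrative only and simply print their results.

```r
library(bowerbird)

## the same minimal GSHHG definition used in the examples on this page
src <- bb_source(
  id = "gshhg_coastline",
  name = "GSHHG coastline data",
  doc_url = "http://www.soest.hawaii.edu/pwessel/gshhg",
  citation = "Wessel, P., and W. H. F. Smith, A Global Self-consistent, Hierarchical,
    High-resolution Shoreline Database, J. Geophys. Res., 101, 8741-8743, 1996",
  source_url = "ftp://ftp.soest.hawaii.edu/gshhg/",
  license = "LGPL",
  method = list("bb_handler_rget", level = 1, accept_download = "README|bin.*\\.zip$"))

## a tibble is also a data.frame, so the usual inspection tools apply
print(class(src))
print(names(src))

## warn_empty_auth is documented as not being stored as a column
print("warn_empty_auth" %in% names(src))
```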
See also: bb_config, bb_sync, vignette("bowerbird")
## a minimal definition for the GSHHG coastline data set:
my_source <- bb_source(
id = "gshhg_coastline",
name = "GSHHG coastline data",
doc_url = "http://www.soest.hawaii.edu/pwessel/gshhg",
citation = "Wessel, P., and W. H. F. Smith, A Global Self-consistent, Hierarchical,
High-resolution Shoreline Database, J. Geophys. Res., 101, 8741-8743, 1996",
source_url = "ftp://ftp.soest.hawaii.edu/gshhg/",
license = "LGPL",
method = list("bb_handler_rget", level = 1, accept_download = "README|bin.*\\.zip$"))
## a more complete definition, which unzips the files after downloading and also
## provides an indication of the size of the dataset
my_source <- bb_source(
id = "gshhg_coastline",
name = "GSHHG coastline data",
description = "A Global Self-consistent, Hierarchical, High-resolution Geography Database",
doc_url = "http://www.soest.hawaii.edu/pwessel/gshhg",
citation = "Wessel, P., and W. H. F. Smith, A Global Self-consistent, Hierarchical,
High-resolution Shoreline Database, J. Geophys. Res., 101, 8741-8743, 1996",
source_url = "ftp://ftp.soest.hawaii.edu/gshhg/*",
license = "LGPL",
method = list("bb_handler_rget", level = 1, accept_download = "README|bin.*\\.zip$"),
postprocess = list("bb_unzip"),
collection_size = 0.6)
## define a data repository configuration
cf <- bb_config("/my/repo/root")
## add this source to the repository
cf <- bb_add(cf, my_source)
## Not run:
## sync the repo
bb_sync(cf)
## End(Not run)