search_datalake: Search for keys in the data lake

View source: R/datalake.R

search_datalake R Documentation

Search for keys in the data lake

Description

Returns metadata about objects in a bucket. This function is a wrapper around 'aws.s3::get_bucket_df' that keeps only the Keys matching the supplied patterns. If there are more than 1000 objects, it makes iterative calls to the AWS S3 API to retrieve the metadata for all objects or, if requested, all versions. If the function is called from an interactive session, it also opens the search results in a data viewer (View).
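The iterative retrieval described above can be pictured with a small, self-contained sketch. It does not call AWS and is not the package's implementation: fetch_page() is a hypothetical stand-in for a single listing request that returns at most 1000 keys after a given marker, and the keys are invented for illustration.

```r
# Hypothetical stand-in for one listing request (NOT part of aws.s3 or er.helpers):
# returns up to `max` keys that come after `marker`
fetch_page <- function(all_keys, marker = NULL, max = 1000) {
  start <- if (is.null(marker)) 1 else match(marker, all_keys) + 1
  if (start > length(all_keys)) return(character(0))
  end <- min(start + max - 1, length(all_keys))
  all_keys[start:end]
}

# Invented keys: more than fit in a single page
all_keys <- sprintf("key-%04d", 1:2500)

results <- character(0)
marker <- NULL
repeat {
  page <- fetch_page(all_keys, marker)
  results <- c(results, page)
  # a short (or empty) page means there is nothing left to fetch
  if (length(page) < 1000) break
  # otherwise continue after the last key seen so far
  marker <- page[length(page)]
}

length(results)
```

Each pass asks for the next page after the last key already collected, and the loop stops as soon as a page comes back with fewer than 1000 entries.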

Usage

search_datalake(..., bucket_name = mfe_datalake_bucket,
  object_versions = FALSE, ncores = 1)

Arguments

...

Patterns to look for. Each argument can be a character string or a regex pattern. If multiple arguments are passed, only Keys that match all patterns are returned. Plain strings are passed to coll and matched case-insensitively. To search with a regular expression, construct the pattern with regex (see Examples).

bucket_name

Name of the bucket to connect to. By default, it uses the Ministry for the Environment data lake for environmental reporting, "mfedlkinput".

object_versions

Logical. Whether to include object version IDs in the search.
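The matching rules for the ... patterns can be illustrated without touching the data lake. The sketch below is only an approximation of the documented behaviour, not the function's source: it assumes that coll and regex refer to stringr::coll and stringr::regex (the Examples use stringr::regex), and the keys are invented for illustration.

```r
library(stringr)

# Invented keys, for illustration only
keys <- c(
  "atmosphere-climate-2020/tidy/Temperature.csv",
  "atmosphere-climate-2020/raw/rainfall.csv",
  "freshwater-2020/tidy/temperature.csv"
)

# A plain string: compared via coll(), ignoring case
has_temperature <- str_detect(keys, coll("temperature", ignore_case = TRUE))

# A regular expression: built explicitly with regex()
starts_with_a <- str_detect(keys, regex("^a"))

# Multiple patterns: a key is kept only if it matches all of them
keys[has_temperature & starts_with_a]
```

Only the first key survives: the second does not contain "temperature" and the third does not start with "a".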

Value

A data frame with metadata for the selected objects.

Examples

 ## Not run: 
# return all objects
search_datalake()
# search for a word
search_datalake("temperature")
# search using regex
search_datalake(stringr::regex("^a"))
# search tidy datasets for atmosphere and climate 2020
search_datalake("tidy", "climate", "2020")
# search tidy datasets with versions for atmosphere and climate 2020
search_datalake("tidy", "climate", "2020", object_versions = TRUE, ncores = 4)


## End(Not run)

StatisticsNZ/er.helpers documentation built on Oct. 2, 2023, 7:24 a.m.