r6_storage_s3: s3 storage

Description

An s3 storage with a flexible file format (default rds). The data format defines the chunking of the data (one chunk per file). All data is cached locally, and this cache can be used as a local storage. For better performance it is recommended to use the cache as a local storage after acquiring all needed data from s3. See storage_local_rds() for more information.
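
A minimal sketch of this pattern, using the store from the examples below:

## download once from s3, then read from the cache as a local storage
store <- rOstluft::storage_s3_rds("s3_example", rOstluft::format_rolf(),
                                  bucket = "rostluft", prefix = "aqmet")
store$download(site == "Zch_Stampfenbachstrasse")
local <- store$get_local_storage()
local$get(site = "Zch_Stampfenbachstrasse", interval = "min30", year = 2011:2012)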

Usage

storage_s3_rds(
  name,
  format,
  bucket,
  prefix = NULL,
  region = NULL,
  read.only = TRUE
)

Arguments

name

name of the store

format

data format of the store

bucket

name of the bucket in aws s3

prefix

prefix of the object keys in aws s3

region

aws region

read.only

read-only store; disables $put() and $upload()
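
For illustration, a constructor call spelling out every argument (values taken from the examples below; the fallback of region to AWS_DEFAULT_REGION is an assumption):

store <- rOstluft::storage_s3_rds(
  name = "s3_example",
  format = rOstluft::format_rolf(),
  bucket = "rostluft",
  prefix = "aqmet",
  region = NULL,      # assumption: NULL falls back to the AWS_DEFAULT_REGION environment variable
  read.only = TRUE    # write operations $put() and $upload() are disabled
)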

Value

R6 class object of storage_s3

Fields

name

name of the store

format

data format of the store

bucket

s3 bucket containing the store

region

aws region of the bucket

prefix

prefix of the s3 object keys; see Object Key and Metadata in the AWS documentation

path

local root of the store

data_path

local root of all chunks

data_s3

s3 root key of all chunks

content_path

local path to the rds file containing statistics of store content

content_s3

s3 object key to the rds file containing statistics of store content

columns_path

local path to the rds file containing the exact column types of the store content

columns_s3

s3 object key to the rds file containing the exact column types of the store content

meta_path

local root of all meta data files

meta_s3

s3 root key of all meta data files

read.only

flag for read-only usage of the store. Default TRUE

ext

file extension for chunks. Default "rds"

read_function

function(file) for reading chunks from disk. Default base::readRDS()

write_function

function(object, file) for writing chunks to disk. Default base::saveRDS()
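
The fields are public members of the returned R6 object and can be inspected directly, for example:

## where does the store live locally and in s3?
store$bucket        # s3 bucket containing the store
store$data_path     # local root of all chunks
store$data_s3       # s3 root key of all chunks
store$content_path  # local path to the content statistics rds file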

Methods

⁠$get(filter=NULL, ...)⁠ gets data from the store. The names of the arguments depend on the format. The filter expression is applied to each chunk.

⁠$download(...)⁠ downloads data from s3 to the local cache. The ... arguments are used to filter the output of $list_chunks(); only the matching chunks are downloaded.

⁠$put(data)⁠ puts the data into the store. Stops if the store is read-only.

⁠$upload()⁠ uploads content, meta data and all new and changed chunks to the s3 storage. For big additions to the store, the recommended way is to modify the cache through a local storage and then call this function to apply the changes (a sketch of the basic write path follows this list).

⁠$get_content()⁠ returns a tibble with the number of data points per chunk per series.

⁠$list_chunks()⁠ gets a list of all chunks in s3 and in the local cache.

⁠$get_meta(key=NULL)⁠ gets meta data. If key is omitted, returns the content of all meta files as a named list of tibbles, named after the file name without extension. If key is supplied, the list contains only the specified key.

⁠$put_meta(...)⁠ puts meta data into the store. The name of each argument is used as the file name and its value as the data.

⁠$get_local_storage()⁠ returns a storage to work with the cached data like a local storage

⁠$destroy(confirmation)⁠ removes all files under path from the file system if "DELETE" is supplied as confirmation
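
A minimal sketch of the basic write path ($put(), $put_meta(), then $upload()); new_data and meta_tbl are hypothetical objects in the store's data format:

store$read.only <- FALSE             # or construct the store with read.only = FALSE
store$put(new_data)                  # hypothetical data in the store's format
store$put_meta(ostluft = meta_tbl)   # hypothetical tibble, stored as meta file "ostluft"
store$upload()                       # push content, meta data and new/changed chunks to s3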

Authentication

See the documentation of aws.signature for ways to provide the necessary credentials. The simplest way is to define environment variables in a .Renviron file in the root directory of an RStudio Project:

AWS_ACCESS_KEY_ID = "xxxxxxxxxxxxxxxxxx"
AWS_SECRET_ACCESS_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
AWS_DEFAULT_REGION = "eu-central-1"
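
Alternatively, the same variables can be set for the current R session only (placeholder values):

Sys.setenv(
  AWS_ACCESS_KEY_ID = "xxxxxxxxxxxxxxxxxx",
  AWS_SECRET_ACCESS_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  AWS_DEFAULT_REGION = "eu-central-1"
)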

Examples

## init store, creates directory if necessary
format <- rOstluft::format_rolf()
store <- rOstluft::storage_s3_rds("s3_example", format, "rostluft", prefix = "aqmet")

## get all data min30 for 2011 and 2012
store$get(site = "Zch_Stampfenbachstrasse", interval = "min30", year = 2011:2012)

## get only data for O3
store$get(year = 2011:2012, site = "Zch_Stampfenbachstrasse", interval = "min30",
          filter = parameter == "O3")

## get NOx data from multiple stations
store$get(site = c("Zch_Stampfenbachstrasse", "Zch_Rosengartenstrasse"), interval = "min30",
          year = 2014, filter = parameter %in% c("NOx", "NO", "NO2"))

## get the number of data points grouped by interval, station, parameter and year
store$get_content()

## get list of all chunks, show only local files
dplyr::filter(store$list_chunks(), !is.na(local.path))

## download all data for site Zch_Rosengartenstrasse before 2016
store$download(site == "Zch_Rosengartenstrasse", year < 2016)

## now there should be 2 more local files
dplyr::filter(store$list_chunks(), !is.na(local.path))

## get all meta data
store$get_meta()

## or a specific meta file
store$get_meta("ostluft")

## get the cache as local storage
local <- store$get_local_storage()
local$list_chunks()

## get cached data
local$get(site = "Zch_Stampfenbachstrasse", interval = "min30", year = 2011:2012)

## destroy store (careful: removes all files on the disk)
store$read.only <- FALSE
store$destroy("DELETE")

## No examples for write operations

