knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  eval = TRUE
)
You can install the development version of laker from GitHub with:
# install.packages("remotes")
remotes::install_github("rappster/laker")
A framework for a local data lake.

More control over data pipelines, via data layers and systematic S3 method dispatch.
The goal is to integrate {laker} as closely as possible with {targets} in the future.
Assuming you have a local directory that should act as your "local data lake", you can symlink it to the data/ directory within your (package) project.
Mine is layered as follows:
library(laker)
valid_data_layers(df = TRUE)
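For orientation: the examples below reference a "raw"/"inbox" layer, a "tidy" layer, a "curated" layer, and an "application-ready" stage. A purely illustrative mapping - not the actual output of valid_data_layers(df = TRUE) - might look like this:

```r
# Illustrative only; check valid_data_layers(df = TRUE) for the real table.
#
#   layer_01   "raw" / "inbox"      original files, as delivered
#   layer_02   "tidy"               ingested data, stored as arrow files
#   layer_03   "curated"            systematically transformed data
#   layer_04   "application-ready"  data prepared for consumption (assumed name)
```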
You can create a symbolic link with fs_create_symlink().
fs_create_symlink(original = "~/data/dev/", symlink = "data/")
This is simply for my own convenience while "developing the thing", and I'm sorry for any annoyance it may cause others. I'll change that in the future.
If your use case for creating a symbolic link is to "link to a data lake", there is a more user-friendly wrapper around fs_create_symlink():
link_data_lake(path_data_lake = "~/data/dev", path = "datalake")
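To give a feel for what that wrapper amounts to, here is a minimal sketch, assuming it simply forwards its arguments to fs_create_symlink(); the actual function in {laker} may do more (e.g. input validation):

```r
# Minimal sketch; link_data_lake_sketch() is a made-up name and the real
# link_data_lake() may behave differently.
link_data_lake_sketch <- function(path_data_lake, path = "datalake") {
  fs_create_symlink(original = path_data_lake, symlink = path)
}
```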
Put an arbitrary file* in your "inbox" layer.
*As long as it's the Tableau Global Superstore data ;-)
Bear with me: it's currently the only data class that has been defined, and I still haven't gotten around to describing how to define your own data classes - which is obviously the entire point of this package. We'll get there ;-)
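In the meantime, here is a very rough sketch of what a custom data class constructor might look like, assuming {laker} dispatches its layer methods on the class of the object a constructor returns. The names and fields below are made up, not the documented {laker} contract:

```r
# Hypothetical constructor; the exact contract {laker} expects from a
# constructor is not described here, so treat this purely as a sketch.
data_my_source <- function(path = NULL, ...) {
  structure(
    list(path = path, ...),
    class = c("data_my_source", "list")
  )
}

# Methods for the layer generics would then be written for class
# "data_my_source" (see the sketch of a generic further below).
```

Back to the one class that does exist - ingest the file you just dropped into the inbox: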
superstore <- laker::layer_ingest(
  constructor = laker::data_tableau_global_superstore,
  version = "v1"
)
What you just did was "ingest" the original file from layer 01 (the "raw" layer) into layer 02 (the "tidy" layer) and store it as an arrow file.
fs::dir_ls(here::here("data", "layer_01"), recurse = TRUE)
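Since the ingested data is persisted as an arrow file, you could in principle open it directly with the {arrow} package, assuming the Arrow IPC/Feather format. The path below is purely illustrative; the actual file name and location depend on the data class, layer, and version:

```r
# Hypothetical path; adjust to whatever fs::dir_ls() shows on your machine.
arrow::read_feather(
  here::here("data", "layer_02", "tableau_global_superstore", "v1.arrow")
) %>%
  dplyr::glimpse()
```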
Systematically transform data as needed.
layer_curate
Taking curated data from layer 03 and making it "application-ready" - whatever that means ;-)
# No generic function or methods yet :-(
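To sketch where this is heading: a generic for this step would presumably follow the same S3-dispatch pattern as the other layer functions. The name layer_provision() and the assumption that dispatch happens on a class named after the constructor are mine, not {laker}'s:

```r
# Hypothetical generic for the "application-ready" step; not part of {laker}.
layer_provision <- function(x, ...) {
  UseMethod("layer_provision")
}

# Hypothetical method for the Global Superstore data: do whatever
# "application-ready" means for your consumers (select, aggregate, reshape, ...).
layer_provision.data_tableau_global_superstore <- function(x, ...) {
  x
}
```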
TODO-2022-02-17-2327: Explain the data catalog YAML
path <- fs::path_package("laker", "data_catalog.yml")

path %>%
  readLines() %>%
  cat(sep = "\n")

confx::conf_get(from = path, config = "v2")
TODO-2022-02-17-2328: Explain the config YAML
path <- fs::path_package("laker", "config.yml")

path %>%
  readLines() %>%
  cat(sep = "\n")

confx::conf_get(from = path, config = "dev")
You can read data from arbitrary layers with layer_read().
Read from layer 01
superstore <- laker::layer_read(
  constructor = laker::data_tableau_global_superstore,
  layer = laker::valid_data_layers("01"),
  version = "v1"
) %>%
  dplyr::glimpse()
Read from layer 02
laker::layer_read(
  constructor = laker::data_tableau_global_superstore,
  layer = laker::valid_data_layers("02"),
  version = "v1"
) %>%
  dplyr::glimpse()