DESIGN.md
In decisionpatterns/cache: Intuitive and Comprehensive Cache

This document records design decisions for the cache package.

The cache package allows the user to save and restore files unambiguously and robustly with only a few simple commands, regardless of the backend mechanism for doing so. The files are written to a known location on the file system with easily recognizable names and extensions. The package hides almost all the complexity of saving and restoring data.

Abstracts the complexities of persistence/storage.
Single persistence interface.
Standardized persistence of data.
Save and restore data with a very limited number of methods, using simple, intuitive commands that take few arguments, if any.
Pluggable, extensible backends.
Uniqueness of object names across all storage backends.
Database-like coupling to R with multiple storage backends, similar to the way MySQL/MariaDB uses various storage engines.
[-tk] Metadata (dim, length, etc.) available without reading the files.
Handles all types of objects.

Create pragmas for persisting data.
Hide the complexity of paths and naming; use standard defaults or set them once.
Provide a mechanism for all I/O and saving.
Replace save.
Extend the functionality of the fs package.
[-tk] Provide metadata.
[-tk] Create infrastructure for update/sync of data.
[-tk] Create infrastructure for supporting models, esp. offline features.

The goal of cache is to have the developer or program work only with object names and not worry about paths, extensions, I/O, etc. This is done by enforcing uniqueness of object names.

Examples:

 cache_create(...)  # create a cache 
 cache_path()       # show where the cache is located

 # Standard Evaluation
 cache_write(iris, name="iris")
 cache_read('iris')

 # Non-Standard Evaluation / Interactive
 cache(x)
 uncache(x)

cache works by:

Standardizing the storage location of objects to a single directory: [PROJ_ROOT]/cache
Registering backends that map file extensions to read/write methods
Enforcing uniqueness based on the backends.
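
The mapping from file extensions to read/write methods can be pictured as a small registry. The following is only a sketch under assumptions: the `backends` list, its field layout and the `backend_for()` helper are hypothetical names chosen for illustration and are not taken from the package.

```r
# Hypothetical registry: one entry per file extension, each holding a
# reader and a writer. None of these names come from the package itself.
backends <- list(
  rds = list(
    name   = "rds",
    reader = function(path) readRDS(path),
    writer = function(x, path) saveRDS(x, path)
  ),
  csv = list(
    name   = "csv",
    reader = function(path) read.csv(path),
    writer = function(x, path) write.csv(x, path, row.names = FALSE)
  )
)

# Look up the backend responsible for a path by its file extension.
backend_for <- function(path) {
  ext <- tools::file_ext(path)
  if (!ext %in% names(backends)) {
    stop("no backend registered for extension '", ext, "'")
  }
  backends[[ext]]
}

# Round trip through whichever backend owns the ".rds" extension.
path <- file.path(tempdir(), "iris.rds")
backend <- backend_for(path)
backend$writer(iris, path)
restored <- backend$reader(path)
```

Because callers only ever ask for the backend of a path, adding a new storage format in this sketch means adding one more entry to the list.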

When you write to the cache the following happens:

Determine the backend and extension for the object. Look in:
[-tk] MANIFEST
Existing files () ...
The default backend
Create the path
Write using the backend's writer
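
The write steps above might look roughly like the following. This is a sketch under assumptions, not the package's implementation: `cache_write_sketch()`, the `writers` list and the `default` argument are hypothetical, and the MANIFEST lookup is skipped because it is still marked [-tk].

```r
# Hypothetical writers keyed by extension (illustration only).
writers <- list(
  rds = function(x, path) saveRDS(x, path),
  csv = function(x, path) write.csv(x, path, row.names = FALSE)
)

cache_write_sketch <- function(x, name, cache_dir, writers, default = "rds") {
  # 1. Determine the extension: reuse that of an existing file with this
  #    name, otherwise fall back to the default backend.
  existing <- Filter(
    function(ext) file.exists(file.path(cache_dir, paste0(name, ".", ext))),
    names(writers)
  )
  ext <- if (length(existing) > 0) existing[[1]] else default

  # 2. Create the path.
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(cache_dir, paste0(name, ".", ext))

  # 3. Write using the writer registered for that extension.
  writers[[ext]](x, path)
  invisible(path)
}

cache_dir <- file.path(tempfile("proj_"), "cache")

# No existing file: the default backend is used.
p1 <- cache_write_sketch(iris, "iris", cache_dir, writers)

# An existing mtcars.csv makes the csv backend win over the default.
write.csv(mtcars, file.path(cache_dir, "mtcars.csv"))
p2 <- cache_write_sketch(mtcars, "mtcars", cache_dir, writers)
```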

Given the name of an object, the file or path will be:

Any existing file with that name and one of the backend extensions

This affects cached_file, cached_path, etc.

 /somepath/to/somefile.ext
 |-----------|/|file(name)

path - generic path to a directory or file
directory - absolute or relative path to a directory
file(name) - name of the file including the extension
basename - name of the file excluding the extension
extension - end of the filename (preceded by the period)
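
For orientation, the terms above can be mapped onto base R helpers. This is an illustrative sketch only. Note that base R's `basename()` returns the file name including the extension, which is what this document calls file(name); the document's "basename" corresponds to `tools::file_path_sans_ext()` applied to that result.

```r
p <- "/somepath/to/somefile.ext"

parts <- list(
  directory = dirname(p),                             # path to the directory
  filename  = basename(p),                            # name including the extension
  basename  = tools::file_path_sans_ext(basename(p)), # name excluding the extension
  extension = tools::file_ext(p)                      # text after the final period
)
```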

fs_path, fs_ext

name (object name) in the language

 R object (w/ name <==> filename <==> path cache_path()/cache_filename(object)
 fs:       filename <==> fs_path <==> fs_path 
 cache:
   =>      as_cached_name => add_extension => check_conflicts => path( cache_path, . )

   <=      as_cached_name => add_extension => check

 cache(name) => path+ext => path( cache_path, . ) => ...
 uncache(name) =>

 backend: 
  - name
  - object (i.e. list)

There are also "potential files" ... those that don't exist

cache_
backend_

The package uses the single-accessor style. Rather than having separate get_ and set_ methods, one method is used. These functions act as getters unless an argument is supplied, in which case they act as setters. This may change in the future.

The problem with the single accessor method is that it only works well on setting/getting simple attributes such as those that are just a string.
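
A minimal sketch of the single-accessor style, assuming the setting is a plain string. The function `cache_dir_sketch()` and the `.settings` environment are hypothetical and stand in for whatever the package actually uses.

```r
# Hypothetical storage for a single string-valued setting.
.settings <- new.env()
.settings$dir <- "cache"

# One function serves as both getter and setter:
# called with no argument it reads, called with an argument it writes.
cache_dir_sketch <- function(dir) {
  if (missing(dir)) {
    return(.settings$dir)
  }
  .settings$dir <- dir
  invisible(dir)
}

before <- cache_dir_sketch()        # get
cache_dir_sketch("/tmp/my_cache")   # set
after <- cache_dir_sketch()         # get again
```

The sketch also hints at the limitation noted above: once the value is a structured object rather than a string, a single argument no longer says clearly which part is being set.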

In order to work properly, cache enforces uniqueness of objects. Since there can be only one object for a given name in memory, there can be only one file associated with that object. This is managed by associating backends with file extensions. For example, my_data.tsv and my_data.rds cannot both exist on the file system simultaneously, since the call to uncache(my_data) wouldn't know whether to read the .tsv or the .rds file. This encourages users to name their objects adequately.

Note: Is this strictly necessary? - What if we allowed multiple files? - What if this was optional?

There can be only one object with a given name regardless of the methods of storage.
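
One way to picture the uniqueness rule is a check that refuses to proceed when more than one file could stand for the same name. The helper `check_conflicts_sketch()` below is a hypothetical illustration; it borrows the idea of the check_conflicts step in the diagram but is not the package's code.

```r
# Return the file that stands for `name`, or fail if the name is ambiguous.
check_conflicts_sketch <- function(name, cache_dir, exts) {
  candidates <- file.path(cache_dir, paste0(name, ".", exts))
  hits <- candidates[file.exists(candidates)]
  if (length(hits) > 1) {
    stop("ambiguous cache entry for '", name, "': ",
         paste(basename(hits), collapse = ", "))
  }
  hits
}

cache_dir <- tempfile("cache_")
dir.create(cache_dir)
exts <- c("rds", "tsv")

# A single file for the name is fine.
file.create(file.path(cache_dir, "my_data.rds"))
ok <- check_conflicts_sketch("my_data", cache_dir, exts)

# A second file with another registered extension makes the name ambiguous.
file.create(file.path(cache_dir, "my_data.tsv"))
result <- tryCatch(
  check_conflicts_sketch("my_data", cache_dir, exts),
  error = function(e) conditionMessage(e)
)
```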

Why?

 cache(iris)
 uncache('iris')

Where possible, cache

mtcars %>% cache %>% nrow

All cache and uncache routines should preserve data types and classes.
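
To illustrate why type preservation is a backend concern, the sketch below compares two base R round trips. It does not use the cache package; it only shows that a binary format such as RDS restores classes exactly, while a plain CSV round trip loses the Date class unless extra information is supplied on read.

```r
df <- data.frame(
  id  = 1:3,
  grp = factor(c("a", "b", "a")),
  day = as.Date("2020-06-15") + 0:2
)

# RDS round trip: classes and attributes are restored as written.
rds <- tempfile(fileext = ".rds")
saveRDS(df, rds)
from_rds <- readRDS(rds)

# CSV round trip: the text file carries no class information for `day`.
csv <- tempfile(fileext = ".csv")
write.csv(df, csv, row.names = FALSE)
from_csv <- read.csv(csv)

rds_preserved <- identical(df, from_rds)
csv_day_is_date <- inherits(from_csv$day, "Date")
```

A text-based backend would therefore need to store or infer column types to meet this requirement.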

A backend is much like a storage driver or an ODBC data source. It tells how to store and retrieve data but relies on the package functions to execute the backend's functions.

Files are more like individual extensions. How a file is written and read should be uniquely determined by its file extension. This might be true of other repositories as well, e.g. 'my_file.sql'

Files should contain the cached data, but may also contain metadata about how the information was retrieved or how it could be updated.

cache.backends
cache.default

A fact table is characterized by a record containing an index (usually time). To update:

Get max of index
Delete everything >= index.
Retrieve records >= index.
Append records
Save.
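
The update steps above can be sketched with plain data frames. Everything here is an assumption made for illustration: `update_fact_sketch()`, the `fetch_since` callback and the `time` column are hypothetical, and the final save is done with `saveRDS()` only as a stand-in for whatever writer the backend provides.

```r
update_fact_sketch <- function(cached, fetch_since, index = "time") {
  cutoff <- max(cached[[index]])                            # get max of index
  kept   <- cached[cached[[index]] < cutoff, , drop = FALSE] # delete everything >= index
  fresh  <- fetch_since(cutoff)                             # retrieve records >= index
  rbind(kept, fresh)                                        # append records
}

# Previously cached records.
cached <- data.frame(time = c(1, 2, 3), value = c(10, 20, 30))

# Stand-in for the upstream source: period 3 was revised and period 4 is new.
upstream <- data.frame(time = c(1, 2, 3, 3, 4), value = c(10, 20, 31, 32, 40))
fetch_since <- function(cutoff) upstream[upstream$time >= cutoff, , drop = FALSE]

updated <- update_fact_sketch(cached, fetch_since)

# Save.
out <- tempfile(fileext = ".rds")
saveRDS(updated, out)
```

Re-fetching the latest index value, rather than only values beyond it, means records that arrived late for the most recent period are picked up as well.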

A dimension table is often small, and an update is a full collection.

Get entire SCD if feasible
Get based on SCD Effective Column

Often the microcosm will be based on a sampled set of records from a given dimension, such as users. The microcosm should use this to filter all other records.

cache: path to directory
name: file basename of object
ext: file/path extension of object
backend: list of elements that fully encapsulate writing and retrieving of an object
reader: backend-specific function for reading cached object
writer: backend-specific function for writing cached object.

FAQ

Q: Why not just use a Redis server?
A: Locality. Arbitrary R objects. Etc.

Appendix

`write`/`read` vs. `save`/`load` vs. `cache`/`uncache`

This package does not delineate the difference between `write`/`read` and `save`/`load`. The existing distinction seems to be that:

save and load refer to storing as a binary/non-editable file.
write and read refer to storing as a human-readable file.
cache/uncache refers to universal storage of an object regardless of the persistence format.
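
The conventional distinction can be seen with base R alone. The sketch below uses only base functions and is not part of the package: `save()`/`load()` store a binary file and restore the object under its original name, while `write()` produces editable text from which only the values come back.

```r
scores <- c(a = 1, b = 2)

# save/load: binary file; load() recreates the object by name in an
# environment and returns the names of the objects it restored.
bin <- tempfile(fileext = ".RData")
save(scores, file = bin)
env <- new.env()
restored_names <- load(bin, envir = env)

# write/read: human-readable text; the element names are not stored.
txt <- tempfile(fileext = ".txt")
write(scores, txt)
from_txt <- scan(txt, quiet = TRUE)
```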

decisionpatterns/cache documentation built on June 15, 2020, 9:35 p.m.
