README.md
In antoine-sachet/cacheR: Save Data in Git-friendly Format

cacheR

The goal of cacheR is to provide I/O functions to save data in named lists to disk in a robust and git-friendly way.

When I deploy shiny apps, I typically need to deploy some data along with it.

The alternatives I have used are: - an external DB: this can be slow and requires somehow passing or storing sensitive credentials in the app. - tabular plaintext formats such as CSV: this works fine for data.frame objects, but requires robust readers to ensure data integrity. When the data is already in a DB, it is a waste of time to export it to csv and write readers! It is also not adapted for non-tabular data or complex (e.g. nested) tabular data. - RDS format to save any kind of objects. Well, yes! For a one-off data dump, you totally could use RDS and in fact the RDS format is used extensively within cacheR. To git however, this is a binary file. If you need to regularly update your data, the git repository can grow very quickly! Using plaintext when possible leverages the delta power of git.

cacheR is a compromise: it saves data in a directory arborescence whose nodes are either RDS or plaintext files. Lists are broken down in directories/subdirectories. Atomic vectors (character, numeric, factor, logical, integer) are stored in plaintext. Other data types are stored in RDS files.

Data.frames are treated as a special case of lists. Columns can be stored in plaintext, in RDS or in subdirectories, depending on their types. This means hybrid tibbles with nested list, nested data.frames and any other non-standard column types work just fine! All attributes are preserved, so you get back exactly what you saved, including groups, row names if any, etc.

You can install the development version of cacheR from github with:

remotes::install_github("antoine-sachet/cacheR")

This is a basic example which shows you how to store and retrieve some data.

library("cacheR")

my_data <- 
  list(data = mtcars, 
       model = lm(mpg ~ gear, data = mtcars),
       details = list(date = "2030-01-01", 
                      version = "1.2"))

# Note the directory must exist
write_cache(my_data, path = "./cache")

cache <- read_cache("my_data", path = "./cache")

all.equal(my_data, cache)
# TRUE, of course!

This is an example of data where cacheR really shines.

You could not store easily as (mostly) plaintext without cacheR.

# Let's build a nested data.frame with non-standard column types.

library("cacheR")
library("dplyr")
library("tidyr")

df <- iris %>%
  group_by(Species) %>%
  nest() %>%
  mutate(model = purrr::map(data, lm, formula = Sepal.Length ~ .))

# Talk about a non-standard data.frame!
df
# # A tibble: 3 x 3
#   Species    data              model   
#   <fct>      <list>            <list>  
# 1 setosa     <tibble [50 × 4]> <S3: lm>
# 2 versicolor <tibble [50 × 4]> <S3: lm>
# 3 virginica  <tibble [50 × 4]> <S3: lm>

# Saving it in a temporary directory
path <- tempdir()
write_cache(df, path, name = "nested_iris")

# You can have a look at all the files in the cache
# Most of the data is stored in plaintext, with the exception of the `lm` models.
inspect_cache(path)

df_cached <- read_cache("nested_iris", path)

# # A tibble: 3 x 3
#   Species    data              model   
# * <fct>      <list>            <list>  
# 1 setosa     <tibble [50 × 4]> <S3: lm>
# 2 versicolor <tibble [50 × 4]> <S3: lm>
# 3 virginica  <tibble [50 × 4]> <S3: lm>

antoine-sachet/cacheR documentation built on Jan. 17, 2021, 6:40 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com