library(BiocStyle)
knitr::opts_chunk$set(error=FALSE, warning=FALSE, message=FALSE)

Motivation

The r Biocpkg("chihaya") package saves DelayedArray objects for efficient, portable and stable reproduction of delayed operations in a new R session or other programming frameworks.

Check out the specification for more details.

Quick start

Make a DelayedArray object with some operations:

library(DelayedArray)
x <- DelayedArray(matrix(runif(1000), ncol=10))
x <- x[11:15,] / runif(5) 
x <- log2(x + 1)
x
showtree(x)

Save it into a HDF5 file with saveDelayed():

library(chihaya)
tmp <- tempfile(fileext=".h5")
saveDelayed(x, tmp)
rhdf5::h5ls(tmp)

And then load it back in later:

y <- loadDelayed(tmp)
y

Of course, this is not a particularly interesting case as we end up saving the original array inside our HDF5 file anyway. The real fun begins when you have some more interesting seeds.

More interesting seeds

We can use the delayed nature of the operations to avoid breaking sparsity. For example:

library(Matrix)
x <- rsparsematrix(1000, 1000, density=0.01)
x <- DelayedArray(x) + runif(1000)

tmp <- tempfile(fileext=".h5")
saveDelayed(x, tmp)
rhdf5::h5ls(tmp)
file.info(tmp)[["size"]]

# Compared to a dense array.
tmp2 <- tempfile(fileext=".h5")
out <- HDF5Array::writeHDF5Array(x, tmp2, "data")
file.info(tmp2)[["size"]]

# Loading it back in.
y <- loadDelayed(tmp)
showtree(y)

We can also store references to external files, thus avoiding data duplication:

library(HDF5Array)
test <- HDF5Array(tmp2, "data")
stuff <- log2(test + 1)
stuff

tmp <- tempfile(fileext=".h5")
saveDelayed(stuff, tmp)
rhdf5::h5ls(tmp)
file.info(tmp)[["size"]] # size of the delayed operations + pointer to the actual file

y <- loadDelayed(tmp)
y

Session information {-}

sessionInfo()


LTLA/DelayedArraySaver documentation built on Oct. 11, 2023, 1:33 p.m.