A DelayedArray backend for TileDB

knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)
library(BiocStyle)

Introduction

TileDB implements a framework for local and remote storage of dense and sparse arrays. We can use this as a DelayedArray backend to provide an array-level abstraction, allowing the data to be used in many places where an ordinary array or matrix might be used. The TileDBArray package (Bioconductor) implements the necessary wrappers around TileDB-Inc/TileDB-R (GitHub) to support read/write operations on TileDB arrays within the DelayedArray framework (Bioconductor).

Creating a TileDBArray

Creating a TileDBArray is as easy as:

X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)

Alternatively, we can use coercion methods:

as(X, "TileDBArray")

This also works for sparse matrices:

Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)

Logical and integer matrices are supported:

writeTileDBArray(Y > 0)

As are matrices with dimension names:

rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)

Manipulating TileDBArrays

TileDBArrays are simply DelayedArray objects and can be manipulated as such. The usual conventions for extracting data from matrix-like objects work as expected:

out <- as(X, "TileDBArray")
dim(out)
head(rownames(out))
head(out[,1])

We can also perform manipulations like subsetting and arithmetic. Note that these operations do not modify the data in the TileDB backend. Instead, they are recorded and delayed until the values are explicitly required, which is why the result is a DelayedMatrix object rather than a new TileDBArray.

out[1:5,1:5] 
out * 2
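To make the delayed behaviour concrete, the following is a minimal sketch (not part of the original vignette) that checks the class of a delayed result and then forces evaluation. The names doubled and realized are illustrative only; as.matrix() is used here as one way to realize the values in memory.

```r
library(TileDBArray)

# Build a small dense matrix and push it into a TileDB backend.
X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# The multiplication is only recorded here; nothing is computed yet
# and the on-disk TileDB array is left unchanged.
doubled <- out * 2
is(doubled, "DelayedMatrix")

# Realizing the object into an ordinary matrix triggers the computation.
realized <- as.matrix(doubled)
stopifnot(isTRUE(all.equal(realized, X * 2, check.attributes=FALSE)))
```

Realization reads the values from the backend and applies the pending operations block by block, so it is usually worth deferring until the final result is needed.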

We can also do more complex matrix operations that are supported by the DelayedArray package:

colSums(out)
out %*% runif(ncol(out))

Controlling backend creation

We can adjust some parameters for creating the backend by passing the appropriate arguments to writeTileDBArray(). The example below controls the path to the backend as well as the name of the attribute containing the data.

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
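Because the path is now known, the same backend can be reopened later. The sketch below assumes that the TileDBArray() constructor accepts the path together with an attr argument naming the attribute to read; check ?TileDBArray for the exact signature in your installed version. The names written and reopened are illustrative.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
written <- writeTileDBArray(X, path=path, attr="WHEE")

# Reopen the existing backend by path (assumed usage: the attribute
# name must match the one supplied when the array was written).
reopened <- TileDBArray(path, attr="WHEE")

# The reopened array should expose the same dimensions and values.
stopifnot(identical(dim(reopened), dim(X)))
stopifnot(isTRUE(all.equal(as.matrix(reopened), X, check.attributes=FALSE)))
```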

As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.

path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
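Since these settings are global, it is usually wise to restore them once the coercion is done. The following sketch assumes that setTileDBPath() invisibly returns the previous value and that passing NULL unsets the path, as described in the package's documentation for its global setters; verify this against ?setTileDBPath before relying on it. The names old and arr are illustrative.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path2 <- tempfile()

# Remember the previous setting (assumed to be returned invisibly).
old <- setTileDBPath(path2)

# Coercion now writes the backend to path2.
arr <- as(X, "TileDBArray")
stopifnot(file.exists(path2))

# Restore the earlier setting so later coercions are unaffected.
setTileDBPath(old)
```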

Session information

sessionInfo()


TileDBArray documentation built on Nov. 8, 2020, 6:38 p.m.