DSArray-class: Duplicate slice arrays

Description

The DSArray class provides compressed storage of 3-dimensional arrays when the array has many duplicate slices. A basic array-like API is provided for instantiating, subsetting, and combining DSArray objects.

Usage

## S4 method for signature 'DSArray'
dim(x)

## S4 method for signature 'DSArray'
nslice(x)

## S4 method for signature 'DSArray'
length(x)

## S4 method for signature 'DSArray'
dimnames(x)

## S4 method for signature 'DSArray'
slicenames(x)

## S4 replacement method for signature 'DSArray,NULL'
dimnames(x) <- value

## S4 replacement method for signature 'DSArray,list'
dimnames(x) <- value

## S4 replacement method for signature 'DSArray,NULL'
slicenames(x) <- value

## S4 replacement method for signature 'DSArray,character'
slicenames(x) <- value

## S4 method for signature 'DSArray'
x[i, j, k, ..., drop = FALSE]

## S4 replacement method for signature 'DSArray,ANY,ANY,DSArray'
x[i, j, k, ...] <- value

## S4 method for signature 'DSArray'
acbind(...)

## S4 method for signature 'DSArray'
arbind(...)

## S4 method for signature 'DSArray'
densify(x)

## S4 method for signature 'DSArray'
show(object)

Arguments

x, object, value

A DSArray object

i, j, k

Indices specifying elements to extract or replace: i indexes rows, j indexes columns, and k indexes slices. Each index may be a numeric, character, or logical vector, or may be empty (missing). Logical vectors indicate the elements/slices to select and are recycled if necessary to match the corresponding extent. Indexing by negative values, by a matrix i, or by NULL is not currently implemented.
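The indexing rules above can be illustrated on an ordinary base R array, used here only as a stand-in. This sketch assumes that subsetting a DSArray mirrors base R behaviour with drop = FALSE; the data and dimnames are made up for illustration.

```r
# A dense base R array standing in for a DSArray (illustrative data only).
x <- array(1:24, dim = c(2, 3, 4),
           dimnames = list(c("r1", "r2"), c("c1", "c2", "c3"),
                           paste0("s", 1:4)))

# Numeric i, missing j and k: first row, all columns and slices.
by_number <- x[1, , , drop = FALSE]

# Character i and j, missing k.
by_name <- x["r2", c("c1", "c3"), , drop = FALSE]

# Logical k, recycled to TRUE FALSE TRUE FALSE: slices s1 and s3.
by_logical <- x[, , c(TRUE, FALSE), drop = FALSE]

dim(by_number)
dim(by_name)
dim(by_logical)
```

Because drop = FALSE, every result keeps all three dimensions, even when one extent is 1.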

...

DSArray objects. For arbind, the ncol and nslice of all objects must match, but the nrow may differ. For acbind, the nrow and nslice of all objects must match, but the ncol may differ.
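The dimension rules for arbind and acbind can be sketched with dense base R arrays. The helpers arbind_dense() and acbind_dense() below are hypothetical and written only for this illustration; they are not part of the DSArray package and say nothing about how the package implements binding.

```r
# Hypothetical dense analogues of arbind() and acbind(); not DSArray code.
arbind_dense <- function(a, b) {
  da <- dim(a)
  db <- dim(b)
  # ncol and nslice must match; nrow may differ.
  stopifnot(identical(da[2:3], db[2:3]))
  out <- array(vector(typeof(a), 1L), dim = c(da[1] + db[1], da[2], da[3]))
  out[seq_len(da[1]), , ] <- a
  out[da[1] + seq_len(db[1]), , ] <- b
  out
}

acbind_dense <- function(a, b) {
  da <- dim(a)
  db <- dim(b)
  # nrow and nslice must match; ncol may differ.
  stopifnot(identical(da[c(1, 3)], db[c(1, 3)]))
  out <- array(vector(typeof(a), 1L), dim = c(da[1], da[2] + db[2], da[3]))
  out[, seq_len(da[2]), ] <- a
  out[, da[2] + seq_len(db[2]), ] <- b
  out
}

a <- array(1:12, dim = c(2, 3, 2))
b <- array(101:118, dim = c(3, 3, 2))    # same ncol and nslice as a
b2 <- array(201:204, dim = c(2, 1, 2))   # same nrow and nslice as a

rows_bound <- arbind_dense(a, b)
cols_bound <- acbind_dense(a, b2)
dim(rows_bound)
dim(cols_bound)
```

Calling arbind_dense(a, b2) stops with an error, because the two arrays differ in ncol.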

drop

Currently ignored

Details

Suppose we have a 3-dimensional array, x, with dimensions indexed by i (rows), j (columns), and k (slices). We refer to x[i, j, ] as an (i,j)-slice (here length(i) == length(j) == 1). For certain data it is the case that many of the (i,j)-slices of x are repeated or duplicated multiple times. These data can be more efficiently stored by retaining only those unique (i,j)-slices of x and creating a map between the original data and these unique (i,j)-slices. This is what the DSArray class implements. Of course, a DSArray representation of x is only worthwhile if x contains many such duplicate (i,j)-slices.

The DSArray class was initially conceived for use as an element of an Assays object in the assays slot of a SummarizedExperiment object. Therefore i indexes rows (features/ranges), j indexes columns (samples), and k indexes slices. Importantly, the aim is to have the DSArray version of x behave, from the user's perspective, just as if it were in its "dense" form.

Slots

key

An integer matrix whose (i, j)-entry corresponds to the i-th row and j-th column of the original 3-dimensional "dense" array and gives the row of val that stores that (i,j)-slice.

val

A matrix storing the unique slices of the input array.

Design and Internals

Let x be a 3-dimensional array and let dsa be its DSArray representation. An (i,j)-slice of x is a duplicate if identical(x[i1, j1, ], x[i2, j2, ]) returns TRUE for two positions with at least one of i1 != i2 or j1 != j2. dsa stores the unique (i,j)-slices of x as the rows of a matrix (slot(dsa, "val")), together with an integer matrix (slot(dsa, "key")) that maps each (i,j)-slice of x to a row of slot(dsa, "val").
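The following base R sketch mimics the key and val slots described under Slots. The helpers to_key_val() and to_dense() are hypothetical names introduced for illustration and are not the package's implementation. The sketch identifies slices by pasting their values together, which is exact for integer data but only an approximation for doubles that differ beyond printed precision.

```r
# Hypothetical helpers; a sketch of the idea, not DSArray package code.
to_key_val <- function(x) {
  stopifnot(is.array(x), length(dim(x)) == 3L)
  d <- dim(x)
  # One row per (i,j)-slice (i varies fastest), one column per slice index k.
  slices <- matrix(as.vector(x), nrow = d[1] * d[2], ncol = d[3])
  # Keep each distinct (i,j)-slice once.
  val <- unique(slices)
  # Label every row so that identical slices receive identical labels.
  row_id <- function(m) apply(m, 1, paste, collapse = "\r")
  key <- matrix(match(row_id(slices), row_id(val)), nrow = d[1], ncol = d[2])
  list(key = key, val = val)
}

to_dense <- function(kv) {
  d <- c(dim(kv$key), ncol(kv$val))
  # Look up the stored slice for every (i, j) and restore the 3-d shape.
  array(kv$val[as.vector(kv$key), , drop = FALSE], dim = d)
}

# Three of the four (i,j)-slices equal c(1, 5); the fourth is c(2, 6).
x <- array(c(1L, 1L, 2L, 1L,    # k = 1
             5L, 5L, 6L, 5L),   # k = 2
           dim = c(2L, 2L, 2L))
kv <- to_key_val(x)
kv$key
kv$val
identical(to_dense(kv), x)
```

Here val holds only two rows for four (i,j)-slices, and the key records which row each position uses.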

As noted above, the DSArray representation of x is only worthwhile if x contains many duplicate (i,j)-slices, since this ensures that nrow(val) is much smaller than the number of (i,j)-slices, nrow(x) * ncol(x). Furthermore, the DSArray representation of x becomes proportionally more efficient as the number of slices (dim(x)[3]) increases. For fixed nrow(x) and ncol(x), the relative efficiency of DSArray(x) compared to x increases linearly in the proportion of duplicate (i,j)-slices. More specifically, the relative memory usage of DSArray(x) compared to x is approximately:

4 / (dim(x)[3] * s) + sum(!duplicated(apply(x, 3, I))) / (nrow(x) * ncol(x))

where s = 4 for integer arrays and s = 8 for numeric arrays. The first term is the cost of the integer key, and the second term is the fraction of (i,j)-slices that are unique and must therefore be stored in val. Note that for an integer array with dim(x)[3] < 2 the first term alone is already 1, so DSArray(x) always uses more memory than x.
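As a rough check on the memory argument, the ratio can be worked out from storage costs alone: the key needs 4 bytes per (i,j) position, val needs s bytes per stored element, and the dense array needs s bytes per element. The helper relative_memory() is a hypothetical name used only in this sketch; it ignores object overhead and assumes x is an integer or double array.

```r
# Hypothetical helper; estimates size(DSArray(x)) / size(x) from counts only.
relative_memory <- function(x) {
  stopifnot(is.array(x), length(dim(x)) == 3L)
  d <- dim(x)
  s <- if (is.integer(x)) 4 else 8   # bytes per element (assumes int or double)
  # Rows of this matrix are the (i,j)-slices; count the distinct ones.
  slices <- matrix(as.vector(x), nrow = d[1] * d[2], ncol = d[3])
  n_unique <- sum(!duplicated(slices))
  key_bytes <- 4 * d[1] * d[2]
  val_bytes <- s * n_unique * d[3]
  dense_bytes <- s * prod(d)
  (key_bytes + val_bytes) / dense_bytes
}

# 1000 identical (i,j)-slices across 8 slices: large saving.
x_dup <- array(rep(1:8, each = 1000L), dim = c(100L, 10L, 8L))
relative_memory(x_dup)

# A single slice (dim(x)[3] == 1) in an integer array: no saving possible.
x_one <- array(1:20, dim = c(4L, 5L, 1L))
relative_memory(x_one)
```

For x_dup the estimate is 4 / (8 * 4) + 1 / 1000 = 0.126, whereas for x_one the key alone already costs as much as the dense array.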

The maximum number of rows of a DSArray object is currently .Machine$integer.max (2147483647, approximately 2.1 billion rows).

Supported Types

R supports logical, integer, double (often called numeric), character, complex, and raw arrays. The DSArray class currently supports all these types except complex and raw.

API and Supported Methods

It is intended that a DSArray object behaves much as if it were an array object. Common operations such as arithmetic (e.g., `+`, `*`), comparison (e.g., ==, <), and mathematical transformations (e.g., log(), sin()) are all supported; see DSArray-utils for a full list and details.

However, not all operations that are well-defined for array objects are currently implemented for DSArray objects (e.g., mean()). I plan to implement these as needed, so if you come across one that you would like to have, then please file a feature request at https://github.com/PeteHaitch/DSArray/issues.

Other methods

show(object): By default the show method only displays the class of the object and its dimensions. However, if the HDF5Array package is installed, then the show method also displays the first and last few rows of the object.

Author(s)

Peter Hickey

See Also

DSArray, DSArray-utils


PeteHaitch/DSArray documentation built on May 8, 2019, 1:30 a.m.