The DSArray class provides compressed storage of 3-dimensional arrays when
the array has many duplicate slices. A basic array-like API is provided for
instantiating, subsetting, and combining DSArray objects.
## S4 method for signature 'DSArray'
dim(x)
## S4 method for signature 'DSArray'
nslice(x)
## S4 method for signature 'DSArray'
length(x)
## S4 method for signature 'DSArray'
dimnames(x)
## S4 method for signature 'DSArray'
slicenames(x)
## S4 replacement method for signature 'DSArray,'NULL''
dimnames(x) <- value
## S4 replacement method for signature 'DSArray,list'
dimnames(x) <- value
## S4 replacement method for signature 'DSArray,'NULL''
slicenames(x) <- value
## S4 replacement method for signature 'DSArray,character'
slicenames(x) <- value
## S4 method for signature 'DSArray'
x[i, j, k, ..., drop = FALSE]
## S4 replacement method for signature 'DSArray,ANY,ANY,DSArray'
x[i, j, k, ...] <- value
## S4 method for signature 'DSArray'
acbind(...)
## S4 method for signature 'DSArray'
arbind(...)
## S4 method for signature 'DSArray'
densify(x)
## S4 method for signature 'DSArray'
show(object)

x, object, value 
A DSArray object 
i, j, k 
Indices specifying elements to extract or replace. Indices are

... 
DSArray objects. For 
drop 
Currently ignored 
Suppose we have a 3-dimensional array, x, with dimensions indexed by i
(rows), j (columns), and k (slices). We refer to x[i, j, ] as an
(i,j)-slice (here length(i) == length(j) == 1). For certain data, many of
the (i,j)-slices of x are duplicated. Such data can be stored more
efficiently by retaining only the unique (i,j)-slices of x and creating a
map between the original data and these unique (i,j)-slices. This is what
the DSArray class implements. Of course, a DSArray representation of x is
only worthwhile if x contains many such duplicate (i,j)-slices.
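To make the notion of a duplicate (i,j)-slice concrete, the following base R sketch counts the unique and duplicated slices of a small array. The toy data and the variable names (slices, n_unique, n_duplicate) are illustrative assumptions and are not part of the DSArray package.

```r
# Toy 3-dimensional integer array: 4 rows, 3 columns, 2 slices.
x <- array(0L, dim = c(4, 3, 2))
x[1, 1, ] <- c(1L, 2L)
x[3, 2, ] <- c(1L, 2L)  # duplicates the (1,1)-slice
x[4, 3, ] <- c(5L, 6L)

# Reshape so that each row holds one (i,j)-slice, i.e. x[i, j, ].
# Row i + (j - 1) * nrow(x) corresponds to x[i, j, ].
slices <- matrix(x, ncol = dim(x)[3])

n_slices <- nrow(slices)                 # total number of (i,j)-slices
n_unique <- sum(!duplicated(slices))     # slices seen for the first time
n_duplicate <- sum(duplicated(slices))   # slices that repeat an earlier one
```

Here only 3 of the 12 (i,j)-slices are distinct, which is the situation in which a compressed representation pays off.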
The DSArray class was initially conceived for use as an element of an
Assays object in the assays slot of a SummarizedExperiment object.
Therefore i indexes rows (features/ranges), j indexes columns (samples),
and k indexes slices. Importantly, the aim is to have the DSArray version
of x behave, from the user's perspective, just as if it were in its "dense"
form.
key
An integer matrix where the (i, j)-entry of the key corresponds to the
i-th row and j-th column of the original 3-dimensional "dense" array.
val
A matrix storing the unique slices of the input array.
Let x be a 3-dimensional array and let dsa be its DSArray representation.
A duplicate (i,j)-slice of x is one such that
identical(x[i1, j1, ], x[i2, j2, ]) returns TRUE with at least one of
i1 != i2 or j1 != j2. dsa stores the unique (i,j)-slices of x as a matrix
(slot(dsa, "val")) and an integer matrix (slot(dsa, "key")) mapping the
(i, j)-slice of x to a row of slot(dsa, "val").
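The key/val idea can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: the helper row_id() and the variables key, val, and dense are hypothetical names, and slices are matched by pasting their elements into strings, which is adequate for this integer example.

```r
# Toy dense array with duplicated (i,j)-slices.
x <- array(0L, dim = c(4, 3, 2))
x[1, 1, ] <- c(1L, 2L)
x[3, 2, ] <- c(1L, 2L)
x[4, 3, ] <- c(5L, 6L)

# One row per (i,j)-slice.
slices <- matrix(x, ncol = dim(x)[3])

# val: the unique slices, one per row (first occurrences, in order).
val <- unique(slices)

# Hypothetical helper: collapse each row into a single string identifier.
row_id <- function(m) apply(m, 1, paste, collapse = "\r")

# key: integer matrix whose (i, j)-entry is the row of val holding x[i, j, ].
key <- matrix(match(row_id(slices), row_id(val)),
              nrow = nrow(x), ncol = ncol(x))

# Look up a single slice through the key.
slice_3_2 <- val[key[3, 2], ]

# Reconstruct the dense array from key and val.
dense <- array(val[key, ], dim = dim(x))
```

In this sketch identical(dense, x) holds, so no information is lost, while val has only 3 rows instead of the 12 slices stored in x.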
As noted above, the DSArray representation of x is only worthwhile if x
contains many duplicate (i,j)-slices since this ensures that nrow(val) is
much smaller than nrow(x). Furthermore, the DSArray representation of x
becomes proportionally more efficient as the number of slices (dim(x)[3])
increases. For a fixed nrow(x), the relative efficiency of DSArray(x)
compared to x increases linearly in the proportion of duplicate
(i,j)-slices. More specifically, the relative memory usage of DSArray(x)
compared to x is proportional to:
4 / (dim(x)[3] * s) +
  sum(!duplicated(apply(x, 3, I))) / (nrow(x) * ncol(x))
where s = 4 for integer arrays and s = 8 for numeric arrays. Note that this
means if dim(x)[3] < 2 then DSArray(x) always uses more memory than x.
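As a rough worked example, the relative memory usage can be estimated from first principles. The sketch assumes that the key costs 4 bytes per (i,j)-slice, that val stores one row of dim(x)[3] elements per unique slice, and that fixed object overhead is ignored. The function name relative_memory() is hypothetical and not part of the DSArray package.

```r
# Estimated memory of a key/val representation relative to the dense array.
relative_memory <- function(x) {
  s <- if (is.integer(x)) 4 else 8   # bytes per element: integer vs. double
  n_slices <- nrow(x) * ncol(x)      # number of (i,j)-slices
  n_unique <- sum(!duplicated(matrix(x, ncol = dim(x)[3])))
  # key cost relative to dense + val cost relative to dense
  4 / (dim(x)[3] * s) + n_unique / n_slices
}

# 1000 slices of length 8, only 2 of them distinct.
x <- array(0L, dim = c(100, 10, 8))
x[1, 1, ] <- 1:8
ratio_x <- relative_memory(x)

# With a single slice index the key alone costs as much as the dense array.
y <- array(0L, dim = c(100, 10, 1))
ratio_y <- relative_memory(y)
```

For x the estimate is well below 1, whereas for y it exceeds 1, matching the remark that arrays with dim(x)[3] < 2 do not benefit.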
The maximum number of rows of a DSArray object is currently
.Machine$integer.max, approximately 2.1 billion rows on a 64-bit machine.
R supports logical, integer, double (often called numeric), character,
complex, and raw arrays. The DSArray class currently supports all these
types except complex and raw.
It is intended that a DSArray object behaves much as if it were an array
object. Common operations such as arithmetic (e.g., `+`, `*`), comparison
(e.g., ==, <), and mathematical transformations (e.g., log(), sin()) are
all supported; see DSArray-utils for a full list and details. However, not
all operations that are well-defined for array objects are currently
implemented for DSArray objects (e.g., mean()). I plan to implement these
as needed, so if you come across one that you would like to have, then
please file a feature request at
https://github.com/PeteHaitch/DSArray/issues.
show(x): By default, the show method only displays the class of the object
and its dimensions. However, if the HDF5Array package is installed, then
the show method also displays the first and last few rows of the object.
Peter Hickey
