lazyarray | R Documentation |
lazyarray instance
Creates or loads a lazyarray that stores data on the hard disk. The data content is loaded on demand.
lazyarray(
path,
dim,
read_only = FALSE,
type = c("filearray", "fstarray"),
storage_format = c("double", "integer", "complex", "character"),
meta_name = "lazyarray.meta"
)
fstarray(
path,
dim,
read_only = FALSE,
storage_format = c("double", "integer", "complex", "character"),
meta_name = "lazyarray.meta"
)
filearray(
path,
dim,
read_only = FALSE,
storage_format = c("double", "integer"),
meta_name = "lazyarray.meta"
)
as.lazymatrix(x, ...)
as.lazyarray(x, path, type = "filearray", ...)
path |
path to a local drive where array data should be stored |
dim |
integer vector, dimension of the array |
read_only |
whether created array is read-only |
type |
the back-end implementation of the array; choices are "filearray" and "fstarray" |
storage_format |
data type; choices are "double", "integer", "complex", and "character" (filearray() accepts only "double" and "integer") |
meta_name |
header file name, default is "lazyarray.meta" |
x |
An R matrix or array |
... |
passed into |
The function lazyarray() can either create or load an array on the hard drive.
When path exists as a directory and contains a valid array instance, lazyarray()
ignores other parameters such as storage_format, type, and sometimes dim (see
Section "Array Partitions"), and tries to load the existing array described by
the meta file. When path does not exist, or the directory contains no valid
array files, a new array is created, and path is created automatically if it is
missing.
There are two back-end implementations for lazyarray(): "filearray" and
"fstarray". Use type to specify which implementation serves your needs. Each
type has its own strengths and weaknesses; see Section "Array Types" for
details.
The argument meta_name specifies the name of the file that stores the array
attributes, such as the total dimension, partition size, file format, and
storage format. The same array object can have multiple meta files; see
Section "Array Partitions" for details.
An R6 instance of lazyarray. The class name is either FstArray or FileArray,
depending on the type specified. Both inherit from AbstractLazyArray.
Type filearray stores data in its binary form "as-is" on the local drive. This
format is compatible with the package filematrix. The supported data types are
integer and double-precision floating-point numbers.
Type fstarray stores data in the fst format defined by the package fstcore,
using 'ZSTD' compression. Unlike filearray, fstarray supports complex numbers
and character strings in addition to integer and double numbers.
On solid-state drives attached via 'NVMe', filearray can reach read speeds of
up to 3 GB per second, and fstarray up to 1 GB per second.
By default, filearray is used if the storage format is supported, and fstarray
is the fallback option. However, if the array data is structured or ordered, or
if storage size is a major concern, fstarray might perform better because it
compresses data before writing to the hard drive.
To explicitly create a file array, use the function filearray(). Similarly, use
fstarray() to create an fst-based array.
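The default selection rule described above can be sketched in base R. The helper name pick_backend is hypothetical and is not part of lazyarray; it only mirrors the stated rule that "filearray" is preferred whenever it supports the storage format.

```r
# Hypothetical helper (not a lazyarray function): mirrors the documented
# default that "filearray" is used when it supports the storage format,
# with "fstarray" as the fallback.
pick_backend <- function(storage_format) {
  storage_format <- match.arg(
    storage_format,
    c("double", "integer", "complex", "character")
  )
  if (storage_format %in% c("double", "integer")) "filearray" else "fstarray"
}

pick_backend("double")
pick_backend("complex")
```

This only illustrates the choice of type; it does not account for the compression trade-off, which may still favor fstarray for double or integer data.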
A lazyarray partitions data in two ways: file partitions and in-file blocks.
1. File-level Partition:
The number of file partitions equals the size of the last array margin.
Given a 100 x 200 x 30 x 4 array, there will be 4 partitions; each partition
stores a slice of data containing a 100 x 200 x 30 sub-array, or 600,000
elements.
Once an array is created, the length of each partition can no longer change.
However, the shape of each partition can be changed, and the number of
partitions can grow or shrink. To change these, you only need to create a new
meta file that specifies the new dimension, at no additional cost. In the
previous example, the partition sub-dimension can be 10000 x 60, 2000 x 300, or
1000 x 200 x 3, as long as the total length matches. The number of partitions
can change to 3, 5, 100, or any positive integer. To change the total dimension
to 600000 x 100, call lazyarray() with the new dimension (see examples). Please
make sure type and meta_name are specified.
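The re-dimensioning rule can be checked with plain arithmetic. The sketch below uses base R only; is_compatible_dim is a hypothetical helper, not a lazyarray function, and assumes the rule is exactly as stated: the leading margins must multiply to the partition length, and the last margin may be any positive integer.

```r
# Dimension from the example: 100 x 200 x 30 x 4
old_dim <- c(100, 200, 30, 4)

# Length of one file partition: product of all margins except the last
partition_length <- prod(old_dim[-length(old_dim)])  # 600,000 elements

# Hypothetical check (not part of lazyarray): a new dimension is compatible
# when its leading margins multiply to the partition length. The last margin
# (the number of partitions) may be any positive integer.
is_compatible_dim <- function(new_dim, partition_length) {
  n <- length(new_dim)
  n >= 2 &&
    new_dim[n] >= 1 &&
    prod(new_dim[-n]) == partition_length
}

is_compatible_dim(c(10000, 60, 5), partition_length)       # TRUE
is_compatible_dim(c(1000, 200, 3, 100), partition_length)  # TRUE
is_compatible_dim(c(100, 200, 4), partition_length)        # FALSE
```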
2. In-file Blocks:
Within each file, the data are stored in blocks. When reading the data, if any element within a block is needed, the whole block is read.
For filearray, the block size equals the first margin. For example, a
100 x 200 x 3 file array has 3 file partitions, each with 200 blocks of 100
elements.
For fstarray, the lower bound of the block size can be set by
options(lazyarray.fstarray.blocksize=...). By default, this number is 16,384.
For a 100 x 200 x 3 array, each partition has only one block, and the block
size is 20,000.
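The block counts in this example follow from simple arithmetic, sketched here in base R. The fstarray line is an assumption: it only derives an upper bound on the number of blocks from the documented minimum block size, and does not reproduce how the package actually splits a partition.

```r
dims <- c(100, 200, 3)

n_partitions     <- dims[length(dims)]          # 3 partition files
partition_length <- prod(dims[-length(dims)])   # 20,000 elements per file

# filearray: the block size equals the first margin
filearray_block_size <- dims[1]                                # 100 elements
filearray_blocks     <- partition_length / filearray_block_size  # 200 per file

# fstarray: each block holds at least `min_block` elements (default 16,384).
# Assumption: at most floor(length / min_block) blocks can fit, and there is
# always at least one block.
min_block           <- 16384
fstarray_max_blocks <- max(1, floor(partition_length / min_block))  # 1
```

Because 20,000 is less than twice 16,384, a partition cannot be split into two blocks that both meet the minimum size, which is consistent with the single block described above.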
If one dimension defines the unit of analysis, make it the last margin. If a margin is rarely indexed, make it the first margin. Indexing along the last margin is the fastest, and indexing along the first margin is the slowest.
If x has dimension 200 x 200 x 200, x[,,i] is the fastest, then x[,i,], then x[i,,].
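One way to see this ordering is to count how many contiguous runs of elements each kind of slice touches. The sketch below uses an ordinary in-memory R array and assumes the on-disk order follows R's column-major layout; that is an assumption, since the text states only the relative speeds. The helper count_runs is hypothetical.

```r
# Each element stores its own linear (column-major) position
x <- array(seq_len(2 * 3 * 4), dim = c(2, 3, 4))

# Number of contiguous runs in a vector of linear positions
count_runs <- function(idx) sum(diff(idx) != 1) + 1

last_margin   <- as.vector(x[, , 2])  # positions 7..12, one contiguous run
middle_margin <- as.vector(x[, 2, ])  # four runs of two elements each
first_margin  <- as.vector(x[1, , ])  # every other element, twelve runs

count_runs(last_margin)
count_runs(middle_margin)
count_runs(first_margin)
```

Fewer, longer runs mean fewer blocks and files to touch, which matches the recommendation to index along the last margin.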
Zhengjia Wang
library(lazyarray)
path <- tempfile()
# ---------------- Case 1: Create new array ------------------
arr <- lazyarray(path, storage_format = 'double', dim = c(2,3,4))
arr[] <- 1:24
# Subset and get the first partition
arr[,,1]
# Partition file path (total 4 partitions)
arr$get_partition_fpath()
# Removing array doesn't clear the data
rm(arr); gc()
# ---------------- Case 2: Load from existing directory ----------------
# Load from existing path, no need to specify other params
arr <- lazyarray(path, read_only = TRUE)
summary(arr, quiet = TRUE)
# ---------------- Case 3: Import from existing data ----------------
# Change dimension to 6 x 20
arr1 <- lazyarray(path, dim = c(6,20), meta_name = "arr_6x20.meta")
arr1[,1:5]
arr1[,1:6] <- rnorm(36)
# arr also changes
arr[,,1]
# ---------------- Case 4: Converting from R arrays ----------------
x <- matrix(1:16, 4)
x <- as.lazymatrix(x, type = 'fstarray', storage_format = "complex")
x[,] # or x[]