hybrid_array: Create a Hybrid Array


Description

These functions create hybrid array instances that support fast reading and writing of arrays with three or more dimensions. When the data are too large to fit in RAM, it is recommended to partition them and store the partitions on a local hard disk. A hybrid array partitions its data along one of its dimensions.
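The base R sketch below illustrates what partitioning along one dimension means. It splits an array into slices along a chosen dimension and writes each slice to its own file. The helper name `partition_to_disk` and the one-.rds-file-per-partition layout are illustrative assumptions, not necessarily how hybrid_array stores its data.

```r
# Hypothetical sketch: split an array along dimension `along` and save each
# slice (keeping that dimension with extent 1) as a separate .rds file.
partition_to_disk <- function(x, along, path = tempfile(pattern = "partitions")) {
  dir.create(path, recursive = TRUE)
  d <- dim(x)
  files <- character(d[along])
  for (i in seq_len(d[along])) {
    idx <- lapply(d, seq_len)   # select every index of every dimension...
    idx[[along]] <- i           # ...except the partition dimension, fixed to i
    piece <- do.call(`[`, c(list(x), idx, list(drop = FALSE)))
    files[i] <- file.path(path, sprintf("part_%d.rds", i))
    saveRDS(piece, files[i])
  }
  files
}

x <- array(rnorm(10 * 10 * 3 * 100), c(10, 10, 3, 100))
files <- partition_to_disk(x, along = 3)
part2 <- readRDS(files[2])
dim(part2)   # 10 10 1 100
```

Only the partition that is read back (`part2` here) has to be held in memory; the other slices stay on disk.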

Usage

hybrid_array(data = NA, dim = length(data), dimnames = NULL,
  path = tempfile(pattern = "hybridarray"), partition_index = NULL)

hybrid_array_partial(data, dim, which_partition,
  partition_index = length(dim), dimnames = base::dimnames(data),
  path = tempfile(pattern = "hybridarray"))

Arguments

data

an array or an atomic vector

dim

a numeric vector giving the dimensions of the array

dimnames

NULL, or a named list giving the names along each dimension

path

path where the array is stored on disk

partition_index

index of the dimension along which the data are partitioned

which_partition

index of the partition that data fills when calling hybrid_array_partial
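The way dim, partition_index and which_partition work together can be modelled in memory with base R, mirroring the commented in-memory equivalent in the "partial data" example. The helper `emulate_partial` is hypothetical: it only reproduces the resulting values (unfilled partitions are NA), not hybrid_array's disk storage.

```r
# Hypothetical in-memory model of hybrid_array_partial(): allocate an array
# of NA with dimensions `dim`, then fill slice `which_partition` of
# dimension `partition_index` with `data`.
emulate_partial <- function(data, dim, which_partition,
                            partition_index = length(dim)) {
  x <- array(NA_real_, dim)
  idx <- lapply(dim, seq_len)                 # full range for every dimension
  idx[[partition_index]] <- which_partition   # restrict to one partition
  do.call(`[<-`, c(list(x), idx, list(value = data)))
}

data <- array(rnorm(10000), c(10, 10, 1, 100))
x <- emulate_partial(data, dim = c(10, 10, 3, 100),
                     partition_index = 3, which_partition = 2)
all(is.na(x[, , 1, ]))            # TRUE: partition 1 was never filled
all(x[, , 2, ] == data[, , 1, ])  # TRUE: partition 2 holds `data`
```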

Details

When the array is too large for RAM to handle, use hybrid_array_partial. For example, a 1000 x 1000 x 100 x 100 array of doubles takes about 80 GB (10^10 elements at 8 bytes each), which is too large for a typical laptop to hold in RAM. Instead, we can generate 1000 sub-arrays of dimension 1 x 1000 x 100 x 100, each about 80 MB. To start, call hybrid_array_partial(..., partition_index = 1, which_partition = 1) to make the first dimension the partition dimension and supply the first sub-array, then assign the remaining sub-arrays one at a time (see the "partial data" example).
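The sizes quoted above follow from R storing each double in 8 bytes. A quick base R check of the arithmetic, ignoring the small per-object header:

```r
bytes_per_double <- 8
full_dim <- c(1000, 1000, 100, 100)   # the full array
sub_dim  <- c(1, 1000, 100, 100)      # one partition along dimension 1

full_gb <- prod(full_dim) * bytes_per_double / 1e9   # 80 (GB)
sub_mb  <- prod(sub_dim)  * bytes_per_double / 1e6   # 80 (MB)
n_parts <- full_dim[1] / sub_dim[1]                  # 1000 partitions

# object.size() confirms the per-element cost on a small vector
object.size(numeric(1e6))   # about 8 MB
```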

Examples

# ------------ Simple in-memory usage ------------
data <- rnorm(1e5)
x <- hybrid_array(data, c(100, 100, 10))
x[]

# ------------ partial data example ------------
# Create a 10 x 10 x 3 x 100 array x, supplying only part of its data

# data for the second partition (slice 2 along dimension 3)
data <- array(rnorm(10000), c(10, 10, 1, 100))

# in-memory equivalent: x <- array(NA, c(10, 10, 3, 100)); x[, , 2, ] <- data
x <- hybrid_array_partial(data, dim = c(10, 10, 3, 100),
                          partition_index = 3, which_partition = 2)
x[,,2,]

# Fill the third partition
x[,,3,] <- data + 1

# Check: every difference should be 1
x[1,1,3,] - x[1,1,2,]

## Not run: 
# ------------ Hybrid example ~ 800MB data ------------
data <- rnorm(1e8)
x <- hybrid_array(data, c(100, 100, 100, 100))
x$ram_used
# Swap the data out to disk; writing ~800 MB may take a while
x$swap_out(); x$ram_used
x[1,2,1:10,2]

## End(Not run)
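A disk-backed layout pays off because a subset such as x[1, 2, 1:10, 2] needs only the partitions it touches. The base R sketch below makes two assumptions. First, the array is partitioned along its last dimension; the default partition_index = NULL is not described on this page. Second, it uses one .rds file per partition and a hypothetical helper `read_subset`. It is an illustration of the idea, not the package's implementation.

```r
# Hypothetical sketch: an array stored as one .rds file per slice of its
# last dimension; read x[i, j, k, l] by loading only the files listed in `l`.
d <- c(10, 10, 10, 4)
x <- array(rnorm(prod(d)), d)
store_dir <- tempfile(pattern = "hybrid_sketch")
dir.create(store_dir)
files <- file.path(store_dir, sprintf("part_%d.rds", seq_len(d[4])))
for (p in seq_len(d[4])) saveRDS(x[, , , p, drop = FALSE], files[p])

read_subset <- function(files, i, j, k, l) {
  # Load only the requested partitions; each piece keeps all 4 dimensions
  pieces <- lapply(l, function(p) readRDS(files[p])[i, j, k, 1, drop = FALSE])
  # Pieces are concatenated in order, i.e. bound along the last dimension
  out <- array(unlist(pieces), c(length(i), length(j), length(k), length(l)))
  drop(out)
}

y <- read_subset(files, i = 1, j = 2, k = 1:10, l = 2)  # reads part_2.rds only
all.equal(y, x[1, 2, 1:10, 2])   # TRUE
```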

dipterix/harray documentation built on May 13, 2019, 12:29 a.m.