Tensor: R6 Class for large Tensor (Array) in Hybrid Mode


Description

Stores large arrays on the hard drive and reads slices of GB-level data in seconds.

Public fields

dim

dimension of the array

dimnames

dimension names of the array

use_index

whether to use one dimension as index when storing data as multiple files

hybrid

whether to allow data to be written to disk

last_used

timestamp of when the object was last read

temporary

whether to remove the files once garbage collected

Active bindings

varnames

dimension names (read-only)

read_only

whether to protect the swap files from being changed

swap_file

file or files to save data to

Methods

Public methods


Method finalize()

release resource and remove files for temporary instances

Usage
Tensor$finalize()

Method print()

print out the data dimensions and snapshot

Usage
Tensor$print(...)
Arguments
...

ignored

Returns

self


Method .use_multi_files()

Internally used, whether to use multiple files to cache data instead of one

Usage
Tensor$.use_multi_files(mult)
Arguments
mult

logical


Method new()

constructor

Usage
Tensor$new(
  data,
  dim,
  dimnames,
  varnames,
  hybrid = FALSE,
  use_index = FALSE,
  swap_file = tempfile(),
  temporary = TRUE,
  multi_files = FALSE
)
Arguments
data

numeric array

dim

dimension of the array

dimnames

dimension names of the array

varnames

characters, names of dimnames

hybrid

whether to enable hybrid mode

use_index

whether to use the last dimension for indexing

swap_file

where to store the data in hybrid mode; the file or files used to save data by index

temporary

whether to remove temporary files when exiting

multi_files

if use_index is true, whether to use multiple files to cache data instead of one


Method subset()

subset tensor

Usage
Tensor$subset(..., drop = FALSE, data_only = FALSE, .env = parent.frame())
Arguments
...

dimension slices

drop

whether to apply drop on subset data

data_only

whether just return the data value, or wrap them as a Tensor instance

.env

environment where ... is evaluated

Returns

the sliced data
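
The formula-style slices shown in the Examples (`C ~ C < 10 & C > 5`) name a dimension on the left and a condition on the right. The following base-R sketch shows one way such formulas could be evaluated against a plain array. It is an illustration only, not the Tensor implementation; `subset_by_formula` and the `dn` list are hypothetical names introduced here.

```r
# Hypothetical helper: subset a plain array with `dim ~ condition` formulas.
# `dn` holds the (numeric) dimension names, one entry per dimension.
dn <- list(A = 1:4, B = 1:3, C = 1:12)
arr <- array(seq_len(4 * 3 * 12), dim = unname(lengths(dn)))

subset_by_formula <- function(x, dn, ...) {
  # Start by keeping every element along every dimension
  idx <- lapply(dn, function(v) rep(TRUE, length(v)))
  for (f in list(...)) {
    var <- as.character(f[[2]])  # left-hand side: the dimension name
    # Right-hand side: condition evaluated with the dimension's names
    idx[[var]] <- eval(f[[3]], dn[var], environment(f))
  }
  # Index the array with one logical vector per dimension, keeping all dims
  do.call(`[`, c(list(x), unname(idx), list(drop = FALSE)))
}

res <- subset_by_formula(arr, dn, C ~ C < 10 & C > 5, A ~ A < 3)
dim(res)
```

Dimensions that no formula mentions (here `B`) are kept whole, which mirrors the `9 x 300 x 4` result in the Examples.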


Method flatten()

converts tensor (array) to a table (data frame)

Usage
Tensor$flatten(include_index = FALSE, value_name = "value")
Arguments
include_index

logical, whether to include dimension names

value_name

character, column name of the value

Returns

a data frame with the dimension names as index columns and value_name as value column
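
To make the output shape concrete, here is a base-R sketch of the same idea applied to a plain array. It is not the Tensor implementation, and `flatten_array` is a hypothetical helper; only the `value_name` argument mirrors the method above.

```r
# Base-R illustration: turn a 3-D array into a long data frame with one
# index column per dimension plus a value column.
arr <- array(1:24, dim = c(2, 3, 4),
             dimnames = list(A = 1:2, B = 1:3, C = 1:4))

flatten_array <- function(x, value_name = "value") {
  # expand.grid varies the first factor fastest, which matches the
  # column-major order in which R stores array values
  idx <- expand.grid(dimnames(x), stringsAsFactors = FALSE)
  idx[[value_name]] <- as.vector(x)
  idx
}

flat <- flatten_array(arr)
head(flat, 3)
```

Each row of `flat` corresponds to one array cell, so a 2 x 3 x 4 array yields 24 rows.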


Method to_swap()

Serialize tensor to a file and store it via write_fst

Usage
Tensor$to_swap(use_index = FALSE, delay = 0)
Arguments
use_index

whether to use one of the dimensions as index for faster loading

delay

if greater than 0, check when the tensor was last used. If it was used less than delay seconds ago, do not swap to the hard drive; if more than delay seconds have passed, swap immediately.
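
The delay rule can be sketched as a small predicate. This is an assumption-laden illustration, not the package's code: `should_swap` is a hypothetical name, and treating `delay = 0` as "always swap" is an assumption based on the description above.

```r
# Hypothetical predicate: decide whether to swap based on the last-used time.
should_swap <- function(last_used, delay = 0, now = Sys.time()) {
  # Assumption: a non-positive delay means no check is made at all
  if (delay <= 0) return(TRUE)
  elapsed <- as.numeric(difftime(now, last_used, units = "secs"))
  # Swap only once more than `delay` seconds have passed since last use
  elapsed > delay
}

now <- as.POSIXct("2020-07-06 00:00:00", tz = "UTC")
should_swap(now - 10, delay = 5, now = now)  # used 10 s ago, delay 5 s
should_swap(now - 2,  delay = 5, now = now)  # used 2 s ago, delay 5 s
```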


Method to_swap_now()

Serialize tensor to a file and store it via write_fst immediately

Usage
Tensor$to_swap_now(use_index = FALSE)
Arguments
use_index

whether to use one of the dimensions as index for faster loading
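
To illustrate why indexing by one dimension can speed up loading, the sketch below writes one file per slice of the last dimension and reads back only the slices that are needed. It uses base-R `saveRDS`/`readRDS` rather than `write_fst`, and the file layout is an assumption made for illustration, not the format the package uses.

```r
# Illustration only: one file per slice along the last dimension.
arr <- array(1:24, dim = c(2, 3, 4))
swap_dir <- tempfile("tensor_swap_")
dir.create(swap_dir)

# Write each slice arr[, , k] to its own file
n_last <- dim(arr)[3]
files <- file.path(swap_dir, sprintf("slice_%d.rds", seq_len(n_last)))
for (k in seq_len(n_last)) {
  saveRDS(arr[, , k], files[k])
}

# Read back only slices 2 and 3; the other files are never touched
wanted <- 2:3
slices <- lapply(files[wanted], readRDS)
partial <- array(unlist(slices), dim = c(dim(arr)[1:2], length(wanted)))

# Clean up, as a temporary tensor would on finalization
unlink(swap_dir, recursive = TRUE)
```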


Method get_data()

restore data from hard drive to memory

Usage
Tensor$get_data(drop = FALSE, gc_delay = 3)
Arguments
drop

whether to apply drop to the data

gc_delay

seconds to delay the garbage collection

Returns

original array


Method set_data()

set/replace data with given array

Usage
Tensor$set_data(v)
Arguments
v

the value to replace the old one, must have the same dimension

notice

A tensor is an environment. If you change it in one place, the change is visible everywhere else the same tensor is referenced, so use set_data() with care.
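
The reference behavior described in the notice is a property of R environments in general. The base-R sketch below shows it with a plain environment standing in for a tensor; no Tensor methods are used.

```r
# Environments are passed by reference: assignment copies the handle,
# not the contents.
e1 <- new.env()
e1$data <- c(1, 2, 3)

e2 <- e1                    # e2 points at the same environment as e1
e2$data <- c(10, 20, 30)    # modify through the second handle

e1$data                     # the change is visible through e1 as well
identical(e1, e2)           # both names refer to one object
```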


Method collapse()

apply mean, sum, or median to collapse data

Usage
Tensor$collapse(keep, method = "mean")
Arguments
keep

which dimensions to keep

method

"mean", "sum", or "median"

Returns

the collapsed data
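
For a plain array, the same kind of reduction can be written with `apply`. The sketch below is a base-R analogy, assuming that `keep` names the dimensions that survive and that the summary is taken over all the others; `collapse_array` is a hypothetical helper, not the method's implementation.

```r
# Base-R analogy: summarize over every dimension not listed in `keep`.
arr <- array(1:24, dim = c(2, 3, 4))

collapse_array <- function(x, keep, method = c("mean", "sum", "median")) {
  method <- match.arg(method)
  fun <- switch(method, mean = mean, sum = sum, median = stats::median)
  apply(x, keep, fun)
}

by_first <- collapse_array(arr, keep = 1, method = "sum")        # length 2
kept12   <- collapse_array(arr, keep = c(1, 2), method = "mean") # 2 x 3
```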


Method operate()

apply a function to the tensor and another R object, matched along the given dimensions

Usage
Tensor$operate(
  by,
  fun = .Primitive("/"),
  match_dim,
  mem_optimize = FALSE,
  same_dimension = FALSE
)
Arguments
by

R object

fun

function to apply

match_dim

which dimensions to match with the data

mem_optimize

whether to optimize memory usage

same_dimension

whether the return value has the same dimension as the original instance
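
The closest base-R analogy to this kind of operation is `sweep`, which combines an array with a summary vector along chosen dimensions. The sketch below assumes that `match_dim` plays the role of `MARGIN` and `by` the role of `STATS`; that mapping is an assumption for illustration, not a statement about how operate() is implemented.

```r
# Base-R analogy: divide each slice of the third dimension by its own mean.
arr <- array(1:24, dim = c(2, 3, 4))

# One baseline value per slice of dimension 3 (length 4)
by <- apply(arr, 3, mean)

# "/" matches the default `fun` shown in the usage above
scaled <- sweep(arr, MARGIN = 3, STATS = by, FUN = "/")

dim(scaled)              # same dimensions as the input
apply(scaled, 3, mean)   # each slice now averages to 1
```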

Examples

# Create a tensor
ts <- Tensor$new(
  data = 1:18000000, dim = c(3000, 300, 20),
  dimnames = list(A = 1:3000, B = 1:300, C = 1:20),
  varnames = c('A', 'B', 'C'))

# Size of tensor when in memory is usually large
pryr::object_size(ts)
#> 8.02 MB

# Enable hybrid mode
ts$to_swap_now()

# Hybrid mode, usually less than 1 MB
pryr::object_size(ts)
#> 814 kB

# Subset data
start1 <- Sys.time()
subset(ts, C ~ C < 10 & C > 5, A ~ A < 10)
#> Dimension:  9 x 300 x 4
#> - A: 1, 2, 3, 4, 5, 6,...
#> - B: 1, 2, 3, 4, 5, 6,...
#> - C: 6, 7, 8, 9
end1 <- Sys.time(); end1 - start1
#> Time difference of 0.188035 secs

# Join tensors
ts <- lapply(1:20, function(ii){
  Tensor$new(
    data = 1:9000, dim = c(30, 300, 1),
    dimnames = list(A = 1:30, B = 1:300, C = ii),
    varnames = c('A', 'B', 'C'), use_index = 2)
})
ts <- join_tensors(ts, temporary = TRUE)

dipterix/raveutils documentation built on July 6, 2020, 12:24 a.m.