filearray: Create or load existing file arrays

as_filearray R Documentation

Create or load existing file arrays

Description

Create or load existing file arrays

Usage

as_filearray(x, ...)

as_filearrayproxy(x, ...)

filearray_create(
  filebase,
  dimension,
  type = c("double", "float", "integer", "logical", "raw", "complex"),
  partition_size = NA,
  initialize = FALSE,
  ...
)

filearray_load(filebase, mode = c("readwrite", "readonly"))

filearray_checkload(
  filebase,
  mode = c("readonly", "readwrite"),
  ...,
  symlink_ok = TRUE
)

filearray_load_or_create(
  filebase,
  dimension,
  on_missing = NULL,
  type = NA,
  ...,
  mode = c("readonly", "readwrite"),
  symlink_ok = TRUE,
  initialize = FALSE,
  partition_size = NA,
  verbose = FALSE
)

Arguments

x

an R object, such as an array, a file array proxy, or a character string, that can be transformed into a file array

...

additional named headers to check, used by filearray_checkload (see 'Details'). filearray_create ignores this argument; it is reserved for future compatibility.

filebase

a directory path in the local file system where the array is stored. When creating an array, the path must not already exist.

dimension

dimension of the array; must have length of at least 2

type

storage type of the array; default is 'double'. Other options are 'float', 'integer', 'logical', 'raw', and 'complex' (see 'Usage').

partition_size

positive partition size for the last margin, or NA to automatically guess; see 'Details'.

initialize

whether to initialize partition files; default is FALSE for performance reasons. If the array is dense, setting this to TRUE is recommended.

mode

whether the array allows writing to its files; choices are 'readwrite' and 'readonly'.

symlink_ok

whether arrays with symbolic-link partitions can pass the check; this is typically relevant for bound arrays that use symbolic links; see filearray_bind.

on_missing

function to handle the file array (for example, to initialize it) when a new array is created; must take exactly one argument, the array object

verbose

whether to print debug messages

Details

File arrays partition out-of-memory array objects and store the partitions separately in the local file system. Since R stores matrices/arrays in column-major order, a file array uses the slowest margin (the last margin) to slice the partitions. This aligns the elements within each file with the corresponding memory order.

An array with dimension 100x200x300x400 has 4 margins. The length of the last margin is 400, which is also the maximum number of potential partitions. The number of partitions is the last margin size divided by partition_size, rounded up. For example, if the partition size is 1, there will be 400 partitions. If the partition size is 3, there will be 134 partitions. The default partition sizes are determined internally following these priorities:

1. the file size of each partition does not exceed 1GB

2. the number of partitions does not exceed 100

These two rules are not hard requirements. The goal is to reduce the number of partitions as much as possible.
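The partition arithmetic described above can be sketched in base R. The helpers n_partitions and partition_bytes are hypothetical names introduced only for illustration; they are not part of filearray, and the byte estimate assumes 8 bytes per 'double' element and ignores any per-file header.

```r
# Column-major order: all elements sharing the same last index are contiguous,
# which is why slicing along the last margin matches the memory layout.
a <- array(1:24, dim = c(2, 3, 4))
identical(as.vector(a[, , 2]), 7:12)

# Dimension from the text: 4 margins, the last margin has length 400
dimension <- c(100, 200, 300, 400)
last_margin <- dimension[length(dimension)]

# Hypothetical helper (not a filearray function):
# number of partitions for a given partition size, rounded up
n_partitions <- function(last_margin, partition_size) {
  ceiling(last_margin / partition_size)
}

n_partitions(last_margin, 1)  # 400
n_partitions(last_margin, 3)  # 134

# Hypothetical helper (not a filearray function):
# approximate bytes held by one partition, ignoring any file header
partition_bytes <- function(dimension, partition_size, bytes_per_element = 8) {
  prod(dimension[-length(dimension)]) * partition_size * bytes_per_element
}

partition_bytes(dimension, 1)  # 100 * 200 * 300 * 8 = 4.8e7 bytes
```

With a partition size of 1, each partition of this array holds roughly 48 MB of doubles, so a larger partition size would still satisfy the first rule while moving closer to the second.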

The arguments ... in filearray_checkload should be named arguments that provide additional checks on the header information. The check fails if at least one header is not identical. For example, if an array contains the header key-signature pair, one can use filearray_checkload(..., key = signature) to validate the signature. Note that the comparison is strict: the storage type of each header is considered as well. If the signature stored in the array is an integer while the one provided is a double, the check fails.
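The effect of a type-sensitive comparison can be illustrated in base R. This sketch assumes the check behaves like identical(); headers_match is a hypothetical helper written for this illustration and is not the package's internal code.

```r
# Hypothetical helper (not a filearray function): every expected header
# must be identical, including storage type, to the stored one
headers_match <- function(stored, expected) {
  all(vapply(
    names(expected),
    function(key) identical(stored[[key]], expected[[key]]),
    logical(1)
  ))
}

# Header as it might be stored in the array: an integer signature
stored <- list(signature = 1L)

1L == 1                                      # TRUE: values compare equal
headers_match(stored, list(signature = 1L))  # TRUE: same value and type
headers_match(stored, list(signature = 1))   # FALSE: double vs. integer
```

In practice this means a numeric signature should be supplied with the same type it was stored with, for example 1L rather than 1.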

Value

A FileArray-class instance.

Author(s)

Zhengjia Wang

Examples



# Prepare 
library(filearray)
filebase <- tempfile()
if(file.exists(filebase)){ unlink(filebase, TRUE) }

# create array
x <- filearray_create(filebase, dimension = c(200, 30, 8))
print(x)

# Assign values
x[] <- rnorm(48000)

# Subset
x[1,2,]

# load existing array
filearray_load(filebase)

x$set_header("signature", "tom")
filearray_checkload(filebase, signature = "tom")

## Not run: 
# Trying to load with wrong signature
filearray_checkload(filebase, signature = "jerry")

## End(Not run)


# check-load, and create a new array if fail
x <- filearray_load_or_create(
    filebase = filebase, dimension = c(200, 30, 8),
    verbose = TRUE, signature = "henry"
)
x$get_header("signature")

# check-load with initialization
x <- filearray_load_or_create(
    filebase = filebase, 
    dimension = c(3, 4, 5),
    verbose = TRUE, mode = "readonly",
    on_missing = function(array) {
        array[] <- seq_len(60)
    }
)

x[1:3,1,1]

# Clean up
unlink(filebase, recursive = TRUE)


filearray documentation built on July 9, 2023, 5:53 p.m.