lazyarray: Create or load 'lazyarray' instance


View source: R/lazyarray.R

Description

If path does not exist, create a new array. If path exists and its meta file is complete, load the existing array; otherwise create a new meta file and import from the existing data.

Usage

lazyarray(
  path,
  storage_format,
  dim,
  dimnames = NULL,
  multipart = TRUE,
  prefix = "",
  multipart_mode = 1,
  compress_level = 50L,
  file_names = list("", seq_len(dim[[length(dim)]]))[[multipart + 1]],
  meta_name = "lazyarray.meta",
  read_only = FALSE,
  quiet = FALSE,
  ...
)

Arguments

path

path to a local directory where array data is stored

storage_format

data type, choices are "double", "integer", "character", and "complex"; see details

dim

integer vector, dimension of array, see dim

dimnames

list of vectors, names of each dimension, see dimnames

multipart

whether to split array into multiple partitions, default is true

prefix

character prefix of array partition

multipart_mode

1 or 2, mode of partition; see create_lazyarray

compress_level

0 to 100, level of compression. 0 means no compression, 100 means maximum compression. For persistent data, setting it to 100 is recommended. Default is 50.

file_names

partition names without prefix or extension; see details

meta_name

header file name, default is "lazyarray.meta"

read_only

whether created array is read-only

quiet

whether to suppress messages, default is false

...

ignored

Details

lazyarray behaves differently in each of three cases. Case 1: if path does not exist, the function calls create_lazyarray to create a blank array instance. Case 2: if path exists and contains meta_name, the existing instance is loaded with the given read/write access; in this case, parameters other than read_only, path, and meta_name are ignored. Case 3: if path exists but meta_name is missing, lazyarray tries to create the array from existing data files.
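The three cases can be sketched in base R as follows. This is only an illustration, assuming that case 1 means the directory does not exist on disk and case 3 means the directory exists without the meta file (as the comments in the Examples suggest). The helper lazyarray_case is hypothetical and not part of the package; the real function may also check that the meta file is complete.

```r
# Hypothetical helper (not part of lazyarray): report which case would apply,
# assuming simple existence checks decide the outcome.
lazyarray_case <- function(path, meta_name = "lazyarray.meta") {
  if (!dir.exists(path)) {
    return(1L)  # case 1: nothing on disk yet, a blank array would be created
  }
  if (file.exists(file.path(path, meta_name))) {
    return(2L)  # case 2: meta file found, the existing array would be loaded
  }
  3L            # case 3: directory without meta file, data would be imported
}

path <- tempfile()
lazyarray_case(path)                            # path not created yet

dir.create(path)
lazyarray_case(path)                            # directory only, no meta file

file.create(file.path(path, "lazyarray.meta"))
lazyarray_case(path)                            # directory and meta file
```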

If lazyarray enters case 3, file_names is used to locate partition files. In multi-part mode (multipart=TRUE), file_names defaults to 1, 2, ..., dim[length(dim)], which correspond to '1.fst', '2.fst', etc. under the path folder. You may specify your own file_names if irregular names are used; the file name of each partition is <prefix><file_name>.fst. For example, file_names=c('A', 'B') with prefix="file-" means the two partitions are stored as "file-A.fst" and "file-B.fst". Some files may be missing; the corresponding partitions are filled with NA when values are read from them. However, the length of file_names must equal the last dimension when multipart=TRUE. If multipart=FALSE, file_names should have length 1 and the corresponding file is the data file.
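The naming rule can be checked with base R alone. The sketch below evaluates the default file_names expression from the Usage section and composes partition file names as <prefix><file_name>.fst. The helpers default_file_names and partition_files are hypothetical names introduced for illustration; they are not lazyarray functions.

```r
dim <- c(2, 3, 4)

# Same expression as the default in Usage: multipart + 1 selects the first
# list element when multipart = FALSE and the second when multipart = TRUE.
default_file_names <- function(dim, multipart) {
  list("", seq_len(dim[[length(dim)]]))[[multipart + 1]]
}

default_file_names(dim, multipart = TRUE)   # one name per slice of the last dimension
default_file_names(dim, multipart = FALSE)  # a single empty name

# Hypothetical helper: compose <prefix><file_name>.fst
partition_files <- function(prefix, file_names) {
  paste0(prefix, file_names, ".fst")
}

partition_files("file-", c("A", "B"))
partition_files("", default_file_names(dim, multipart = TRUE))

# With multipart = TRUE, custom names must match the last dimension in length
file_names <- c("1", "B", "C", "d")
stopifnot(length(file_names) == dim[[length(dim)]])
```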

Note that when importing existing partition files generated by other packages such as 'fst', the partition files must be homogeneous, meaning the stored data length, dimension, and storage type must be the same across files. Because the 'fstcore' package stores data as data frames internally, the column names must be 'V1', 'V2', etc. for non-complex elements, or 'V1R', 'V1I', ... for complex numbers (real and imaginary parts are stored in separate columns).
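As a rough illustration of the column layout, the base R sketch below builds the data frames that would be written for a 2 x 3 partition. It assumes one column per partition file (as in the Examples) and values flattened in R's column-major order. The check is_homogeneous is a hypothetical helper, not something the package provides.

```r
# Two 2 x 3 partitions of a double array, flattened column by column
part1 <- matrix(as.double(1:6), nrow = 2)
part2 <- matrix(as.double(7:12), nrow = 2)
df1 <- data.frame(V1 = as.vector(part1))
df2 <- data.frame(V1 = as.vector(part2))

# A complex partition: real and imaginary parts go into separate columns
z <- complex(real = 1:6, imaginary = 6:1)
cplx_df <- data.frame(V1R = Re(z), V1I = Im(z))

# Hypothetical check: same number of rows, same column names, same storage type
is_homogeneous <- function(dfs) {
  same <- function(x) length(unique(x)) == 1L
  same(vapply(dfs, nrow, integer(1))) &&
    same(vapply(dfs, function(d) paste(names(d), collapse = ","), character(1))) &&
    same(vapply(dfs, function(d) typeof(d[[1]]), character(1)))
}

is_homogeneous(list(df1, df2))      # both double partitions share one layout
is_homogeneous(list(df1, cplx_df))  # mixing real and complex layouts fails the check
```

Data frames like these could then be written with fst::write_fst, as the Examples section does.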

Author(s)

Zhengjia Wang

See Also

create_lazyarray, load_lazyarray

Examples

path <- tempfile()

# ---------------- case 1: Create new array ------------------
arr <- lazyarray(path, storage_format = 'double', dim = c(2,3,4), 
                 meta_name = 'lazyarray.meta')
arr[] <- 1:24

# Subset and get the first partition
arr[,,1]

# Partition file path (total 4 partitions)
arr$get_partition_fpath()

# Removing array doesn't clear the data
rm(arr); gc()

# ---------------- Case 2: Load from existing directory ----------------
## Important!!! Run case 1 first
# Load from existing path, no need to specify other params
arr <- lazyarray(path, meta_name = 'lazyarray.meta', read_only = TRUE)

arr[,,1]

# ---------------- Case 3: Import from existing data ----------------
## Important!!! Run case 1 first

# path exists but the meta file is missing, so all other params are required
# Note that the partition count increases from 4 to 5 and the storage type
# is converted from double to character
arr <- lazyarray(path = path, meta_name = 'lazyarray-character.meta', 
                 file_names = c(1,2,3,4,'additional'), 
                 storage_format = 'character', dim = c(2,3,5), 
                 quiet = TRUE, read_only = FALSE)

# partition names
arr$get_partition_fpath(1:4, full_path = FALSE)
arr$get_partition_fpath(5, full_path = FALSE)

# The first partition still exists and is valid
arr[,,1]

# The additional partition is all NA
arr[,,5]

# Set data to 5th partition
arr[,,5] <- rep(0, 6)

# -------- Advanced usage: create fst data and import manually --------

# Clear existing files
path <- tempfile()
unlink(path, recursive = TRUE)
dir.create(path, recursive = TRUE)

# Create array of dimension 2x3x4, but 3rd partition is missing
# without using lazyarray package 

# Column names must be V1 or V1R, V1I (complex)
fst::write_fst(data.frame(V1 = 1:6), path = file.path(path, 'part-1.fst'))
fst::write_fst(data.frame(V1 = 7:12), path = file.path(path, 'part-B.fst'))
fst::write_fst(data.frame(V1 = 19:24), path = file.path(path, 'part-d.fst'))

# Import via lazyarray
arr <- lazyarray(path, meta_name = 'test-int.meta',
                 storage_format = 'integer',
                 dim = c(2,3,4), prefix = 'part-', 
                 file_names = c('1', 'B', 'C', 'd'), 
                 quiet = TRUE)

arr[]

# Complex case
fst::write_fst(data.frame(V1R = 1:6, V1I = 1:6), 
               path = file.path(path, 'cplx-1.fst'))
fst::write_fst(data.frame(V1R = 7:12, V1I = 100:105), 
               path = file.path(path, 'cplx-2.fst'))
fst::write_fst(data.frame(V1R = 19:24, V1I = rep(0,6)), 
               path = file.path(path, 'cplx-4.fst'))
arr <- lazyarray(path, meta_name = 'test-cplx.meta',
                 storage_format = 'complex',
                 dim = c(2,3,4), prefix = 'cplx-', 
                 file_names = 1:4, quiet = TRUE)

arr[]

lazyarray documentation built on July 18, 2020, 9:06 a.m.