
Package ufomatrices is an example implementation of R matrices using User Fault Objects (UFOs) from the ufos package. UFOs spoof vectors by constructing SEXP structures that look exactly like ordinary R vectors to the R interpreter, but trigger the UFO framework when their memory is accessed. The framework then loads the appropriate data from an arbitrary source (e.g. a binary file on a hard drive) into memory and makes it available to the R interpreter. The data will also be "forgotten" and the memory freed if a vector threatens to overrun memory.

Let's examine a short example to see all this in action. First, let's load the package.

library(ufovectors)

Note that the package loads ufos as a dependency.

UFO matrices

The ufomatrices package provides constructors for various types of matrices:

Each of these functions requires a path to a binary file that provides the data populating the matrix. Internally, an R matrix is just an R vector with an additional dim attribute specifying its dimensions, which is what makes R report its class as "matrix". The file itself contains only one-dimensional data, so we add dimensions to it via the cols and rows arguments.

Our example binary file at path example_int.bin contains 2^16 consecutive 32-bit little-endian-encoded integers.

00 00 00 00  01 00 00 00  02 00 00 00  03 00 00 00  
04 00 00 00  05 00 00 00  06 00 00 00  07 00 00 00  
08 00 00 00  09 00 00 00  0A 00 00 00  0B 00 00 00  
0C 00 00 00  0D 00 00 00  0E 00 00 00  0F 00 00 00  
...          ...          ...          ...
FC FF 00 00  FD FF 00 00  FE FF 00 00  FF FF 00 00

This represents the values:

    0     1     2     3  
    4     5     6     7 
    8     9    10    11 
   12    13    14    15
   16    17    18    19
  ...   ...   ...   ...
65532 65533 65534 65535
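A file with this layout can be produced with base R alone. The following is a minimal sketch, not part of the package: the helper name write_example_bin is hypothetical, and the file is written to a temporary directory rather than the working directory.

```r
# Hypothetical helper (not part of the package): write the integers
# 0..(n - 1) to `path` as 32-bit little-endian values.
write_example_bin <- function(path, n = 2^16) {
  values <- seq_len(n) - 1L            # integer vector 0, 1, ..., n - 1
  con <- file(path, open = "wb")
  on.exit(close(con))
  writeBin(values, con, size = 4L, endian = "little")
  invisible(path)
}

path <- write_example_bin(file.path(tempdir(), "example_int.bin"))

file.size(path)                        # 262144 bytes = 2^16 values * 4 bytes
readBin(path, what = "raw", n = 8L)    # 00 00 00 00 01 00 00 00
```

The first eight bytes match the start of the hex dump above: the value 0 followed by the value 1, each stored least significant byte first.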

We can use this file to represent a number of matrices. Since R stores matrices in column-major order, consecutive values from the file fill the first column, then the second, and so on. For example, a matrix with 256 columns and 256 rows looks as follows:

             col [,1]  col [,2]   ...   col [,256]
row [1,]     0         256        ...   65280
row [2,]     1         257        ...   65281
...          ...       ...        ...   ...
row [256,]   255       511        ...   65535

We can also represent a matrix with 16 columns and 4096 rows:

             col [,1]  col [,2]   ...   col [,16]
row [1,]     0         4096       ...   61440
row [2,]     1         4097       ...   61441
...          ...       ...        ...   ...
row [4096,]  4095      8191       ...   65535

These both use all the elements in the file. We can also create a matrix that uses only a prefix of the file, for instance 16 columns and 256 rows:

             col [,1]  col [,2]  ...   col [,16]
row [1,]     0         256       ...   3840
row [2,]     1         257       ...   3841
...          ...       ...       ...   ...
row [256,]   255       511       ...   4095
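The three layouts above can be checked with ordinary in-memory matrices. This sketch uses base R's matrix() on the same sequence of values as a stand-in; it does not touch the file or the UFO framework, and the variable names are chosen only for illustration.

```r
values <- 0:65535                        # the values encoded in the example file

a <- matrix(values, nrow = 256,  ncol = 256)
b <- matrix(values, nrow = 4096, ncol = 16)
p <- matrix(values[1:(256 * 16)], nrow = 256, ncol = 16)   # prefix only

a[1, 2]        # 256
b[1, 2]        # 4096
p[256, 16]     # 4095

# Column-major order: element [i, j] is stored at position
# (j - 1) * nrow + i of the underlying vector.
i <- 2; j <- 16
a[i, j] == values[(j - 1) * nrow(a) + i]   # TRUE
```

The same index arithmetic determines which part of the file backs a given element of a file-backed matrix.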

Let's actually create these matrices.

m1 <- ufo_matrix_integer_bin("example_int.bin", 256, 256)
m2 <- ufo_matrix_integer_bin("example_int.bin", 4096, 16)
m3 <- ufo_matrix_integer_bin("example_int.bin", 256, 16)

Since R matrices are implemented as vectors, when we execute these functions the R interpreter asks the UF engine to allocate memory for a vector using a custom allocator. Subsequently, dimension information is attached to the vector via attributes.

Before we do anything, let's turn on debug mode to see what happens under the hood.

ufo_set_debug_mode(TRUE)

Now, let's try accessing some elements of a matrix.

m1[1,1]

Once we access an element, the UF engine prepares a region of actual memory and asks its source to populate it. Since the source is a binary file, a chunk of the file is read into memory. The debug message shows exactly which chunk of the file is loaded. The size of the chunk depends on the configuration of the UF engine, but it is at least a page of memory.

If we access more elements that fall within the same chunk, their data is already in memory and no further loading takes place.

m1[2,16]

If we access elements outside of the loaded chunk, the source will be asked to provide another chunk.

m1[256,256]

We see again through the debug message that another chunk was loaded into memory.
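To see why the second access can be served from memory while the third cannot, it helps to work out where each element lives in the file. The sketch below is plain arithmetic and does not call the UFO framework. The helper byte_offset is hypothetical, and the chunk size of 16384 bytes is an assumption chosen for illustration; the actual chunk size depends on how the UF engine is configured.

```r
# Hypothetical helper: byte offset within the file of element [i, j] of a
# column-major matrix with `nrow` rows and `elem_bytes`-wide elements.
byte_offset <- function(i, j, nrow, elem_bytes = 4) {
  ((j - 1) * nrow + (i - 1)) * elem_bytes
}

chunk_bytes <- 16384   # assumed chunk size, not the engine's actual setting

offsets <- c(
  "m1[1,1]"     = byte_offset(1,   1,   nrow = 256),
  "m1[2,16]"    = byte_offset(2,   16,  nrow = 256),
  "m1[256,256]" = byte_offset(256, 256, nrow = 256)
)

offsets                    # 0, 15364, 262140
offsets %/% chunk_bytes    # chunk index of each access: 0, 0, 15
```

Under this assumed chunk size, the first two accesses fall into the same chunk, while the last element of the matrix sits in the final chunk of the file and forces another load. With a smaller chunk size, the second access could land in a different chunk and trigger a load as well.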

Manually constructing UFO matrices from UFO vectors

R matrices are just R vectors with additional attributes. UFO matrices are therefore just UFO vectors with additional attributes. While the ufovectors package provides constructors for these matrices, you can also add the necessary attributes to an existing UFO vector yourself.

Specifically, an R matrix is a vector with:

- a dim attribute specifying the dimensions of the matrix,
- a class of "matrix".

When a dim attribute of length two is set on a vector, R reports the class "matrix" automatically, so nothing else needs to be assigned. Hence, the simplest way of creating a UFO matrix is to set the dimensions manually:

vec <- ufo_integer_bin("example_int.bin")
dim(vec) <- c(length(vec) / 2, 2)
class(vec)

We have created a matrix!
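The same mechanism can be observed on an ordinary vector. In this sketch a small base R integer vector stands in for the UFO vector, so it runs without the package; in base R only the dim attribute is actually stored, and the "matrix" class is derived from it.

```r
vec <- 0:7                           # ordinary integer vector as a stand-in
is.matrix(vec)                       # FALSE

dim(vec) <- c(length(vec) / 2, 2)    # 4 rows, 2 columns
is.matrix(vec)                       # TRUE
names(attributes(vec))               # "dim"
vec[3, 2]                            # 6
```

Assigning NULL to dim(vec) removes the attribute again and turns the matrix back into a plain vector.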



ufo-org/ufo-r-vectors documentation built on Oct. 2, 2022, 11:09 p.m.