knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Package ufovectors is an example implementation of R vectors using User Fault Objects (UFOs) from the ufos package. UFOs allow our R vectors to be lazily loaded into memory on demand from an arbitrary source. Package ufovectors provides a sample implementation of such a source in the form of binary files. The data from these binary files will be loaded into memory on demand as it is used. The data will also be "forgotten" and the memory freed if a vector threatens to overrun memory.
Let's examine a short example to see all this in action. First, let's load the package.
library(ufovectors)
Note, that the package loads ufos as a dependency.
Before we do anything, let's turn on debug mode to see what happens under the hood.
ufo_set_debug_mode(T)
The ufovectors package provides constructors for various types of vectors:
ufo_integer_bin (path)
ufo_numeric_bin (path)
ufo_logical_bin (path)
ufo_complex_bin (path)
ufo_raw_bin (path)
Each of these functions requires a path to a binary file. Our example binary file at path example_int.bin
contains 2^16 32-bit consecutive little-endian-encoded values, ie:
00 00 00 00 01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00 05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00 09 00 00 00 0A 00 00 00 0B 00 00 00 0C 00 00 00 0D 00 00 00 0E 00 00 00 0F 00 00 00 ... ... ... ... FC FF 00 00 FD FF 00 00 FE FF 00 00 FF FF 00 00
We can use this file as a source for an integer vector containing the values 0:65535
:
iv <- ufo_integer_bin("example_int.bin")
When we execute this function the R interpreter asks the UF engine to allocate some memory using a custom allocator that will be used to store a vector. However, instead of allocating any real memory for this vector, UF engine allocates some virtual memory for it, thus rendering it a UFO. Whenever that memory is accesed, the operating system passes on a request to the UF system to allocate and populate some real memory. At this time, since we did not ask for any data from iv
, the vector does not load any memory.
Now, let's try accessing an element of the vector.
# this is just a hack to capture stderr and show it in a vignette, pt. 1 stdout <- capture.output(iv[4], type = "message")
iv[4]
# this is just a hack to capture stderr and show it in a vignette, pt. 2 cat(paste0(stdout, sep="\n"))
Once we access an element, the UF engine prepares a region of actual memory and asks its source to populate it. Since the source is a binary file, a chunk of the file is read into memory. We see exactly which chunk of the file is loaded into memory in the debug message. The size of the chunk depends on the UF engine, but it's at least a page fo memory.
If we access some more elements again, this data is actually in memory and no more loading takes place.
iv[4] iv[5]
If we access elements outside of the loaded chunk, the source will be asked to provide another chunk.
# this is just a hack to capture stderr and show it in a vignette, pt. 1 stdout <- capture.output(iv[10000], type = "message")
iv[10000]
# this is just a hack to capture stderr and show it in a vignette, pt. 2 cat(paste0(stdout, sep="\n"))
We see again through the debug message that another chunk was loaded into memory.
UFOs attempt to be feature complete and as transparent as possible. A typical use of vectors involves setting and getting values form them as well as performing vectorized operations:
v <- iv + 1
This will work, but R will create v
as an ordinary vector, not a UFO. This means that for larger-than-memory vectors, this operation will fail.
To prevent this, they provide a set of operators that return file-backed UFOs. These
library(ufooperators) iv <- ufo_integer_bin("example_int.bin")
First of all, this adds a class attribute to all UFO vectors:
class(iv)
The now class-conscious UFO implements a set of useful operators that return larger-than-memory--capable UFOs via the S3 object system.
v <- iv + 1 class(v)
v <- iv + 42 v <- iv - 42 v <- iv * 42 v <- iv / 42 v <- iv ^ 42 v <- iv %% 42 v <- iv %/% 42 v <- iv < 42 v <- iv > 42 v <- iv >= 42 v <- iv <= 42 v <- iv == 42 v <- iv != 42 v <- !iv v <- iv & TRUE v <- iv | TRUE #v <- iv[1:10] #v <- iv[1:10] <- 42
If adding a class to the UFO breaks the transparency too much, UFOs can also be made to override default operators instead of being plugged into the S3 system.
options(ufos.overload_operators = TRUE)
While we attempt to mitigate any potential problems, this approach is more dangerous than the S3 approach, since the override will apply to all operators and may lead to unexpected problems down the line for other objects.
UFOs also expose an API that replicates the capabilities of the operators as ordinary R functions.
v <- ufo_add(iv, 42) v <- ufo_subtract(iv, 42) v <- ufo_multiply(iv, 42) v <- ufo_divide(iv, 42) v <- ufo_power(iv, 42) v <- ufo_modulo(iv, 42) v <- ufo_int_divide(iv, 42) v <- ufo_less(iv, 42) v <- ufo_less_equal(iv, 42) v <- ufo_greater(iv, 42) v <- ufo_greater_equal(iv, 42) v <- ufo_equal(iv, 42) v <- ufo_unequal(iv, 42) v <- ufo_not(iv) v <- ufo_or(iv, TRUE) v <- ufo_and(iv, TRUE) #v <- ufo_subset(iv, 1:10) #v <- ufo_subset_assign(iv, 1:10, 42)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.