Ring Buffers

``` {r echo = FALSE, results = "hide"} knitr::opts_chunk$set( error = FALSE, fig.width = 7, fig.height = 5) set.seed(1)

This package implements [ring
buffers](https://en.wikipedia.org/wiki/Ring_buffer).  A ring
buffer can be used as a first-in-first-out (FIFO) buffer where the
maximum size is known ahead of time.  Because they do not grow in
size, they are useful to avoid using more and more memory as a
process runs.  Because the data reading and writing happens in an
(apparently) circular way, once data is added to the buffer it is
not copied (in contrast if you used a vector then every time data
is consumed you'd have to shuffle the vector around).

`ring` implements two different ring buffers that will likely suit
different applications.

* `ring_buffer_bytes` is implemented as a byte array in C, possibly
  with a "stride" indicating a set of bytes that go together.  Once
  the data reaches the end of the array we start writing to the
  beginning again.
* `ring_buffer_env` is implemented as a doubly linked list using
  R's environments.  This buffer can hold arbitrary R objects at
  each position.

The target audience for this package is either other package
developers who need a ring buffer in their package, or modellers
who have decided that a ring buffer is the right data structure to
help with their simulation model.

For all buffers, `head` will refer to the next bit of the buffer to
be written to, and `tail` will refer to the next bit of the buffer
to be read.  That is, elements are pushed onto the `head` of the
buffer and retrieved from the `tail`.  (There is no direct analogy
between these terms and the R functions `head` and `tail` which
operate on fixed-size vectors.)

# The environment buffer `ring_buffer_env`

This is the simplest buffer to understand because we don't have to
deal with raw vectors.

To create a buffer that can hold up to 100 elements, use the
`ring_buffer_env` function:
``` {r }
buf <- ring::ring_buffer_env(100)

This is an R6 class, with several methods:

buf

Operations on the class happen by running methods using $. So the size of the buffer:

buf$size()

...the number of elements free and used:

buf$free()
buf$used()

...whether the buffer is empty or full:

buf$is_empty()
buf$is_full()

To start using the buffer we need to put some data in it. There are two main functions for adding data:

buf$set(data, n) sets n elements to be the value data
buf$push(data, iterate) pushes data into the buffer, with the iterate argument indicating if we should iterate over data or treat it as a single element

So to set the first 5 elements to be "a", "b", ..., "e", use:

buf$push(letters[1:5])

The buffer is no longer empty

buf$is_empty()

...having 5 elements:

buf$used()

...and room for 95 more:

buf$free()

To read the content of the buffer without modifying it, use read(n) where n is the number of elements to read. This always returns a list of length n:

buf$read(1)
buf$read(2)

If you try to read too far, then the buffer will underflow and you will get an error: ``` {r error = TRUE, purl = FALSE} buf$read(20)

If you just want the the first element, use `tail()`
``` {r }
buf$tail()

The tail returns the first element in (so the buffer naturally operates as a first-in-first-out queue).

You can also read the most recently added element with head()

buf$head()

And you can offset these by an integer number of steps. So moving one position into the buffer from the tail gets the second element added:

buf$tail_offset(1)

or moving three elements into the buffer from the head (most recently added element) gets the same bit of data

buf$head_offset(3)

The above operations are all nondestructive -- they leave the buffer unchanged. To consume elements, use take(n) which operates the same way as read but it also moves the buffer tail; it consumes elements leaving space for more.

buf$free()
buf$take(1)
buf$free()

Now we have consumed an element the tail has moved along, so tail contains "b" and "a" is removed from the buffer:

buf$tail()

To reset the buffer, use reset(). This empties the buffer of all data:

buf$reset()
buf$used()
buf$is_empty()

While the ring buffer is fixed in size in typical use, you can grow it explicitly. To add additional space, use the grow method:

buf$size()
buf$grow(20)
buf$size()

Application: simulation with recent history

The whole point of the ring buffer though is that we can push things onto it and pull the most recent out, even when the number of things pushed overall is larger than the buffer size.

So imagine a simulation where we need to keep track of the last 5 steps. The simulation is a random walk.

step <- function(x) {
  if (runif(1) < 0.5) x - 1L else x + 1L
}

x <- 0L
buf <- ring::ring_buffer_env(5)
h <- integer(20)
buf$push(x)
h[1L] <- x

set.seed(1)
for (i in seq_len(length(h) - 1L)) {
  x <- step(x)
  buf$push(x)
  h[i + 1L] <- x
}

The whole history:

The last 5 steps:

unlist(buf$read(5))

So we could rewrite the simulation so that the random walk tends up if the last few steps have been increases and tends down if the last few steps have been decreases:

step <- function(x) {
  if (length(x) > 1) {
    p <- mean(diff(x)) / 2 + 0.5
  } else {
    p <- 0.5
  }
  if (runif(1) < p) x[length(x)] - 1L else x[length(x)] + 1L
}

x <- 0L
buf <- ring::ring_buffer_env(5)
h <- integer(100)
buf$push(x)
h[1L] <- x

set.seed(1)
for (i in seq_len(length(h) - 1L)) {
  x <- step(unlist(buf$read(buf$used())))
  buf$push(x)
  h[i + 1L] <- x
}

Now we have a simulation with a strong mean reverting tendency:

par(mar = c(4, 4, .5, .5))
plot(h, type = "l", xlab = "step", ylab = "y", las = 1)

Because the buffer always holds the last 5 (or fewer) elements the book-keeping involved in working with the last few elements out is simplified. Ignoring the fact that we hold the entire history in the fixed size vector h, only the last few elements need to be retained which may be useful if the simulation generates a lot of data.

A downside of this implementation is that buf$read() returns a list that must be turned into a vector with unlist, even though we know in this case that the simulation will always produce an integer vector. The ring buffers described below can help with that problem.

The bytes buffer `ring_buffer_bytes`

This is the classical implementation of a ring buffer, and the implementation is broadly based on the one here, by \@dhess.

This operates basically the same way as ring_buffer_env, and presents a very similar interface to R, but with a few key differences:

The contents of the buffer are raw bytes (using R's raw vectors). These are a bit fiddly to work with but can be very powerful.
The iterate distinction of push disappears because there is no ambiguity with R objects

To construct a buffer of 1000 bytes:

buf <- ring::ring_buffer_bytes(1000)

Most of the same methods apply directly:

buf$free()
buf$used()
buf$is_full()
buf$is_empty()

Generate a byte sequence:

bytes <- as.raw(0:255)

...and push them into the buffer:

buf$push(bytes)

...read from the buffer

buf$read(10)

...destructively take the oldest elements

buf$used()
buf$take(20)
buf$used()

Striding

Single bytes can hold only values 0 to 255 (or character equivalents, such as a becomes r charToRaw("a") via charToRaw("a"). But if you want to hold a full integer, that (usually) takes 4 bytes, a double (usually) takes 8.

To allow this, a bytes buffer can be "strided"; this indicates the number of consecutive bytes that should together make up one logical entry. The buffer then contains size of these. So to create a buffer of 100 entries, each of 8 bytes you could do:

buf <- ring::ring_buffer_bytes(100, 8)

Each element pushed onto the buffer must have the correct size. So to push the byte sequence 1..8 onto the buffer:

buf$push(as.raw(1:8))

but if you pushed more or less it would be an error: ``` {r error = TRUE, purl = FALSE} buf$push(as.raw(1:4))

Reading happens in *logical* units, not bytes:
``` {r }
buf$read(1)

and you can get the number of elements used:

buf$used()

or the number of bytes

buf$used(bytes = TRUE)

The typed bytes buffer `ring_buffer_bytes_typed`

If 8 bytes is a double, it should be possible to make a bytes buffer that holds one (or more) doubles per entry. That is what the ring_buffer_bytes_typed buffer does, with a few corner cases dealt with. To use, you decide what the R interpretation of an entry is, it will determine the size per entry and appropriate encoding and decoding functions and you can ignore that it is storing bytes. For performance reasons this does not use R's serialisation and simply copies the data stored in vectors.

For example, to make a buffer of 10 elements, each of which is a single real number (double), use:

buf <- ring::ring_buffer_bytes_typed(10, double(1))

onto which real numbers can be pushed:

buf$push(pi)

And retrieve the data.

buf$take(1)

Entries can contain more than one number; to make a buffer of length 10, each element of which is a vector of 5 doubles:

buf <- ring::ring_buffer_bytes_typed(10, double(5))
buf$push(rnorm(5))
buf$read(1)

Because this is just implemented as a byte array, we can just push a bunch of numbers straight into the buffer:

buf$push(rnorm(5 * 10))

With elements in the buffer, we can request them. The integer argument of take indicates the number of groups of 5 doubles we would like back:

buf$take(1)

If you try to take more than is in the buffer it is an error: ``` {r error = TRUE, purl = FALSE} buf$take(10)

## The translating bytes buffer `ring_buffer_bytes_translate`

The `ring_buffer_bytes_typed` function is implemented by
translating R objects to bytes (when storing with `$set()`,
`$push()`, etc). and from bytes back to R objects (when retrieving
with `$read()`, `$take()`, etc).  `ring_buffer_bytes_translate`
exposes this interface.

The "typed" buffers do not allow storing strings because they can
be any number of bytes long (the bytes buffers require a fixed
"stride" within a buffer).  But we can store fixed length strings.

To convert a string to a byte sequence, use `charToRaw` (or
`as.raw(utf8ToInt(x))`, but then multi-byte sequences might start
being difficult).
``` {r }
(bytes <- charToRaw("hello world"))

The inverse transformation is rawToChar (or intToUtf8(as.integer(x))):

rawToChar(bytes)

The function ring_buffer_bytes_translate takes these functions as its 3rd and fourth arguments. So to make a buffer that will hold up to 100 strings, each of 8 bytes:

b <- ring::ring_buffer_bytes_translate(100, 8, charToRaw, rawToChar)

We can now store 8 character strings:

b$push("abcdefgh")
b$tail()

But other length strings cannot be added: ``` {r error = TRUE, purl = FALSE} b$push("hello!")

Probably this would be most useful storing just single characters
as then it would make a buffer of *text*.

# The C API

The `ring` package can be used in other R packages using the
`LinkingTo` mechanism.  To do so:

* In your `DESCRIPTION`, add a line `LinkingTo: ring` (you do not
  need to include `ring` in `Depends` or `Imports` as we need it
  only for the package build).

* In your `src/` directory, add a file `ring.c` containing just the
  line `#include <ring/ring.c>` (but see the note in the
  documentation for `ring_buffer_create` below).

* Anywhere in your code you want to use the ring buffer, include
  the line `#include <dde/dde.h>` to include the prototypes and use
  the interface as described below.

(I am not sure what the best practice way of doing this with a
standalone shared library compiled with `R CMD SHLIB` is though;
probably best to make a package.)

The C API is documented only in the header file, and it should be
fairly straightforward to use (with reference to the docs above;
this is the code underlying the `ring_buffer_bytes` interface).

``` {r echo = FALSE, results = "asis"}
writeLines(c("```c",
             readLines(system.file("include/ring/ring.h", package = "ring")),
             "```"))

For a complete real-world example of use, see dde, which uses a ring buffer to hold the history of a set of differential equations, and uses that to implement delay equations. Here, the ring buffer means that the memory requirements don't grow with the length of running the simulation (as it only cares about fairly recent history, the natural overflow from the ring buffer is well suited). The memory is only allocated at the beginning of the simulation so there is no additional memory allocations. And because ring returns (const) pointers to the appropriate place in memory there is little copying.

A simple application that implements the same mean-reverting simulation from above:

{r echo = FALSE, results = "asis"} writeLines(c("c", readLines(system.file("examples/example.c", package = "ring")), "```"))

## A nontrivial example

In the [`dde`](https://github.com/mrc-ide/dde) package (not yet on
CRAN), I use ring buffers to solve delay differential equations
(DDEs).  To solve these, we need to know the state of the system at
a series of points in the past.  So at every time step we push the
state of the system onto a ring buffer.  Then, as the solver moves
forward in time we can get the system at some previous point in
time by looking back through the ring buffer until the time in
question is found.

In this application a ring buffer is the ideal data structure
because we often want to solve equations where the time we look
back is a small fraction of the total time.  Without a ring buffer
we'd either have to store the _entire_ history (with a large memory
cost, most of which is not needed) or periodically copy the history
around.

To use `ring` within the `dde` package:

* In the `DESCRIPTION` we [declare a link to
  `ring`](https://github.com/mrc-ide/dde/blob/7ebaefd/DESCRIPTION#L14)
  using the `LinkingTo:` field.

* In the `src` directory, [the contents of `<ring/ring.c>` are
  included](https://github.com/mrc-ide/dde/blob/7ebaefd/src/ring.c);
  this is possible because of the `LinkingTo` field.  This file now
  includes all the actual ring buffer implementation.

* In
  [src/dopri.h](https://github.com/mrc-ide/dde/blob/7ebaefd/src/dopri.h#L8)
  we include `<ring/ring.h>` which allows the ring buffer code to be used
  in any file that includes `dopri.h`.  There is a [data structure
  in this
  header](https://github.com/mrc-ide/dde/blob/7ebaefd/src/dopri.h#L77-L111)
  that includes within itself a ring buffer to hold the history.

* In
  [src/dopri.c](https://github.com/mrc-ide/dde/blob/7ebaefd/src/dopri.c)
  the ring buffer code is actually used:

    - [initialisation](https://github.com/mrc-ide/dde/blob/7ebaefd/src/dopri.c#L48-L50)
    - [a time is found within the ring buffer](https://github.com/mrc-ide/dde/blob/7ebaefd/src/dopri.c#L694-L719)
    - [the ring buffer is advanced](https://github.com/mrc-ide/dde/blob/7ebaefd/src/dopri.c#L417)
    - [the data is freed](https://github.com/mrc-ide/dde/blob/7ebaefd/src/dopri.c#L220)

* In
  [src/dopri_5.c](https://github.com/mrc-ide/dde/blob/7ebaefd/src/dopri_5.c#L109-L119)
  new data is written to the head of the ring buffer, being the
  state of the system at the end of the step.  The history head is
  treated as a big block of contiguous doubles.

Used this way, the programmer can focus on simply writing to the
application and do as little work on bookkeeping as possible.

# The C++ API

If you're using C++ you may find the [Boost circular
buffer](https://www.boost.org/doc/libs/1_61_0/doc/html/circular_buffer.html)
is likely to be far better; you can use this by `LinkingTo:` the
`BH` package and using `#include <boost/circular_buffer.hpp>` in
your code.

Alternatively, the `ring` C code can be directly used in C++
as above.  Or, there is a class-based approach available:

* In your `src/` directory, add a file `ring.cpp` containing just the
  line `#include <ring/ring.cpp>`

* Anywhere in your code you want to use the ring buffer, include
  the line `#include <dde/dde.hpp>` to include the class definition:

``` {r echo = FALSE, results = "asis"}
writeLines(c("```cpp",
             readLines(system.file("include/ring/ring.hpp", package = "ring")),
             "```"))