knitr::opts_chunk$set(
  error = FALSE,
  collapse = TRUE,
  comment = "#>"
)
lang_output <- function(x, lang) {
  cat(c(sprintf("```%s", lang), x, "```"), sep = "\n")
}
cc_output <- function(x) lang_output(x, "cc")
r_output <- function(x) lang_output(x, "r")
plain_output <- function(x) lang_output(x, "plain")

The dust random number generators are suitable for use from other packages and we provide a few helpers in both R and C++ to make this easier.

For illustration purposes we will assume that you want to estimate pi using rejection sampling. The basic idea of this algorithm is simple; sample two U(0, 1) points x and y and test if they lie in the unit circle (i.e. sqrt(x^2 + x^2) < 1) giving the ratio of the area of a unit circle to a square. Multiplying the fraction of the points that we accept by 4 yields our estimate of pi.

Background using R's random number generator

First, an example that uses R's API for example (note that while the R API is C, we're using the cpp11 package interface here so that the following examples are similar):

cc_output(readLines("rng_pi_r.cpp"))

With cpp11 we can load this with cpp11::cpp_source

cpp11::cpp_source("rng_pi_r.cpp")

and then run it with

pi_r(1e6)

The key bits within the code above are that we:

Failure to run the GetRNGstate / PutRNGstate will result in the stream not behaving properly. This is explained in detail in the "Writing R Extensions" manual.

Basic implementation using dust

The implementation in dust will look very similar to above, but the way that we cope with the random number stream will look quite different. With the R version above we are looking after R's global stream (stored in the variable .Random.seed) and making sure that it is fetched and set on entry and exit to the function.

One of the design ideas in dust is that there is no single global source of random numbers, so we need to create a source that our function can use. If we were to use the simulation function multiple times we would want the stream to pick up where it left off last time, so the act of calling the function should update the seed as a "side effect".

The way we expose this for use within other packages is that the user (either package developer or user of the package) creates a "pointer" to some random number state. Passing that state into a C++ function will allow use of the random functions within dust, and will update the state correctly (see the following section for details).

First we create a pointer object:

rng <- dust:::dust_rng_pointer$new()
rng

Unlike the dust::dust_rng object there are no real useful methods on this object and from the R side we'll treat it as a black box. Importantly the rng object knows which algorithm it has been created to use

rng$algorithm

The default will be suitable for most purposes.

We can rewrite the pi approximation function as:

cc_output(readLines("rng_pi_dust.cpp"))

This snippet looks much the same as above:

cpp11::cpp_source("rng_pi_dust.cpp")

and then run it with

pi_dust(1e6, rng)

The C++ interface is described in more detail in the online documentation

Parallel implementation with dust and OpenMP

Part of the point of dust's random number generators is that they create independent streams of random numbers that can be safely used in parallel.

cc_output(readLines("rng_pi_parallel.cpp"))
cpp11::cpp_source("rng_pi_parallel.cpp")

Here we've made a number of decisions about how to split the problem up subject to a few constraints about using OpenMP together with R:

rng <- dust:::dust_rng_pointer$new(n_streams = 20)
pi_dust_parallel(1e6, rng, 4)

Unfortunately cpp11::cpp_source does not support using OpenMP so in the example above the code will run in serial and we can't see if parallelisation will help.

In order to compile with support, we need to build a little package and set up an appropriate Makevars file

path <- tempfile()
dir.create(path)
dir.create(file.path(path, "src"), FALSE, TRUE)

writeLines(
  c("Package: piparallel",
    "LinkingTo: cpp11, dust",
    "Version: 0.0.1"),
  file.path(path, "DESCRIPTION"))
writeLines(
  c('useDynLib("piparallel", .registration = TRUE)',
    'exportPattern("^[[:alpha:]]+")'),
  file.path(path, "NAMESPACE"))
writeLines(
  c("PKG_CXXFLAGS=-DHAVE_INLINE $(SHLIB_OPENMP_CXXFLAGS)",
    "PKG_LIBS=$(SHLIB_OPENMP_CXXFLAGS)"),
  file.path(path, "src", "Makevars"))
code <- grep("cpp11::linking_to", readLines("rng_pi_parallel.cpp"),
             invert = TRUE, value = TRUE)
writeLines(code, file.path(path, "src", "code.cpp"))

The package is fairly minimal:

withr::with_dir(path, fs::dir_tree())

We have an extremely minimal DESCRIPTION, which contains line LinkingTo: cpp11, dust from which R will arrange compiler flags to find both packages' headers:

plain_output(readLines(file.path(path, "DESCRIPTION")))

The NAMESPACE loads the dynamic library

plain_output(readLines(file.path(path, "NAMESPACE")))

The src/Makevars file contains important flags to pick up OpenMP support:

lang_output(readLines(file.path(path, "src/Makevars")), "make")

And src/code.cpp contains the file above but without the [[cpp11::linking_to(dust)]] line:

cc_output(readLines(file.path(path, "src/code.cpp")))

After compiling and installing the package, pi_dust_parallel will be available

pkgbuild::compile_dll(path, quiet = TRUE, debug = FALSE)
pkg <- pkgload::load_all(path, compile = FALSE, recompile = FALSE,
                         warn_conflicts = FALSE, export_all = FALSE,
                         helpers = FALSE, attach_testthat = FALSE,
                         quiet = TRUE)
pi_dust_parallel <- pkg$env$pi_dust_parallel

Now we have a parallel version we can see a speed-up as we add threads:

rng <- dust:::dust_rng_pointer$new(n_streams = 20)
bench::mark(
  pi_dust_parallel(1e6, rng, 1),
  pi_dust_parallel(1e6, rng, 2),
  pi_dust_parallel(1e6, rng, 3),
  pi_dust_parallel(1e6, rng, 4),
  check = FALSE)

More on the pointer object

This section aims to de-mystify the pointer objects a little. The dust random number state is a series of integers (by default 64-bit unsigned integers) that are updated each time a state is drawn (see vignette("rng.Rmd")). We expose this state to R as a vector of "raw" values (literally a series of bytes of data).

rng <- dust::dust_rng$new(seed = 1)
rng$state()

When numbers are drawn from the stream, the state is modified as a side-effect:

rng$random_real(20)
rng$state()

The same happens with our dust_rng_pointer objects used above:

ptr <- dust::dust_rng_pointer$new(seed = 1)
ptr$state()

Note that ptr starts with the same state here as rng did because we started from the same seed. When we draw 20 numbers from the stream (by drawing 10 pairs of numbers with our pi-estimation algorithm), we will advance the state

pi_dust(10, ptr)
ptr$state()

Note that the state here now matches the value returned by rng.

Normally nobody needs to know this - treat the pointer as an object that you pass to functions and ignore the details.



mrc-ide/dust documentation built on May 11, 2024, 1:08 p.m.