seed for RNG | R Documentation |
These functions control the parallel-capable L'Ecuyer-CMRG pseudo-random number generator (RNG) on clusters and in multicore parallel applications for reproducible results. Reproducibility is possible across different node and core configurations by associating the RNG streams with an application vector.
comm.set.seed(
seed = NULL,
diff = TRUE,
state = NULL,
streams = NULL,
comm = .pbd_env$SPMD.CT$comm
)
comm.set.stream(
name = NULL,
reset = FALSE,
state = NULL,
comm = .pbd_env$SPMD.CT$comm
)
comm.get.streams(
comm = .pbd_env$SPMD.CT$comm,
seed = FALSE
)
seed |
In |
diff |
Logical indicating if the parallel instances should have different random streams. |
state |
In function |
streams |
A vector of sequential integers specifying the streams to be prepared on the current rank. Typically, this is used by 'comm.chunk()' to prepare the correct streams for each rank, aligned with the vector being chunked. |
name |
Stream number that is coercible to character, indicating to start or continue generating from that stream. |
reset |
If true, reset the requested stream back to its beginning. |
comm |
The communicator that determines MPI rank numbers. |
This implementation uses the function nextRNGStream in package parallel to set up streams appropriate for working on a cluster system with MPI. The main difference from parallel is that it adds a reproducibility capability with vector-based streams that works across different numbers of nodes or cores by associating streams with an application vector.
Vector-based streams are best set up with the higher level function comm.chunk instead of using comm.set.stream directly. comm.chunk will set up only the streams that each rank needs and provides the stream numbers necessary to switch between them with comm.set.stream.
The function uses parallel's nextRNGStream() and sets up the parallel stream seeds in the .pbd_env$RNG environment, which are then managed with comm.set.stream. There is only one communication broadcast in this implementation, which ensures that all ranks have the same seed as rank 0. Subsequently, each rank maintains only its own streams.
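The stream mechanism described above can be tried without MPI. The following is a minimal sketch using only base R and package parallel; the helper draw() is hypothetical and is not part of pbdMPI, and the snippet does not reproduce how comm.set.seed stores seeds in .pbd_env$RNG.

```r
library(parallel)

# Engage the parallel-capable generator and remember the previous kind.
old.kind <- RNGkind("L'Ecuyer-CMRG")[1]
set.seed(1357)

# For L'Ecuyer-CMRG, .Random.seed is an integer vector of length 7:
# the RNG kind code followed by six state words.
s1 <- get(".Random.seed", envir = globalenv())
s2 <- nextRNGStream(s1)   # seed at the start of the next stream
s3 <- nextRNGStream(s2)   # seed at the start of the stream after that

# Hypothetical helper: draw from a stream by installing its seed globally.
draw <- function(seed, n = 3) {
  assign(".Random.seed", seed, envir = globalenv())
  runif(n)
}

x2 <- draw(s2)
x3 <- draw(s3)
x2.again <- draw(s2)      # same seed, so the same numbers as x2

RNGkind(old.kind)         # restore the previous generator
```

Installing the same stream seed twice reproduces the same draws, while different stream seeds give different draws.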
When rank-based streams are set up by comm.chunk with the form = "number" and rng = TRUE parameters, streams are different for each rank and switching is not needed. Vector-based streams are obtained with the form = "vector" and rng = TRUE parameters. In this latter case, the vector returned to each rank contains the stream numbers (and vector components) that the rank owns. Switch with comm.set.stream(v), where v is one of the stream numbers. Switching back and forth is allowed, with each stream continuing where it left off.
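The idea behind vector-based streams can be illustrated in a single R session, without MPI. The sketch below is a simulation under stated assumptions, not the pbdMPI implementation: make_streams(), draw_from(), and simulate() are hypothetical helpers, ranks are simulated by a loop, and each component of a length-N vector gets its own stream from nextRNGStream().

```r
library(parallel)

old.kind <- RNGkind("L'Ecuyer-CMRG")[1]

# Hypothetical helper: one stream state per component of a length-N vector.
make_streams <- function(seed, N) {
  set.seed(seed)
  s <- get(".Random.seed", envir = globalenv())
  states <- vector("list", N)
  for (i in seq_len(N)) {
    s <- nextRNGStream(s)
    states[[i]] <- s
  }
  states
}

# Hypothetical helper: draw n numbers from stream v, then store the advanced
# state so that a later call continues where this one left off.
draw_from <- function(env, v, n = 1) {
  assign(".Random.seed", env$states[[v]], envir = globalenv())
  x <- runif(n)
  env$states[[v]] <- get(".Random.seed", envir = globalenv())
  x
}

# Simulate p ranks, each owning a contiguous chunk of 1:N and drawing one
# number per owned component from that component's stream.
simulate <- function(p, N = 6, seed = 1357) {
  env <- new.env()
  env$states <- make_streams(seed, N)
  chunks <- split(seq_len(N), sort(rep(seq_len(p), length.out = N)))
  out <- numeric(N)
  for (chunk in chunks) for (v in chunk) out[v] <- draw_from(env, v)
  out
}

res2 <- simulate(2)
res3 <- simulate(3)
same <- identical(res2, res3)  # result does not depend on the rank count

# Switching back and forth: stream 1 continues where it left off.
env <- new.env()
env$states <- make_streams(1357, 2)
first  <- draw_from(env, 1, 2)
other  <- draw_from(env, 2, 5)
second <- draw_from(env, 1, 2)
env$states <- make_streams(1357, 2)
straight <- draw_from(env, 1, 4)
continued <- identical(c(first, second), straight)

RNGkind(old.kind)              # restore the previous generator
```

Because each stream is tied to a vector component rather than to a rank, the simulated results agree for 2 and 3 ranks, and interleaved draws from stream 1 match an uninterrupted run of the same stream.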
## RNG Notes
R sessions connected by MPI begin like other R sessions as discussed in Random. On first use of random number generation, each rank computes its own seed from a combination of clock time and process id (unless it reads a previously saved workspace, which is not recommended). Because of asynchronous execution, imperfectly synchronized node clocks, and likely different process ids, this almost guarantees unique seeds and most likely results in independent streams. However, this is neither reproducible nor guaranteed. Both reproducibility and the guarantee are provided by the L'Ecuyer-CMRG generator implementation in nextRNGStream and by the comm.set.seed and comm.set.stream adaptation for parallel computing on cluster systems.
At a high level, the L'Ecuyer-CMRG pseudo-random number generator can take jumps (advance the seed) in its stream (about 2^191 long) so that distant substreams can be assigned. The nextRNGStream implementation takes jumps of 2^127 (about 1.7e38) to provide up to 2^64 (about 1.8e19) independent streams. See https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf for more details.
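The stream count follows directly from the figures quoted above: a stream of length about 2^191 divided into jumps of 2^127 leaves room for 2^64 streams. A quick check of that arithmetic in R (the variable names are illustrative only):

```r
period    <- 2^191            # approximate length of the full stream
jump      <- 2^127            # distance between consecutive stream starts
n.streams <- period / jump    # number of non-overlapping streams

n.streams == 2^64             # TRUE: powers of two divide exactly in doubles
format(jump, digits = 2)      # "1.7e+38"
format(n.streams, digits = 2) # "1.8e+19"
```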
In situations that require the same stream on all ranks, a simple set.seed from base R and the default RNG will suffice. comm.set.seed will also accomplish this with the diff = FALSE parameter if switching between same and different streams is needed.
comm.set.seed engages the L'Ecuyer-CMRG RNG and invisibly returns the previous RNG in use (the output of RNGkind()[1]). Capturing it enables the restoration of the previous RNG with RNGkind. See examples of use in demo/seed_rank.r and demo/seed_vec.r.
comm.set.stream invisibly returns the current stream number as character.
comm.get.streams returns the current stream name and other stream names available to the rank as a character string. Optionally, the local .Random.seed is included. This function is a debugging aid for distributed random streams.
All three functions manage and use the environment .pbd_env$RNG.
Wei-Chen Chen wccsnow@gmail.com, George Ostrouchov, Drew Schmidt, Pragneshkumar Patel, and Hao Yu.
Pierre L'Ecuyer, Simard, R., Chen, E.J., and Kelton, W.D. (2002) An Object-Oriented Random-Number Package with Many Long Streams and Substreams. Operations Research, 50(6), 1073-1075.
https://www.iro.umontreal.ca/~lecuyer/myftp/papers/streams00.pdf
Programming with Big Data in R Website: https://pbdr.org/
comm.chunk()
## Not run:
### Save code in a file "demo.r" and run with 2 processors by
### SHELL> mpiexec -np 2 Rscript demo.r
spmd.code <- "
suppressMessages(library(pbdMPI, quietly = TRUE))
comm.print(RNGkind())
comm.print(runif(5), all.rank = TRUE)
set.seed(1357)
comm.print(runif(5), all.rank = TRUE)
old.kind <- comm.set.seed(1357)
comm.print(RNGkind())
comm.print(runif(5), all.rank = TRUE)
comm.set.stream(reset = TRUE)
comm.print(runif(5), all.rank = TRUE)
comm.set.seed(1357, diff = TRUE)
comm.print(runif(5), all.rank = TRUE)
state <- comm.set.stream() ### save each rank's stream state
comm.print(runif(5), all.rank = TRUE)
comm.set.stream(state = state) ### set current RNG to state
comm.print(runif(5), all.rank = TRUE)
RNGkind(old.kind)
set.seed(1357)
comm.print(RNGkind())
comm.print(runif(5), all.rank = TRUE)
### Finish.
finalize()
"
# execmpi(spmd.code, nranks = 2L)
## End(Not run)