RandomArraySeed-class: A DelayedArray seed supplying chunked random values

RandomArraySeed-class    R Documentation

Description

The RandomArraySeed is a DelayedArray seed that performs reproducible, on-demand sampling of randomly distributed values. Note that this is a virtual class; the intention is to define concrete subclasses corresponding to specific parameterized distributions.

Chunking dimensions

The array is conceptually partitioned into contiguous chunks of the same shape. The random values in each chunk are initialized with a different seed and stream via the PCG32 pseudo-random number generator (see the dqrng package for details). This design allows us to rapidly access any given subarray without having to do jump-aheads from the start of the stream.

By default, each chunk dimension is set to the square root of the corresponding array dimension or 100, whichever is larger. This scheme provides decent though suboptimal performance along any dimension. If the access pattern is known beforehand, a better chunking scheme can often be chosen and passed to the chunkdim argument.
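The default rule can be sketched in base R as follows. This is only an illustration of the description above, not the package's code: default_chunkdim is a hypothetical helper, and rounding the square root up and capping each chunk at the array extent are assumptions.

```r
# Hypothetical helper illustrating the documented default:
# the larger of sqrt(dim) and 100 for each dimension.
# ASSUMPTIONS: the square root is rounded up, and a chunk is never
# larger than the array itself.
default_chunkdim <- function(dim) {
    pmin(dim, pmax(100, ceiling(sqrt(dim))))
}

# A tall, narrow array: the first dimension uses sqrt(1e6) = 1000,
# the second is capped at its full extent of 50.
default_chunkdim(c(1e6, 50))
```

For an array that is always read by column, a chunk shape such as c(nrow, 1) would likely suit the access pattern better than this default.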

Note that changing the chunking dimensions will change the ordering of array values, even if the seeds are unchanged. This may be unexpected, given that chunking in real datasets will never change the data, only the performance of access operations. However, it is largely unavoidable in this context as the random number stream is rearranged within the array.
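The effect of the chunk shape on the values can be illustrated with a small base R sketch. It mimics per-chunk seeding only: fill_by_chunk is a hypothetical function, the seed + chunk index scheme is an assumption, and base R's set.seed and runif stand in for the PCG32 generator.

```r
# Fill a matrix chunk by chunk, reseeding the RNG for every chunk.
# ASSUMPTION: chunk seeds are derived as seed + chunk index; the real
# class uses PCG32 seeds and streams instead of set.seed().
fill_by_chunk <- function(dim, chunkdim, seed = 42L) {
    out <- matrix(0, dim[1], dim[2])
    row_starts <- seq(1L, dim[1], by = chunkdim[1])
    col_starts <- seq(1L, dim[2], by = chunkdim[2])
    chunk_id <- 0L
    for (cs in col_starts) {
        for (rs in row_starts) {
            chunk_id <- chunk_id + 1L
            rows <- rs:min(rs + chunkdim[1] - 1L, dim[1])
            cols <- cs:min(cs + chunkdim[2] - 1L, dim[2])
            set.seed(seed + chunk_id)  # one independent stream per chunk
            out[rows, cols] <- runif(length(rows) * length(cols))
        }
    }
    out
}

a  <- fill_by_chunk(c(6L, 6L), chunkdim = c(3L, 3L))
a2 <- fill_by_chunk(c(6L, 6L), chunkdim = c(3L, 3L))
b  <- fill_by_chunk(c(6L, 6L), chunkdim = c(2L, 6L))

identical(a, a2)  # same seed and chunking: values are reproduced
identical(a, b)   # same seed, different chunking: values are rearranged
```

Because each chunk depends only on its own seed, any chunk can be regenerated without touching the others, which is what makes random access cheap.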

The chunkdim(x) method will return the chunk dimensions of a RandomArraySeed instance x. This will be used by the DelayedArray machinery to optimize block processing by extracting whole chunks where possible.

Implementing subclasses

To sample from a specific distribution, we can implement a concrete subclass of the RandomArraySeed. This is done by implementing methods for sampleDistrFun and sampleDistrParam.

In the code chunks below, x is an instance of a RandomArraySeed subclass:

  • sampleDistrFun(x) returns a quantile function that accepts a vector of cumulative probabilities p and returns a numeric vector of quantiles. A typical example is qnorm, though similar functions from the stats package can also be used. The output vector should be the same length as p; any other distributional parameters should be recycled to the length of p.

  • sampleDistrParam(x) returns a character vector specifying the names of the distributional parameters as slots of x. For example, for a subclass that samples from a normal distribution, this might be "mean" and "sd". Each distributional parameter is expected to be numeric.

The extract_array method for the RandomArraySeed will automatically use both of the above methods to sample from the specified distribution. This is achieved by randomly sampling from a standard uniform distribution, treating the values as probabilities and converting them into quantiles.
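The conversion from uniform values to quantiles is ordinary inverse transform sampling. A minimal base R sketch, assuming a normal distribution; quantile_fun and params are illustrative stand-ins for what sampleDistrFun(x) and the slots named by sampleDistrParam(x) would supply, and runif replaces the chunked PCG32 stream:

```r
set.seed(1)

# Stand-in for sampleDistrFun(x): a quantile function.
quantile_fun <- qnorm

# Stand-in for the slots named by sampleDistrParam(x).
params <- list(mean = 10, sd = 2)

# Step 1: sample standard uniform values and treat them as probabilities.
p <- runif(5)

# Step 2: convert the probabilities into quantiles of the target
# distribution, passing the distributional parameters by name.
values <- do.call(quantile_fun, c(list(p), params))
values
```

Any quantile function with the same calling convention, such as qpois or qgamma, could be substituted in the same way.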

Distributional parameters

Distributional parameters are passed to the relevant quantile function to obtain a random value from the desired distribution. Each parameter can be:

  • A numeric scalar, which is used throughout the array.

  • A numeric vector, which is recycled along the length of the array. This traverses the array along the first dimension, then the second, then the third, and so on; for matrices, this is equivalent to column-major ordering.

  • A numeric array-like object of the same dimensions as dim, where each entry contains the parameter value for the corresponding entry of the output array. This can be another DelayedArray object.
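The recycling rule for vector parameters can be checked in base R. In this sketch, dims and lambda_vec are illustrative names, and the expansion into a full array is done explicitly only to make the ordering visible:

```r
dims <- c(3L, 2L)
lambda_vec <- c(1, 10, 100)

# Recycle the vector along the array, first dimension fastest
# (column-major for a matrix).
lambda_full <- array(rep(lambda_vec, length.out = prod(dims)), dim = dims)
lambda_full

# With a vector of length nrow, every entry in a given row of the
# matrix receives the same parameter value.
lambda_full[2, ]
```

A vector whose length equals the number of rows therefore acts as a per-row parameter, while per-column parameters need a full array-like object.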

Representing sparsity

For certain distributions, we may expect a large number of zeroes in the random output. We provide the option to treat the sampled values as being sparse, by setting sparse=TRUE in the constructors of the relevant subclasses. This is optional as most distributions will not yield sparse arrays for most of their possible parameter space.

When sparse=TRUE, the block processing machinery in DelayedArray will return a sparse array. This gives downstream applications the opportunity to use more efficient sparse algorithms when relevant. However, this option does not affect the sampling itself; the sampled values are identical to those of the dense case, and the output is merely coerced into an SVT_SparseArray.
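To see when a sparse interpretation is worthwhile, the proportion of zeroes can be estimated with base R alone. This sketch assumes a Poisson distribution with a small mean and uses runif and qpois directly, not the package's classes:

```r
set.seed(1)
p <- runif(10000)

# A small Poisson mean gives mostly zeroes...
low <- qpois(p, lambda = 0.01)
mean(low == 0)

# ...whereas a large mean gives almost none, so a sparse
# representation would only add overhead.
high <- qpois(p, lambda = 10)
mean(high == 0)
```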

We can determine whether a RandomArraySeed x has a sparse interpretation with is_sparse(x).

Author(s)

Aaron Lun

See Also

The RandomUnifArraySeed class, which implements sampling from a uniform distribution.

The RandomPoisArraySeed class, which implements sampling from a Poisson distribution.


LTLA/DelayedRandomArray documentation built on July 7, 2024, 12:39 p.m.