RandomArraySeed-class: A DelayedArray seed supplying chunked random values


Description

The RandomArraySeed is a DelayedArray seed that performs reproducible, on-demand sampling of randomly distributed values. Note that this is a virtual class; the intention is to define concrete subclasses corresponding to specific parameterized distributions.

Chunking dimensions

The array is conceptually partitioned into contiguous chunks of the same shape. The random values in each chunk are initialized with a different seed and stream via the PCG32 pseudo-random number generator (see the dqrng package for details). This design allows us to rapidly access any given subarray without having to do jump-aheads from the start of the stream.
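The benefit of per-chunk seeding can be illustrated with a minimal base R sketch. It makes several simplifying assumptions: the array is one-dimensional, `set.seed()` with a chunk-specific offset stands in for the PCG32 seed/stream pairs, and the helper names `sample_chunk` and `extract_values` are hypothetical rather than part of the package.

```r
# Generate the values of one chunk from a base seed and the chunk's index.
# ASSUMPTION: set.seed() with an offset is a stand-in for PCG32 seed/stream pairs.
sample_chunk <- function(base_seed, chunk_id, n) {
    set.seed(base_seed + chunk_id)
    runif(n)
}

# Realize the elements 'idx' of a length-'len' vector that is conceptually
# split into contiguous chunks of length 'chunk_len'.
extract_values <- function(idx, len, chunk_len, base_seed = 42L) {
    stopifnot(all(idx >= 1L), all(idx <= len))
    chunk_of <- (idx - 1L) %/% chunk_len     # zero-based chunk id of each index
    offset <- (idx - 1L) %% chunk_len + 1L   # position of each index in its chunk
    out <- numeric(length(idx))
    for (id in unique(chunk_of)) {
        # Only chunks that overlap the request are generated.
        n <- min(chunk_len, len - id * chunk_len)
        vals <- sample_chunk(base_seed, id, n)
        keep <- chunk_of == id
        out[keep] <- vals[offset[keep]]
    }
    out
}

full <- extract_values(seq_len(1000), len = 1000, chunk_len = 100)
part <- extract_values(c(5L, 250L, 999L), len = 1000, chunk_len = 100)
identical(part, full[c(5, 250, 999)])
```

Extracting three scattered elements touches only three chunks, yet yields the same values as realizing the entire vector, because each chunk's values depend only on its own seed.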

By default, the chunk extent along each dimension is set to the square root of the array's extent along that dimension or 100, whichever is larger. This scheme provides reasonable, though not optimal, performance for access along any dimension. If the access pattern is known beforehand, a better chunking scheme can often be chosen and passed to the chunkdim argument.
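The default rule described above can be sketched in base R as follows. The function name `default_chunkdim` is hypothetical; rounding the square root up and capping each chunk extent at the array extent are assumptions made for this sketch, not confirmed details of the package.

```r
# Sketch of the default rule: the larger of sqrt(extent) and 100 per dimension.
# ASSUMPTIONS: the square root is rounded up, and a chunk is never allowed to
# exceed the array itself along any dimension.
default_chunkdim <- function(dim) {
    candidate <- pmax(100, ceiling(sqrt(dim)))
    as.integer(pmin(dim, candidate))
}

default_chunkdim(c(1e6, 500))
default_chunkdim(c(50, 20000))
```

For an array with dimensions 1000000 by 500, this gives chunks of 1000 by 100: the square root wins along the long dimension, while the floor of 100 applies along the short one.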

Note that changing the chunking dimensions will change the ordering of array values, even if the seeds are unchanged. This may be unexpected, given that chunking in real datasets will never change the data, only the performance of access operations. However, it is largely unavoidable in this context as the random number stream is rearranged within the array.
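A small base R sketch shows why the values move when the chunking changes. As before, this is a one-dimensional simplification in which `set.seed()` with a per-chunk offset stands in for the PCG32 seed/stream pairs, and `realize` is a hypothetical helper.

```r
# Realize a length-'len' vector chunk by chunk, one seed per chunk.
# ASSUMPTION: set.seed() offsets stand in for PCG32 seed/stream pairs.
realize <- function(len, chunk_len, base_seed = 42L) {
    starts <- seq(1L, len, by = chunk_len)
    unlist(lapply(seq_along(starts), function(i) {
        set.seed(base_seed + i)
        # The last chunk may be shorter than 'chunk_len'.
        runif(min(chunk_len, len - starts[i] + 1L))
    }))
}

a <- realize(12, chunk_len = 4)
b <- realize(12, chunk_len = 6)

identical(a[1:4], b[1:4])  # the first chunk starts from the same seed
identical(a, b)            # later positions draw from different streams
```

With chunks of length 4, position 5 holds the first value of the second chunk's stream; with chunks of length 6, it holds the fifth value of the first chunk's stream. The seeds are the same in both cases, but the array contents are not.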

The chunkdim(x) method will return the chunk dimensions of a RandomArraySeed instance x. This will be used by the DelayedArray machinery to optimize block processing by extracting whole chunks where possible.

Implementing subclasses

To sample from a specific distribution, we can implement a concrete subclass of the RandomArraySeed. This is done by implementing methods for sampleDistrFun and sampleDistrParam.

In the code chunks below, x is an instance of a RandomArraySeed subclass:

The extract_array method for the RandomArraySeed will automatically use both of the above methods to sample from the specified distribution. This is achieved by randomly sampling from a standard uniform distribution, treating the values as probabilities and converting them into quantiles.
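The probability-to-quantile conversion can be sketched with a toy S4 class hierarchy. Everything here is a local stand-in so that the code runs without the package: the classes `ToySeed` and `ToyNormSeed`, the helper `toy_extract`, and the locally defined generics are hypothetical. The sketch assumes that sampleDistrFun(x) returns a quantile function and that sampleDistrParam(x) returns the names of the slots holding its parameters; the exact contracts should be checked against the package's own generics.

```r
library(methods)

# Local stand-ins for the virtual seed class and the two generics.
# ASSUMPTION: these mimic, but are not, the DelayedRandomArray definitions.
setClass("ToySeed", contains = "VIRTUAL", slots = c(dim = "integer"))
setGeneric("sampleDistrFun", function(x) standardGeneric("sampleDistrFun"))
setGeneric("sampleDistrParam", function(x) standardGeneric("sampleDistrParam"))

# A concrete subclass for the normal distribution.
setClass("ToyNormSeed", contains = "ToySeed",
    slots = c(mean = "numeric", sd = "numeric"))
setMethod("sampleDistrFun", "ToyNormSeed", function(x) stats::qnorm)
setMethod("sampleDistrParam", "ToyNormSeed", function(x) c("mean", "sd"))

# Draw standard uniform values, treat them as cumulative probabilities
# and convert them to quantiles of the target distribution.
toy_extract <- function(x, seed = 1L) {
    set.seed(seed)
    p <- runif(prod(x@dim))
    param_names <- sampleDistrParam(x)
    args <- lapply(param_names, function(nm) slot(x, nm))
    names(args) <- param_names
    vals <- do.call(sampleDistrFun(x), c(list(p), args))
    array(vals, dim = x@dim)
}

seed <- new("ToyNormSeed", dim = c(4L, 3L), mean = 10, sd = 2)
out <- toy_extract(seed)
dim(out)
```

Because the generic extraction code only needs a quantile function and its parameters, supporting a new distribution in this sketch takes one class definition and two short methods.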

Distributional parameters

Distributional parameters are passed to the relevant quantile function to obtain a random value from the desired distribution. Each parameter can be:

Representing sparsity

For certain distributions, we may expect a large number of zeroes in the random output. We provide the option to treat the sampled values as being sparse, by setting sparse=TRUE in the constructors of the relevant subclasses. This is optional as most distributions will not yield sparse arrays for most of their possible parameter space.

When sparse=TRUE, the block processing machinery in DelayedArray will return a sparse array. This gives downstream applications the opportunity to use more efficient sparse algorithms when relevant. However, this option does not affect the sampling itself; the sampled values are identical to those obtained with sparse=FALSE, and only the output is coerced into a SparseArraySeed.
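The idea that sparsity only changes the representation, not the values, can be shown with a base R sketch. The list of nonzero coordinates and values used here is a simplified stand-in for a sparse container, not the SparseArraySeed class itself, and the Poisson mean of 0.1 is an arbitrary choice that yields mostly zeroes.

```r
# Sample a mostly-zero matrix by converting uniform probabilities to
# Poisson quantiles with a small mean.
set.seed(100)
dense <- matrix(qpois(runif(200), lambda = 0.1), nrow = 20)

# ASSUMPTION: a plain list of nonzero coordinates and values stands in
# for a real sparse container.
nz <- which(dense != 0, arr.ind = TRUE)
sparse <- list(dim = dim(dense), index = nz, value = dense[nz])

# Rebuilding the dense matrix from the sparse form recovers every value.
rebuilt <- matrix(0, nrow = sparse$dim[1], ncol = sparse$dim[2])
rebuilt[sparse$index] <- sparse$value
identical(rebuilt, dense)

# Fraction of zero entries, which is what makes the sparse form worthwhile.
mean(dense == 0)
```

The sampling step is the same in both cases; the sparse form simply stores the nonzero entries, which pays off only when zeroes dominate.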

We can determine whether a RandomArraySeed x has a sparse interpretation with is_sparse(x).

Author(s)

Aaron Lun

See Also

The RandomUnifArraySeed class, which implements sampling from a uniform distribution.

The RandomPoisArraySeed class, which implements sampling from a Poisson distribution.


LTLA/DelayedRandomArray documentation built on Dec. 18, 2021, 3:40 a.m.