Suite for testing non-uniform random number generators.
License: GPL 2 or later
rvgtest is a set of tools to investigate the quality of non-uniform pseudo-random variate generators (RVG). To this end it provides functions to visualize and test for possible defects. There are three main reasons for such defects and errors:
1. Errors in the design of algorithms – The proof of the theorem that claims the correctness of the algorithm is wrong.
2. Implementation errors – Mistakes in computer programs.
3. Limitations of floating point arithmetic and round-off errors in implementations of these algorithms on real-world computers.
Of course testing software is a self-evident part of software engineering. Implementation errors usually result in large deviations from the requested distribution and thus errors of type 2 are easily detected. However, this need not always be the case, for example for rather complicated algorithms like those based on patchwork methods.
The same holds for errors of type 1. In the best of all worlds, there exists a correct proof for the validity of the algorithm. In our world, however, humans can err. In that case the deviations are rather small, since otherwise they would have been detected when testing the implementation for errors of type 2.
Errors of type 3 can be a problem when the requested distribution
has extreme properties. E.g., it is no problem to generate a sample of
beta distributed random variates with shape parameters 0.001 using
rbeta(n=100, shape1=0.001, shape2=0.001).
However, due to the limited resolution of floating point numbers it behaves
like a discrete distribution (especially near 1). It is not always
obvious whether such round-off errors will influence one's simulation.
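The discreteness near 1 can be made visible with a few lines of base R. The following is only a minimal sketch: the seed, the sample size and the variable names are arbitrary choices for illustration and are not part of rvgtest.

```r
## Illustration only: seed, sample size and variable names are assumptions.
set.seed(1)
x <- rbeta(n = 1e4, shape1 = 0.001, shape2 = 0.001)

## Look at the variates in the upper half of (0, 1).
upper      <- x[x > 0.5]
n_upper    <- length(upper)
n_distinct <- length(unique(upper))  # number of different values observed
frac_one   <- mean(upper == 1)       # share rounded to exactly 1

cat("variates above 0.5:         ", n_upper, "\n")
cat("distinct values among them: ", n_distinct, "\n")
cat("fraction exactly equal to 1:", frac_one, "\n")
```

A continuous distribution would produce (almost surely) as many distinct values as variates; a much smaller count indicates that round-off has collapsed many variates onto the same floating point number.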
It is the purpose of this package to provide some tools to find possible errors in RVGs. However, observing a defect in (the implementation of) a pseudo-random variate generator by purely statistical tools may require a sample so large that it exceeds the available memory when held in a single array in R. (Nevertheless, there is some chance that this defect causes an error in a particular simulation with a moderate sample size.) Hence we have implemented routines that can run tests on very large sample sizes (which are only limited by the available runtime).
Currently there are two toolsets for testing random variate generators for univariate distributions:
Testing based on histograms for all kinds of RVGs.
Estimating errors of RVGs that are based on numerical inversion methods.
A frequently used method for testing univariate distributions is based on the following strategy: Draw a sample, compute a histogram and run a goodness-of-fit test on the resulting frequency table.
We have implemented a three-step procedure:
Create tables that can hold the information of huge random samples.
Perform some test for the null hypothesis on these tables.
Visualize these tables as well as the results of the tests.
The advantages of this procedure are:
Huge total sample sizes are possible (only limited by available runtime but not by memory).
Can run multiple tests on the same random sample.
Inspect data visually.
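The idea behind the first two steps can be sketched in base R alone. This is not the implementation used by rvgtest, only an illustration under stated assumptions: the number of bins, the seed and all variable names are chosen here for the example. Bin counts are accumulated repetition by repetition, so only one block of the sample is held in memory at any time.

```r
## Sketch only: bin count, seed and variable names are assumptions.
set.seed(42)
n_bins    <- 100   # number of equiprobable bins
n_per_rep <- 1e5   # variates drawn per repetition
n_rep     <- 20    # number of repetitions

## Break points that are equiprobable under the standard normal
## distribution (the first is -Inf, the last is Inf).
breaks <- qnorm((0:n_bins) / n_bins)

## Step 1: accumulate bin counts without storing the full sample.
counts <- numeric(n_bins)
for (i in seq_len(n_rep)) {
  x      <- rnorm(n_per_rep)
  bin    <- findInterval(x, breaks, rightmost.closed = TRUE)
  counts <- counts + tabulate(bin, nbins = n_bins)
}

## Step 2: chi-square goodness-of-fit test on the frequency table.
res <- chisq.test(counts, p = rep(1 / n_bins, n_bins))
print(res$p.value)
```

Because only `counts` survives each repetition, memory use depends on `n_per_rep` and `n_bins` but not on `n_rep`; several tests can then be run on the same table.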
In addition there are also some random functions for introducing defects in other random variate generators artificially. Thus one may investigate the power of tests.
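To illustrate what such an artificial defect might look like, here is a minimal base R sketch. The function `rnorm_perturbed` and its parameter `eps` are hypothetical and written only for this example; they are not the perturbation routines of the package. A small fraction of the normal variates is replaced by uniform variates, and a chi-square test on equiprobable bins is used to check whether the defect is detected.

```r
## Hypothetical helper (not part of rvgtest): replace a fraction 'eps'
## of standard normal variates by uniform variates on (-0.5, 0.5).
rnorm_perturbed <- function(n, eps = 0.05) {
  x   <- rnorm(n)
  bad <- runif(n) < eps
  x[bad] <- runif(sum(bad), min = -0.5, max = 0.5)
  x
}

## Chi-square p-value on equiprobable bins for a generator 'rdist'.
binned_pvalue <- function(rdist, n = 2e5, n_bins = 100) {
  breaks <- qnorm((0:n_bins) / n_bins)
  bin    <- findInterval(rdist(n), breaks, rightmost.closed = TRUE)
  counts <- tabulate(bin, nbins = n_bins)
  chisq.test(counts, p = rep(1 / n_bins, n_bins))$p.value
}

set.seed(7)   # arbitrary seed, only for reproducibility
p_clean     <- binned_pvalue(rnorm)
p_perturbed <- binned_pvalue(rnorm_perturbed)
cat("p-value, unperturbed generator:", p_clean, "\n")
cat("p-value, perturbed generator:  ", p_perturbed, "\n")
```

Repeating this for decreasing values of `eps` gives a rough impression of how small a defect a test can still detect at a given sample size.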
the respective syntax of the call).
Perturbation of RVGs:
Random variate generators that are based on inverting the distribution function preserve the structure of the underlying uniform random number generator. Given the fact that state-of-the-art uniform random number generators are well tested, it is sufficient to estimate (maximal) approximation errors.
Let G^[-1] denote the approximate inverse distribution function (quantile function) and F the (exact) cumulative distribution function. Then the following measures of the approximation error are implemented:
u-error: e_u(u) = |u - F(G^[-1](u))|
absolute x-error: e_x(u) = |F^[-1](u) - G^[-1](u)|
relative x-error: e_x(u) / |F^[-1](u)|
We are convinced that the u-error is the most convenient measure for the approximation error in the framework of Monte Carlo simulation. E.g., goodness-of-fit tests like the chi-square test or the Kolmogorov-Smirnov test look at this deviation.
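The u-error and the absolute x-error can be evaluated directly from their definitions. The sketch below uses base R only and is not the package's own routine; the spline approximation is the one from the example on this page, while the chunk size, the seed and the variable names are assumptions made for illustration. Processing the uniform random numbers in chunks keeps memory use independent of the total sample size.

```r
## Approximate inverse CDF of the standard normal distribution
## by monotone spline interpolation.
nodes <- (-100:100) * 0.05
aqn   <- splinefun(x = pnorm(nodes), y = nodes, method = "monoH.FC")

## Sketch only: seed, chunk size and names below are assumptions.
set.seed(123)
n_chunks   <- 10
chunk_size <- 1e4

max_uerr <- 0   # maximal observed u-error
max_xerr <- 0   # maximal observed absolute x-error
for (i in seq_len(n_chunks)) {
  u        <- runif(chunk_size)
  x_approx <- aqn(u)
  max_uerr <- max(max_uerr, abs(u - pnorm(x_approx)))
  max_xerr <- max(max_xerr, abs(qnorm(u) - x_approx))
}

cat("maximal u-error:         ", max_uerr, "\n")
cat("maximal absolute x-error:", max_xerr, "\n")
```

Since the interpolant is monotone, the u-error inside the range of the nodes cannot exceed the largest gap between neighbouring values of `pnorm(nodes)`; the x-error has no such bound and is typically largest in the tails.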
As with the histogram-based tests, these routines are implemented in such a way that sample sizes are not limited by memory. Again, data generation and visualization are separated into two routines.
plot.rvgt.ierror for the syntax of the
## ------------------------------------------------
## 1. Histogram
## ------------------------------------------------

## Use a poor Gaussian random variate generator
## (otherwise we should not see a defect).
RNGkind(normal.kind="Buggy Kinderman-Ramage")

## Create table of bin counts.
## Use a sample of 20 times 10^5 random variates.
table <- rvgt.ftable(n=1e5, rep=20, rdist=rnorm, qdist=qnorm)

## Plot histogram for (cumulated) data
plot(table)

## Perform a chi-square goodness-of-fit test and plot result
r1 <- rvgt.chisq(table)
plot(r1)

## Perform M-test and plot both results
r2 <- rvgt.Mtest(table)
plot.rvgt.htest(list(r1,r2))

## ------------------------------------------------
## 2. Numerical Inversion
## ------------------------------------------------

## Create a table of u-errors for spline interpolation of
## the inverse CDF of the standard normal distribution.
aqn <- splinefun(x=pnorm((-100:100)*0.05), y=(-100:100)*0.05,
                 method="monoH.FC")

## Use a sample of size of 10^5 random variates.
uerrn <- uerror(n=1e5, aqdist=aqn, pdist=pnorm)

## Plot u-errors
plot(uerrn)