ftable: Create RVG Frequency Table for Random Variate Generator


Description

Creates frequency tables for random variate generators. A histogram is computed and the bin counts are stored in an array, which can be used to visualize possible defects of the pseudo-random variate generator and to run goodness-of-fit tests.

The function only works for generators for univariate distributions.

Usage

rvgt.ftable(n, rep=1, rdist, qdist, pdist, ...,
            breaks = 101, trunc=NULL, exactu=FALSE, plot=FALSE)

Arguments

n

sample size for one repetition (>=100).

rep

number of repetitions.

rdist

random variate generator for a univariate distribution.

qdist

quantile function for the distribution.

pdist

cumulative distribution function for the distribution.

...

parameters to be passed to rdist, qdist and pdist.

breaks

one of:

  • a single number giving the number of break points of the histogram (which then has breaks-1 cells); or

  • a vector giving the break points between histogram cells (in the u-scale). In this case the break points are sorted automatically and the first and last entries are set to 0 and 1, respectively. Moreover, all break points must be distinct.

trunc

boundaries of the truncated domain (optional).

exactu

logical. If TRUE, the exact locations of the given break points are used. Otherwise, these points are shifted slightly in order to reduce execution time; see Details below.

plot

logical. If TRUE, a histogram is plotted.

Details

rvgt.ftable returns tables of bin counts similar to the hist function. Bins can be specified either by the number of break points between the cells of the histogram, or by a list of break points in the u-scale. In the former case the break points are constructed such that all bins of the histogram have equal probability for the distribution under the null hypothesis, i.e., the break points are equidistributed in the u-scale using the formula u_i=i/(breaks-1) where i=0,…,breaks-1.
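
The formula above can be sketched in base R as follows. This is only an illustration of the arithmetic, not the package's internal code; the value breaks = 11 is an arbitrary choice for the example.

```r
# Equidistributed break points in the u-scale for a given number of break points.
breaks <- 11                                  # number of break points (arbitrary example value)
ubreaks <- (0:(breaks - 1)) / (breaks - 1)    # u_i = i/(breaks-1), i = 0, ..., breaks-1
nbins <- length(ubreaks) - 1                  # the histogram has breaks-1 bins
binprob <- diff(ubreaks)                      # each bin has probability 1/(breaks-1)
```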

When the quantile function qdist is given, these points are transformed into break points in the x-scale using qdist(u_i). Thus the histogram can be computed directly for random points X that are generated by means of rdist.
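
A minimal base R sketch of this step, assuming a normal distribution with mean 1 and standard deviation 2. Counting with findInterval and tabulate is an assumption made for illustration; the package may count differently.

```r
set.seed(1)                                   # reproducibility of this sketch only
ubreaks <- (0:10) / 10                        # 11 break points in the u-scale
xbreaks <- qnorm(ubreaks, mean = 1, sd = 2)   # x-scale break points; -Inf and Inf at the ends
x <- rnorm(1000, mean = 1, sd = 2)            # sample from the generator under test
# findInterval() gives the bin index of each point, tabulate() counts points per bin
count <- tabulate(findInterval(x, xbreaks), nbins = length(xbreaks) - 1)
```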

Otherwise the cumulative distribution function pdist must be given. If exactu is TRUE, then all non-uniform random points X are first transformed into uniformly distributed random numbers U=pdist(X), for which the histogram is created. This is slower than using X directly, but it is numerically more robust, as round-off errors in qdist have much more influence than those in pdist.
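
The transformation U=pdist(X) can be sketched in base R as follows, here for the standard normal distribution. The counting code is an illustrative assumption, not the package's implementation.

```r
set.seed(2)                                   # reproducibility of this sketch only
ubreaks <- (0:10) / 10                        # break points in the u-scale
x <- rnorm(1000)                              # non-uniform random points X
u <- pnorm(x)                                 # U = pdist(X); uniform on (0,1) if the generator is correct
# rightmost.closed = TRUE puts a value of exactly 1 into the last bin
count <- tabulate(findInterval(u, ubreaks, rightmost.closed = TRUE), nbins = 10)
```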

If trunc is given, then functions qdist and pdist are rescaled to this given domain. It is recommended to provide pdist even when qdist is given.
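
One common way to rescale a distribution function and a quantile function to a truncated domain [a, b] is sketched below for the standard normal distribution. The helper names trunc_cdf and trunc_quantile are hypothetical, and whether rvgt.ftable uses exactly this formula internally is an assumption.

```r
# Hypothetical helpers: CDF and quantile function of a standard normal truncated to [a, b].
trunc_cdf <- function(x, a, b) (pnorm(x) - pnorm(a)) / (pnorm(b) - pnorm(a))
trunc_quantile <- function(u, a, b) qnorm(pnorm(a) + u * (pnorm(b) - pnorm(a)))

a <- 1; b <- Inf                              # same domain as trunc=c(1,Inf)
xs <- trunc_quantile(c(0.25, 0.5, 0.75), a, b)  # quartiles of the truncated distribution
us <- trunc_cdf(xs, a, b)                     # maps the quartiles back to the u-scale
```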

If exactu is FALSE and the quantile function qdist is missing, then the first sample of size n is used to estimate the quantiles for the given break points using the function quantile. The break points in the u-scale are then recomputed from these quantiles by means of the given distribution function pdist. This is usually (much) faster than calling pdist on each generated point. However, the break points are slightly perturbed (this does not affect the correctness of the frequency table).
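
The following base R sketch illustrates the idea for the standard normal distribution. It is a simplified reading of the paragraph above under stated assumptions, not the package's actual code.

```r
set.seed(3)                                   # reproducibility of this sketch only
ubreaks <- (0:10) / 10                        # requested break points in the u-scale
x1 <- rnorm(1000)                             # first sample of size n
# empirical quantiles at the interior break points
xq <- quantile(x1, probs = ubreaks[2:10], names = FALSE)
xbreaks <- c(-Inf, xq, Inf)                   # break points in the x-scale
ubreaks_new <- c(0, pnorm(xq), 1)             # recomputed (slightly perturbed) u-scale break points
count <- tabulate(findInterval(x1, xbreaks), nbins = 10)
expected <- length(x1) * diff(ubreaks_new)    # expected counts use the recomputed break points
```

Because the expected counts are derived from the recomputed break points, the perturbation changes the bin probabilities but leaves the comparison of observed and expected counts valid.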

The argument rep allows creating multiple such arrays of bin counts and storing them in one table. This has two advantages:

For discrete distributions the function pdist must be given, and both arguments qdist and exactu are ignored. Moreover, the given break points have to be adjusted according to the probability function of the discrete distribution. In particular, this means that bins have to be collapsed when the probability of some number is larger than the difference of break points in the u-scale. Thus the resulting tables may contain fewer break points than requested.
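
Why bins collapse can be sketched in base R for the geometric distribution with prob=0.123. The rule used here (move each requested break point up to the next value of the distribution function) and the finite support grid 0:200 are assumptions for illustration; the package's adjustment rule may differ.

```r
prob <- 0.123
ubreaks <- (0:20) / 20                        # 21 requested break points in the u-scale
support <- 0:200                              # finite grid assumed to cover almost all mass
cdf <- pgeom(support, prob)                   # attainable break points in the u-scale
interior <- ubreaks[2:20]                     # requested break points strictly between 0 and 1
# index of the smallest CDF value that is >= the requested break point
idx <- findInterval(interior, cdf, left.open = TRUE) + 1
adj <- unique(cdf[idx])                       # requested points that hit the same value collapse
ubreaks_adj <- c(0, adj, 1)                   # fewer break points than requested
```

Here P(X=0) = 0.123 exceeds the requested spacing of 0.05, so the requested break points 0.05 and 0.10 both move to 0.123 and the bin between them disappears.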

The type of distribution (continuous or discrete) is autodetected by the function.

Value

An object of class "rvgt.ftable" which is a list with components:

n

sample size.

rep

number of repetitions.

ubreaks

an array of break points in u-scale.

xbreaks

an array of break points in x-scale.

count

a matrix of rep rows and (breaks-1) columns that contains the bin counts. The results for each repetition are stored row-wise.

dtype

a string that contains the type of the distribution: "cont" or "discr".
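
The layout of the count component can be imitated in base R as shown below. The sketch builds a matrix with one row per repetition and runs chisq.test from the stats package on the pooled counts as a stand-in for a goodness-of-fit test; it does not reproduce the "rvgt.ftable" object or rvgt.chisq, and the variable name nrep is hypothetical.

```r
set.seed(4)                                   # reproducibility of this sketch only
n <- 1000; nrep <- 5                          # sample size per repetition, number of repetitions
ubreaks <- (0:10) / 10                        # 11 break points, hence 10 bins
xbreaks <- qnorm(ubreaks)                     # break points in the x-scale
# one row of bin counts per repetition
count <- t(sapply(seq_len(nrep), function(r)
  tabulate(findInterval(rnorm(n), xbreaks), nbins = 10)))
# chi-square goodness-of-fit test on the counts pooled over all repetitions
res <- chisq.test(colSums(count), p = diff(ubreaks))
```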

Note

It is important that all given functions – rdist, qdist, and pdist – accept the same arguments passed to rvgt.ftable via ....

The random variate generator rdist can alternatively be a generator object from the Runuran package.

Author(s)

Sougata Chaudhuri sgtchaudhuri@gmail.com, Josef Leydold josef.leydold@wu.ac.at

References

W. Hörmann, J. Leydold, and G. Derflinger (2004): Automatic Nonuniform Random Variate Generation. Springer-Verlag, Berlin Heidelberg.

See Also

See plot.rvgt.ftable for the syntax of the plotting method.

Examples

## Create a frequency table for normal distribution with mean 1 and
## standard deviation 2. Number of bins should be 50.
## Use a total of 5 times 10^5 random variates (5 repetitions of size 10^5).
ft <- rvgt.ftable(n=1e5,rep=5, rdist=rnorm,qdist=qnorm, breaks=51, mean=1,sd=2)

## Show histogram
plot(ft)

## Run a chi-square test
rvgt.chisq(ft)

## The following plots a histogram in a single call.
rvgt.ftable(n=1e5,rep=5, rdist=rnorm,qdist=qnorm, plot=TRUE)

## Use the cumulative distribution function when the quantile function
## is not available or if its round-off errors have serious impact.
ft <- rvgt.ftable(n=1e5,rep=5, rdist=rnorm,pdist=pnorm )
plot(ft)

## Create a frequency table for the normal distribution with
## non-equidistributed break points
ft <- rvgt.ftable(n=1e5,rep=5, rdist=rnorm,qdist=qnorm, breaks=1/(1:100))
plot(ft)

## A (naive) generator for a truncated normal distribution
rdist <- function(n) {
  x <- numeric(n)
  for (i in 1:n){ while(TRUE){ x[i] <- rnorm(1); if (x[i]>1) break} }
  return(x)
}
ft <- rvgt.ftable(n=1e3,rep=5, rdist=rdist,
                  pdist=pnorm, qdist=qnorm, trunc=c(1,Inf))
plot(ft)

## An example for a discrete distribution
ft <- rvgt.ftable(n=1e5,rep=1, rdist=rgeom,pdist=pgeom, prob=0.123)
plot(ft)

rvgtest documentation built on May 1, 2019, 6:35 p.m.