
# SCMC - Stochastic Collocation Monte Carlo

This is a development version of the scmc package for the R programming language.

## Installation

The package isn’t available on CRAN, so the only way to install it is with the devtools package:

```r
devtools::install_github("blaza/scmc")
```

## Usage

The main function currently implemented is `univariate_sampler`, a flexible implementation of the method (and therefore with a somewhat more involved interface) for sampling from univariate distributions. Here we cover a few basic examples which give an overall picture of the package’s capabilities.

### Example: Logistic distribution

We’ll generate variates from the Logistic distribution.

The SCMC method amounts to interpolating the function g(x) = F_Y^{-1}(F_X(x)), where Y is the target random variable and X is a random variable which can be generated efficiently, and then generating the samples using the formula y_n = g_N(x_n), where g_N is the interpolant and x_n are variates from the distribution of X. In this example we’ll use a standard normal X. By default, we use the `RcppZiggurat::zrnorm` function to generate normal variates.

The code to generate the Logistic distribution in the scmc package is

```r
library(scmc)
# create the sampler
sampler <- univariate_sampler(qlogis, gaussian_nodes(7))
## Loading required package: RcppZiggurat

# generate 100000 random variates
smp <- sampler(1e5)
```

In its basic form, the `univariate_sampler` function requires the inverse cdf of Y (i.e. its quantile function) as the first argument, and the nodes for the interpolation as the second. When a normally distributed X is used, the optimal interpolation nodes are the nodes of the Gaussian quadrature with respect to the density of X as the weight function. The third argument to `univariate_sampler` is `xdist`, which defaults to `"norm"`, indicating the standard normal distribution.

The quality of the generated sample can be visualized through its density:

```r
# plot the sample density
plot(density(smp))

# add a plot of the theoretical logistic density
curve(dlogis(x), add = TRUE, col = "green")
```

The curves are nearly the same, so the approximation is good.
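For intuition, the method described above can be sketched from scratch outside the package. The snippet below is an illustrative sketch only (not the scmc package’s internals): it interpolates g(x) = F_Y^{-1}(Φ(x)) for the logistic target through the three probabilists’ Gauss–Hermite nodes (the roots of He₃(x) = x³ − 3x, i.e. −√3, 0, √3) and then pushes cheap normal variates through the interpolant.

```python
import math
import random

def qlogis(p):
    """Logistic quantile function F_Y^{-1}(p) = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def norm_cdf(x):
    """Standard normal cdf Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Gauss-Hermite (probabilists') nodes for n = 3, and the exact values of g there
nodes = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
values = [qlogis(norm_cdf(x)) for x in nodes]

def g_interp(x):
    """Lagrange interpolant g_N(x) through (nodes, values)."""
    total = 0.0
    for i, xi in enumerate(nodes):
        li = 1.0
        for j, xj in enumerate(nodes):
            if i != j:
                li *= (x - xj) / (xi - xj)
        total += values[i] * li
    return total

# Sampling step: y_n = g_N(x_n), where x_n are cheap standard normal draws.
random.seed(42)
sample = [g_interp(random.gauss(0.0, 1.0)) for _ in range(10000)]
```

Since the expensive quantile function is only evaluated at the handful of nodes, sampling cost is dominated by the cheap normal generator plus a polynomial evaluation, which is the point of the method. The package uses more nodes (7 above) for a closer fit.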

### Example: Gamma distribution

For the next example, we’ll use the gamma distribution, specifically Gamma(5, 2) (shape 5, rate 2). This distribution has positive support, so we would like to transform its quantile function with the log transform to get a real-valued function to interpolate, and then apply the exp transform upon sampling to recover a sample from the original distribution. Basically, we model h = log(F_Y^{-1}) instead of F_Y^{-1} and compute exp(h_N(x_n)) to get the sample. The code example follows:

```r
library(scmc)
# create the sampler
sampler <- univariate_sampler(function(x) qgamma(x, 5, 2),
                              gaussian_nodes(7),
                              transform = log, # the transformation of
                                               # the quantile function
                              inv_transform = exp) # the inverse of transform

# generate 100000 random variates
smp <- sampler(1e5)
```

We can check the quality of the sample distribution by plotting the empirical cdf against the theoretical cdf of Y, or the histogram overlaid with the theoretical density.

```r
par(mfrow = c(1, 2))
smp_ecdf <- ecdf(smp)
curve(smp_ecdf(x), xlim = c(0, 7))
curve(pgamma(x, 5, 2), add = TRUE, col = "green")

hist(smp, breaks = 100, probability = TRUE)
curve(dgamma(x, 5, 2), add = TRUE, col = "green")
```

Again, the approximation is excellent.
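The log/exp transform trick itself can be sketched independently of the package. The snippet below is an illustrative sketch, using the Exp(1) distribution rather than Gamma(5, 2) because its quantile, q(p) = −log(1 − p), has a closed form available with only the standard library; the structure (interpolate h = log ∘ q ∘ Φ, then exponentiate) is the same.

```python
import math
import random

def qexp(p):
    """Exp(1) quantile function: q(p) = -log(1 - p)."""
    return -math.log(1.0 - p)

def norm_cdf(x):
    """Standard normal cdf Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Gauss-Hermite (probabilists') nodes for n = 3
nodes = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
# interpolate h = log(q(Phi(x))) instead of q(Phi(x)) itself
h_values = [math.log(qexp(norm_cdf(x))) for x in nodes]

def h_interp(x):
    """Lagrange interpolant h_N(x) of the log-transformed quantile."""
    total = 0.0
    for i, xi in enumerate(nodes):
        li = 1.0
        for j, xj in enumerate(nodes):
            if i != j:
                li *= (x - xj) / (xi - xj)
        total += h_values[i] * li
    return total

# Sampling step: y_n = exp(h_N(x_n)); positivity holds by construction,
# regardless of any interpolation error in h_N.
random.seed(1)
sample = [math.exp(h_interp(random.gauss(0.0, 1.0))) for _ in range(10000)]
```

This is why the transform is worthwhile for positive distributions: a polynomial interpolant of the raw quantile can dip below zero in the tails, while exp of anything cannot.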

### Example: Student’s t distribution

Now we’ll demonstrate the grid-stretching technique for heavy-tailed distributions, using the Student’s t distribution with 2 degrees of freedom. All that’s needed is to add a `gss` argument, which specifies the value of the stretching parameter in the technique (Section 4.1.1 in Grzelak et al. 2014).

```r
library(scmc)
# create the sampler
sampler <- univariate_sampler(function(x) qt(x, df = 2),
                              gaussian_nodes(15),
                              gss = 1.657)

# generate 100000 random variates
smp <- sampler(1e5)
```

This time we’ll use the Kolmogorov-Smirnov test to check the distribution of the sample:

```r
ks.test(smp, function(x) pt(x, df = 2))
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  smp
## D = 0.0024044, p-value = 0.6098
## alternative hypothesis: two-sided
```

The large p-value means the test finds no significant deviation of the sample from the target distribution.

### Example: Beta distribution

This time we demonstrate sampling from a distribution with bounded support. A good example is the beta distribution, which comes in a wide variety of shapes; we’ll use the Beta(0.5, 0.5) distribution, with its distinct “U” shape. For this example we use Chebyshev nodes for the interpolation and interpolate the quantile function F_Y^{-1} directly. This is equivalent to using a uniformly distributed X, so we set the `xdist = "unif"` argument.

```r
library(scmc)
# create the sampler
sampler <- univariate_sampler(function(x) qbeta(x, 0.5, 0.5),
                              chebyshev_nodes(15),
                              xdist = "unif")

# generate 100000 random variates
smp <- sampler(1e5)
```

We visualise the distribution with a histogram

```r
hist(smp, breaks = 100, probability = TRUE)
curve(dbeta(x, 0.5, 0.5), add = TRUE, col = "green")
```

and run `ks.test`

```r
ks.test(smp, function(x) pbeta(x, 0.5, 0.5))
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  smp
## D = 0.0037263, p-value = 0.1244
## alternative hypothesis: two-sided
```

which confirms a good approximation.
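The Chebyshev-node construction with a uniform X can also be sketched from scratch. The snippet below is an illustrative sketch only, exploiting the fact that Beta(0.5, 0.5) is the arcsine distribution, whose quantile has the closed form q(p) = sin²(πp/2): we place 15 Chebyshev nodes on [0, 1], interpolate q there, and evaluate the interpolant at uniform variates directly (no cdf transform of X is needed).

```python
import math
import random

def qbeta_half(p):
    """Beta(0.5, 0.5) (arcsine) quantile: q(p) = sin(pi * p / 2)^2."""
    return math.sin(math.pi * p / 2.0) ** 2

n = 15
# Chebyshev nodes cos((2k - 1) * pi / (2n)) on [-1, 1], mapped to [0, 1]
cheb = [math.cos((2 * k - 1) * math.pi / (2 * n)) for k in range(1, n + 1)]
nodes = [(x + 1.0) / 2.0 for x in cheb]
values = [qbeta_half(p) for p in nodes]

def g_interp(p):
    """Lagrange interpolant of the quantile function at the Chebyshev nodes."""
    total = 0.0
    for i, pi_ in enumerate(nodes):
        li = 1.0
        for j, pj in enumerate(nodes):
            if i != j:
                li *= (p - pj) / (pi_ - pj)
        total += values[i] * li
    return total

# X is uniform here, so the sampling step is simply y = g_N(u), u ~ U(0, 1).
random.seed(7)
sample = [g_interp(random.random()) for _ in range(10000)]
```

Chebyshev nodes keep the interpolation error small and uniform across [0, 1], which is why they are the natural choice when X is uniform rather than normal.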

## References

Grzelak, Lech A., Jeroen Witteveen, Maria Suarez-Taboada, and Cornelis W. Oosterlee. 2014. “The Stochastic Collocation Monte Carlo Sampler: Highly Efficient Sampling from ‘Expensive’ Distributions.” *SSRN Electronic Journal*. Elsevier BV.


Blaza/scmc documentation built on May 29, 2019, 6:41 a.m.