In yantingpan/ars: Adaptive Rejection Sampling

knitr::opts_chunk$set(echo = TRUE, message = FALSE)

source("utils.R")
source("ars.R")
source("checks.R")

Section 0: Package Location

The ars package is located in Yanting Pan's Github repository: https://github.com/yantingpan/ars

Section 1: Function Overview

The main function ars() takes two required inputs, namely (1) sample size and (2) a target log-concave density function, as well as four other optional inputs, namely (3) the lower bound, (4) the upper bound, (5) the center of the distribution and (6) a step value. The lower bound and upper bound define the domain of the target function to be evaluated with the default of negative infinity and positive infinity, respectively. The center is the mode of the target function with default of zero, and the step is the bandwidth around the center used to find the starting abscissae with default of 0.5. The ars() function draws samples from the target function using Adaptive Rejection Sampling along with appropriate validity tests, and returns the specified number of samples.

1.1 Main Function

Below we use a gamma distribution to demonstrate how ars() works. Suppose a user wants to sample 10,000 observations from the gamma distribution. The user needs to input the sample size N = 10000, and the function of the specified gamma distribution, as well as optional inputs for a lower bound and upper bound for the sampling.

set.seed(0)

## custom function for density of gamma distribution
gamma_test <- function(x){
  k <- dgamma(x, shape = 7.5, scale = 1.0)
  return(k)
}

## run ars() function
vals <- ars(10000, gamma_test, l = 0.01, u = 20)

## graph output
z <- seq(0.01, 20, by = 0.05)
plot(density(vals), xlab = "Observation", ylab = "Density", 
     main = "Gamma Distribution", col = "red", lwd = 2)
lines(z, gamma_test(z), type = 'l', col = "blue", lwd = 2)
legend("topright", legend = c("Sample Density", "Gamma PDF"), 
       col = c("red", "blue"), lty = 1)

1.2 Efficiency

The algorithm is vectorized to the extent possible in order to maximize efficiency.

system.time(ars(1000, dnorm))

Section 2: Algorithm/Approach

The implementation contains four parts: initialization step, sampling step, updating step, and validity checks. In order to create modular code, we have various auxiliary functions used to implement discrete tasks, which serves the main ars() function.

2.1 Initialization

If the boundary is specified, we will use it as the starting abscissae. If not, we will find the abscissae starting from the center of function. We assume the center is 0, and check the derivative of candidate abscissae generated by the step, 0.5, from the center.

After 50 iterations, an error message is generated to ask the user to input the estimated center of the function in order to assist the function in finding reasonable starting points. Before assigning the value of starting abscissae, we will check whether the accepted point(s) are defined on the function.

2.2 Sampling and Updating

Six sub-functions are involved: setParams(), calcDeriv(), lowerPDF(), expPDF(), expCDF(), and invCDF().

The structure of the functions is:

ars(): main function; samples from the CDF of the enveloping function, performs the squeeze test and the envelope test, and either (i) accepts or (ii) rejects and updates the fixed points.
- setParams(): determines the parameters for the enveloping function and specific set of fixed points.
  - calcDeriv(): calculates the derivative of a function at a given point.
  - expCDF(): calculates the piecewise exponential CDF of the enveloping function and returns the parameters. Each line segment in the enveloping log PDF, each in the form of $y = mx + b$, is exponentiated and integrated using the form $(1/m) e^{mx+b}$, and then shifted vertically to create a continuous CDF. The CDF is then normalized, and the parameters used for the shifts and the normalization are returned to be used by other functions.
- lowerPDF(): calculates the value of the squeezing function for a specified evaluation point.
- expPDF(): calculates the value of the piecewise exponential enveloping function for a specified evaluation point. For a given x-value, the function finds the related piece of the enveloping log function and calculates the y-value using $y = mx + b$ form. It then exponentiates the result, which is returned.
- invCDF(): inverts the piecewise exponential CDF of the enveloping function; takes as input a number sampled from the Unif(0,1) distribution, finds the corresponding piece of the enveloping function, and inverts the equation used in the expCDF() function in order to return a sample from the CDF.

Since the ars() function does not provide any intermediate output, we've used some of the sub-functions below to demonstrate the code. The code below uses the gamma function above to graphically demonstrate the relationship between the target function and the enveloping function.

## define function inputs
min <- 0.01
max <- 20
p <- c(2, 4, 10) # define the initial fixed points
xvec <- seq(min, max, by = 0.1)
target <- gamma_test(xvec)

## calculate corresponding envelope function
par <- setParams(gamma_test, 0.01, 20, p) # parameters for function
envelope <- expPDF(xvec, p, par$int_x, par$m_p, par$lf_p) # enveloping PDF

## plot log PDFs
plot(xvec, log(target), type = 'l', xlab = "X-Values", ylab = "Log Density", 
     main = "Log PDF Comparison")
lines(xvec, log(envelope), col = "red")
legend("bottomright", legend = c("Envelope", "Target"), 
       col = c("red", "black"), lty = 1, cex = 0.75)

## plot PDFs
plot(xvec, target, type = 'l', ylim = c(0,0.3), xlab = "X-Values", 
     ylab = "Density", main = "PDF Comparison")
lines(xvec, envelope, col = "red")
legend("topright", legend = c("Envelope", "Target"), 
       col = c("red", "black"), lty = 1, cex = 0.75)

The chart below shows the CDF of the enveloping function shown above.

yvec <- seq(0, 1, by=0.001)
envelope_CDF <- invCDF(yvec, par$int_x, par$m_p, par$b_vec, par$nc, par$shift, par$adj)

plot(envelope_CDF, yvec, type = 'l', xlab = "X-Values", ylab = "Cumulative Density",
     main = "Enveloping CDF")

2.3 Validity checks

Check sample size: the input sample size should be a positive numeric integer. Stop the function otherwise.
Check boundary: the boundary input should be numeric, and the lower bound must less than the upper bound. In case the user mistakenly enters the value of lower bound as the upper bound and the value of upper bound as the lower bound, provide the warning message and swap the values automatically for further implementation. Stop the function otherwise.
Check density function: the input density function should be a function. Stop the function otherwise.
- Check existence: if the function is not defined (i.e. not finite) on the boundary points, and updating fixed points, stop the function and give an error message.
- Check differentiability: if the function is not differentiable (i.e. the calculated derivative is 'NaN' or not finite) on the boundary points, starting points, and updating fixed points, stop the function and give the error message.
- Check log-concave: the log-concave function should obey properties of log-concave functions; that is, the derivatives of the log-scale function decrease monotonically with increasing abscissae within the domain. This test is conducted on sorting the staring points and updating the fixed points as the calculations proceed. The function stops if the derivatives of the log-scale function do not follow the property stated above, and gives an error message.

Section 3: Testing

3.1 Testthat

Final Tests:
- To test our algorithm, we compared our resulting samples with the known distributions using Kolmogorov-Smirnov Tests. We tested our function against various log-concave distributions including the Normal distribution, Gamma distribution, Beta distribution, Logistic distribution, and Uniform distribution. Our algorithm consistently passed for all those functions with the significance level of 0.05. Furthermore, we also tested our function against the non-log-concave distributions including the Student t distribution, Chi-square distribution with one degree of freedom, and F distribution. Error message we designed appeared correctly as expected.
- To test the validity checks, we tested the function with wrong inputs, non-log-concave density functions, non differentiable functions and non-continuous functions. All tests produced the error message as we expected.
Module Tests: we tested all auxiliary functions to make sure the output of each individual function was correctly computed and formatted.

3.2 Results/Examples

3.2.1 Log-concave Distributions

Below are plot of several distributions we used, comparing the sample results in the histogram to the target density function. The histogram shows the density of observations generated by ars(), and the blue line shows the theoretical density.

library(rmutil)
library(stats)

par(mfrow = c(2,3))

## test 1: standard normal with -inf, inf
set.seed(0)
set1 <- ars(10000, dnorm, l = -Inf, u = Inf)
x1 <- seq(-4, 4, length = 100)
hist(set1, prob = TRUE, xlab = "observation", ylab = "", 
     main = "Standard Normal Distribution", ylim = c(0, 0.4))
lines(x1, dnorm(x1), type = 'l', col = "blue", lwd = 2)


## test 2 : standard normal with (-1, 2)
set.seed(0)
set2 <- ars(10000, dnorm, l = -1, u = 2)
x2 <- seq(-1, 2, by = 0.05)
hist(set2, prob = TRUE, xlab = "observation", ylab = "", 
     main = "Truncated Normal Distribution")
lines(x2, dnorm(x2)/(pnorm(2)-pnorm(-1)), type = 'l', col = "blue", lwd = 2)


## test 3 : Gamma Distribution
set.seed(0)
gamma_test <- function(x){
  k <- dgamma(x, shape=7.5, scale=1)
  return(k)
}
set3 <- ars(10000, gamma_test, 0.1, 20)
x3 <- seq(0.1, 20, by = 0.05)
hist(set3, prob = TRUE, xlab = "observation", ylab = "", 
     main = "Gamma Distribution", xlim = c(0,20))
lines(x3, gamma_test(x3), type = 'l', col = "blue", lwd = 2)
legend("topright", c("sample", "theory"), lty = c(1, 1), 
       col = c("black","blue"), bty = "n", lwd = c(2, 2))



## test 4 : Beta Distrbution
set.seed(2)
beta_test <- function(x){
  c <- dbeta(x, shape1 = 3, shape2 = 5)
  return(c)
}
set4 <- ars(10000, beta_test, 0.01, 0.99)
x4 <- seq(0.001, 0.999, by=0.05)
hist(set4, prob = TRUE, xlab = "observation", ylab = "", 
     main = "Beta Distribution", xlim = c(0,1))
lines(x4, beta_test(x4), type = 'l', col = "blue", lwd = 2)


## test 5 : Logistic distribution
set.seed(0)
logistic_test <- function(x){
 l <- dlogis(x)
 return(l)
}
set5 <- ars(10000, logistic_test)
x5 <- seq(-10, 10, by = 0.05)
hist(set5, prob = TRUE, xlab = "observation", ylab = "", 
     main = "Logistic Distribution", ylim = c(0, 0.25), xlim = c(-10, 10))
lines(x5, logistic_test(x5), type = 'l', col = "blue", lwd = 2)


## test 6 : Uniform distribution
set6 <- ars(10000, dunif, 0.01, 0.99) 
x6 <- seq(0.01, 0.99, by = 0.05)
hist(set6,prob = TRUE, xlab = "observation", ylab = "", main = "Uniform Distribution")
lines(x6, dunif(x6), type = 'l', col = "blue", lwd = 2)

3.2.2 Non Log-concave Distributions

The Student t Distribution, Chi-square Distribution with one degree of freedom, and F Distribution are non-log-concave densities. The function works if it produces the error message, "Please provide the log-concave density."

## Student t Distribution
t_test <- function(x){return(dt(x, 1))}
ars(100, t_test, l = -10, u = 10)

## Chi-square Distribution with df = 1
chi_test <- function(x){return(dchisq(x, 1))}
ars(1000, chi_test, l = 1)

## F Distribution
f_test <- function(x){return(df(x, 9, 11))}
ars(100, f_test, l = 1)

Section 4: Contributions

Vincent is responsible for the algorithm of sampling and updating steps. Yanting and Zhenni are together responsible for the initialization step, validity checks, tests, and R packaging. All group members reviewed, discussed and revised the codes, and all members contributed to the final report.

Section 5 : References

Gilks, W. R., and P. Wild. "Adaptive Rejection Sampling for Gibbs Sampling". Journal of the Royal Statistical Society. Series C (Applied Statistics) 41.2 (1992): 337-348.

yantingpan/ars documentation built on May 21, 2019, 10:14 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

yantingpan/ars
Adaptive Rejection Sampling

In yantingpan/ars: Adaptive Rejection Sampling

Section 0: Package Location

Section 1: Function Overview

1.1 Main Function

1.2 Efficiency

Section 2: Algorithm/Approach

2.1 Initialization

2.2 Sampling and Updating

2.3 Validity checks

Section 3: Testing

3.1 Testthat

3.2 Results/Examples

3.2.1 Log-concave Distributions

3.2.2 Non Log-concave Distributions

Section 4: Contributions

Section 5 : References

R Package Documentation

Browse R Packages

We want your feedback!

yantingpan/ars Adaptive Rejection Sampling

In yantingpan/ars: Adaptive Rejection Sampling

Section 0: Package Location

Section 1: Function Overview

1.1 Main Function

1.2 Efficiency

Section 2: Algorithm/Approach

2.1 Initialization

2.2 Sampling and Updating

2.3 Validity checks

Section 3: Testing

3.1 Testthat

3.2 Results/Examples

3.2.1 Log-concave Distributions

3.2.2 Non Log-concave Distributions

Section 4: Contributions

Section 5 : References

R Package Documentation

Browse R Packages

We want your feedback!

yantingpan/ars
Adaptive Rejection Sampling