knitr::opts_chunk$set(echo = TRUE, message = FALSE)
source("utils.R") source("ars.R") source("checks.R")
The ars package is located in Yanting Pan's Github repository: https://github.com/yantingpan/ars
The main function ars() takes two required inputs, namely (1) sample size and (2) a target log-concave density function, as well as four other optional inputs, namely (3) the lower bound, (4) the upper bound, (5) the center of the distribution and (6) a step value. The lower bound and upper bound define the domain of the target function to be evaluated with the default of negative infinity and positive infinity, respectively. The center is the mode of the target function with default of zero, and the step is the bandwidth around the center used to find the starting abscissae with default of 0.5. The ars() function draws samples from the target function using Adaptive Rejection Sampling along with appropriate validity tests, and returns the specified number of samples.
Below we use a gamma distribution to demonstrate how ars() works. Suppose a user wants to sample 10,000 observations from the gamma distribution. The user needs to input the sample size N = 10000, and the function of the specified gamma distribution, as well as optional inputs for a lower bound and upper bound for the sampling.
set.seed(0) ## custom function for density of gamma distribution gamma_test <- function(x){ k <- dgamma(x, shape = 7.5, scale = 1.0) return(k) } ## run ars() function vals <- ars(10000, gamma_test, l = 0.01, u = 20) ## graph output z <- seq(0.01, 20, by = 0.05) plot(density(vals), xlab = "Observation", ylab = "Density", main = "Gamma Distribution", col = "red", lwd = 2) lines(z, gamma_test(z), type = 'l', col = "blue", lwd = 2) legend("topright", legend = c("Sample Density", "Gamma PDF"), col = c("red", "blue"), lty = 1)
The algorithm is vectorized to the extent possible in order to maximize efficiency.
system.time(ars(1000, dnorm))
The implementation contains four parts: initialization step, sampling step, updating step, and validity checks. In order to create modular code, we have various auxiliary functions used to implement discrete tasks, which serves the main ars() function.
If the boundary is specified, we will use it as the starting abscissae. If not, we will find the abscissae starting from the center of function. We assume the center is 0, and check the derivative of candidate abscissae generated by the step, 0.5, from the center.
After 50 iterations, an error message is generated to ask the user to input the estimated center of the function in order to assist the function in finding reasonable starting points. Before assigning the value of starting abscissae, we will check whether the accepted point(s) are defined on the function.
Six sub-functions are involved: setParams(), calcDeriv(), lowerPDF(), expPDF(), expCDF(), and invCDF().
The structure of the functions is:
ars(): main function; samples from the CDF of the enveloping function, performs the squeeze test and the envelope test, and either (i) accepts or (ii) rejects and updates the fixed points.
setParams(): determines the parameters for the enveloping function and specific set of fixed points.
lowerPDF(): calculates the value of the squeezing function for a specified evaluation point.
expPDF(): calculates the value of the piecewise exponential enveloping function for a specified evaluation point. For a given x-value, the function finds the related piece of the enveloping log function and calculates the y-value using $y = mx + b$ form. It then exponentiates the result, which is returned.
invCDF(): inverts the piecewise exponential CDF of the enveloping function; takes as input a number sampled from the Unif(0,1) distribution, finds the corresponding piece of the enveloping function, and inverts the equation used in the expCDF() function in order to return a sample from the CDF.
Since the ars() function does not provide any intermediate output, we've used some of the sub-functions below to demonstrate the code. The code below uses the gamma function above to graphically demonstrate the relationship between the target function and the enveloping function.
## define function inputs min <- 0.01 max <- 20 p <- c(2, 4, 10) # define the initial fixed points xvec <- seq(min, max, by = 0.1) target <- gamma_test(xvec) ## calculate corresponding envelope function par <- setParams(gamma_test, 0.01, 20, p) # parameters for function envelope <- expPDF(xvec, p, par$int_x, par$m_p, par$lf_p) # enveloping PDF ## plot log PDFs plot(xvec, log(target), type = 'l', xlab = "X-Values", ylab = "Log Density", main = "Log PDF Comparison") lines(xvec, log(envelope), col = "red") legend("bottomright", legend = c("Envelope", "Target"), col = c("red", "black"), lty = 1, cex = 0.75) ## plot PDFs plot(xvec, target, type = 'l', ylim = c(0,0.3), xlab = "X-Values", ylab = "Density", main = "PDF Comparison") lines(xvec, envelope, col = "red") legend("topright", legend = c("Envelope", "Target"), col = c("red", "black"), lty = 1, cex = 0.75)
The chart below shows the CDF of the enveloping function shown above.
yvec <- seq(0, 1, by=0.001) envelope_CDF <- invCDF(yvec, par$int_x, par$m_p, par$b_vec, par$nc, par$shift, par$adj) plot(envelope_CDF, yvec, type = 'l', xlab = "X-Values", ylab = "Cumulative Density", main = "Enveloping CDF")
Check sample size: the input sample size should be a positive numeric integer. Stop the function otherwise.
Check boundary: the boundary input should be numeric, and the lower bound must less than the upper bound. In case the user mistakenly enters the value of lower bound as the upper bound and the value of upper bound as the lower bound, provide the warning message and swap the values automatically for further implementation. Stop the function otherwise.
Check density function: the input density function should be a function. Stop the function otherwise.
Final Tests:
Module Tests: we tested all auxiliary functions to make sure the output of each individual function was correctly computed and formatted.
Below are plot of several distributions we used, comparing the sample results in the histogram to the target density function. The histogram shows the density of observations generated by ars(), and the blue line shows the theoretical density.
library(rmutil) library(stats) par(mfrow = c(2,3)) ## test 1: standard normal with -inf, inf set.seed(0) set1 <- ars(10000, dnorm, l = -Inf, u = Inf) x1 <- seq(-4, 4, length = 100) hist(set1, prob = TRUE, xlab = "observation", ylab = "", main = "Standard Normal Distribution", ylim = c(0, 0.4)) lines(x1, dnorm(x1), type = 'l', col = "blue", lwd = 2) ## test 2 : standard normal with (-1, 2) set.seed(0) set2 <- ars(10000, dnorm, l = -1, u = 2) x2 <- seq(-1, 2, by = 0.05) hist(set2, prob = TRUE, xlab = "observation", ylab = "", main = "Truncated Normal Distribution") lines(x2, dnorm(x2)/(pnorm(2)-pnorm(-1)), type = 'l', col = "blue", lwd = 2) ## test 3 : Gamma Distribution set.seed(0) gamma_test <- function(x){ k <- dgamma(x, shape=7.5, scale=1) return(k) } set3 <- ars(10000, gamma_test, 0.1, 20) x3 <- seq(0.1, 20, by = 0.05) hist(set3, prob = TRUE, xlab = "observation", ylab = "", main = "Gamma Distribution", xlim = c(0,20)) lines(x3, gamma_test(x3), type = 'l', col = "blue", lwd = 2) legend("topright", c("sample", "theory"), lty = c(1, 1), col = c("black","blue"), bty = "n", lwd = c(2, 2)) ## test 4 : Beta Distrbution set.seed(2) beta_test <- function(x){ c <- dbeta(x, shape1 = 3, shape2 = 5) return(c) } set4 <- ars(10000, beta_test, 0.01, 0.99) x4 <- seq(0.001, 0.999, by=0.05) hist(set4, prob = TRUE, xlab = "observation", ylab = "", main = "Beta Distribution", xlim = c(0,1)) lines(x4, beta_test(x4), type = 'l', col = "blue", lwd = 2) ## test 5 : Logistic distribution set.seed(0) logistic_test <- function(x){ l <- dlogis(x) return(l) } set5 <- ars(10000, logistic_test) x5 <- seq(-10, 10, by = 0.05) hist(set5, prob = TRUE, xlab = "observation", ylab = "", main = "Logistic Distribution", ylim = c(0, 0.25), xlim = c(-10, 10)) lines(x5, logistic_test(x5), type = 'l', col = "blue", lwd = 2) ## test 6 : Uniform distribution set6 <- ars(10000, dunif, 0.01, 0.99) x6 <- seq(0.01, 0.99, by = 0.05) hist(set6,prob = TRUE, xlab = "observation", ylab = "", main = "Uniform Distribution") lines(x6, dunif(x6), type = 'l', col = "blue", lwd = 2)
The Student t Distribution, Chi-square Distribution with one degree of freedom, and F Distribution are non-log-concave densities. The function works if it produces the error message, "Please provide the log-concave density."
## Student t Distribution t_test <- function(x){return(dt(x, 1))} ars(100, t_test, l = -10, u = 10) ## Chi-square Distribution with df = 1 chi_test <- function(x){return(dchisq(x, 1))} ars(1000, chi_test, l = 1) ## F Distribution f_test <- function(x){return(df(x, 9, 11))} ars(100, f_test, l = 1)
Vincent is responsible for the algorithm of sampling and updating steps. Yanting and Zhenni are together responsible for the initialization step, validity checks, tests, and R packaging. All group members reviewed, discussed and revised the codes, and all members contributed to the final report.
Gilks, W. R., and P. Wild. "Adaptive Rejection Sampling for Gibbs Sampling". Journal of the Royal Statistical Society. Series C (Applied Statistics) 41.2 (1992): 337-348.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.