title: "Stat 243 Final Project Report" author: "Brian Collica, Ashish Ramesh, Ben Searchinger" output: pdf_document


Location

The package, including all associated files and scripts, is located in the following public GitHub repository:

The package can be installed locally using remotes::install_github("bcollica/ars")

Architecture

The code was written as a set of modular functions, each of which carries out a specific task. Each of these functions is then called by the main wrapper function ars.

Modular Functions

The functions we added include a function for generating tangent intersections for the given density from the abscissae provided, finding the upper hull from these tangent lines, sampling from the envelope created by the upper hull, and creating the lower hull from given abscissae. We included three additional helper functions: one for approximating a derivative via the finite difference method, one to comprehensively check the inputs to the main wrapper function, and one to periodically check that log-concavity is satisfied.

Below are the descriptions for each modular function:

The individual functions are designed such that they can be run independent of the main ars() function if needed. However, several of these functions include optional arguments that can accept the results of previous functions (for example, the outputs of tanIntersect()), which are used within the main ars() function to save computation time. Each function also has its own help page.

Main ars() Function

ars() takes five arguments, the details of which can be seen by calling help(ars) or ?ars. These arguments give the user some flexibility. For example, the user can choose to supply some initial abscissae values, or have the function generate them instead. The arguments which are strictly necessary are: n, the number of values to sample; f, the distribution function; and (if applicable) f_params, a list of parameters to the distribution function. Additional optional arguments include a vector of starting abscissae and a 2-element vector for the support of the function (the default of this is $(-\infty,\infty)$).

The main function starts by checking the user arguments with checkThat(), generating starting values as needed. It then proceeds through the adaptive rejection sampling algorithm as described in Gilks et. al. In each computational step of the process, it calls the individual modular functions written for that task. The function returns a list containing the final sampled values, as well as counts of the number of abscissae added, the number of iterations, and the number of rejected samples.

Testing

We included tests for each individual modular function, as well as more comprehensive tests for the ars() function.

Testing for the helper functions checkThat() and cavitySearch() are mainly focused on catching known errors. The tests for checkThat() include checks to ensure that the function catches missing and/or incorrect arguments, and the test for cavitySearch() uses the $t$-distribution, which is known to not be log-concave, to ensure that it is caught as such. The tests for approxD() ensure that it correctly computes the derivative of a set of functions by comparing to known values, and checks for errors when incorrect parameters are used. The tests for functions related to computing the hulls (tanIntersect(), upperHull(), lowerHull()) check for correct dimensions of the outputs, as well as known mathematical properties. For example, there are tests for whether the upper and lower hulls bound the log-distribution correctly for two cases of inputs (normal and gamma distributions), as well as symmetry of the hulls and tangent intersections where appropriate.

Tests for ars() fall into 3 main categories. The first of these is checking reproducibility of results. A set of results for several distributions was generated using the same seed, and these tests ensure that, when run again with this seed, the ars() function produces the same outputs. The second set of tests involved testing for known errors. These tests ensure that non log-concave distributions and mismatched arguments are caught correctly. The final set of tests ensure that the function runs "silently" on inputs which are known to work.

Tradeoffs

One major tradeoff in our approach is due in part to the sequential nature of the algorithm. This meant that, while individual modular functions are vectorized, the best approach in following the algorithm itself ended up being to sample one value at a time. This is due to the computational cost of sampling values, meaning it's computationally expensive to over-sample and throw out the samples past the point at which it becomes necessary to recalculate tanIntersect(). Future versions of this package could revisit this tradeoff.

In testing the performance of our approach, we compared it to the existing ars package on CRAN. We noted that our results ran significantly slower. However, we believe this to be due to two main differences: the CRAN package uses C++ for the majority of its calculations, while our package uses R exclusively, and the CRAN package requires the user to pass in values for the derivative of the log-density, while our package includes the computation of these values and only requires the user to specify the distribution function and appropriate parameters.

Results

Our results for the number of iterations to achieve one sample from the standard normal distribution for different starting abscissae are shown below. The generating code can be found in simulations.R.

read.csv('./table.csv', sep = ',')

Our results are generally consistent with Gilks et. al. with the caveat that we're counting iterations rather than evaluations of $h(x)$, so our results skew slightly lower. We also use 100 simulations instead of 1000 due to memory and time constraints, so there's more variability in our result. Our conclusion is the same, that the starting $(x_1, x_2) = (-1, 1)$ is ideal for the standard normal distribution.

Individual Contributions

Brian wrote the approxD(), tanInteresect(), checkThat(), and cavitySearch() functions, as well as their tests. He also worked on parts of the main ars() wrapper function. and its tests. Ashish wrote the upperHull() and sampleEnv() functions, as well as tests for these and the lowerHull() function. Ben wrote the lowerHull() function and the main wrapper ars() function, and the simulation that generated the data for the table. All group members were involved in catching errors in the various functions, planning the layout of the package, and writing the final report.



bsearchinger/ARS documentation built on Dec. 19, 2021, 12:44 p.m.