knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4.5, fig.align = "center" ) options(tibble.print_min = 6, tibble.print_max = 6) modern_r <- getRversion() >= "4.1.0"
In situations where we cannot use the inversion method (situations where obtaining the quantile function is not possible) and neither do we know a transformation involving a random variable from which we can generate observations, we can make use of the acceptance-rejection method.
Suppose $X$ and $Y$ are random variables with probability density function (pdf) or probability function (pf) $f$ and $g$, respectively. Furthermore, suppose there exists a constant $c$ such that
$$\frac{f(x)}{g(y)} \leq c,$$ for every value of $x$, with $f(x) > 0$. To use the acceptance-rejection method to generate observations of the random variable $X$, using the algorithm below, first find a random variable $Y$ with pdf or pf $g$, such that it satisfies the above condition.
Important:
```{block2, type='rmdimportant'}
**Algorithm of the Acceptance-Rejection Method**: 1 - Generate an observation $y$ from a random variable $Y$ with pdf/pf $g$; 2 - Generate an observation $u$ from a random variable $U\sim \mathcal{U} (0, 1)$; 3 - If $u < \frac{f(y)}{cg(y)}$ accept $x = y$; otherwise reject $y$ as an observation of the random variable $X$ and go back to step 1. **Proof**: Consider the discrete case, that is, $X$ and $Y$ are random variables with pfs $f$ and $g$, respectively. By step 3 of the algorithm above, we have $\{accept\} = \{x = y\} = u < \frac{f(y)}{cg(y)}$. That is, $$P(accept | Y = y) = \frac{P(accept \cap \{Y = y\})}{g(y)} = \frac{P(U \leq f(y)/cg(y)) \times g(y)}{g(y)} = \frac{f(y)}{cg(y)}.$$ Hence, by the [**Law of Total Probability**](https://en.wikipedia.org/wiki/Law_of_total_probability), we have: $$P(accept) = \sum_y P(accept|Y=y)\times P(Y=y) = \sum_y \frac{f(y)}{cg(y)}\times g(y) = \frac{1}{c}.$$ Therefore, by the acceptance-rejection method, we accept the occurrence of $Y$ as an occurrence of $X$ with probability $1/c$. Moreover, by Bayes' Theorem, we have $$P(Y = y | accept) = \frac{P(accept|Y = y)\times g(y)}{P(accept)} = \frac{[f(y)/cg(y)] \times g(y)}{1/c} = f(y).$$ The result above shows that accepting $x = y$ by the algorithm's procedure is equivalent to accepting a value from $X$ that has pf $f$. For the continuous case, the proof is similar. **Important**: ```{block2, type='rmdimportant'} <div class=text-justify> Notice that to reduce the computational cost of the method, we should choose $c$ in such a way that we can maximize $P(accept)$. Therefore, choosing an excessively large value of the constant $c$ will reduce the probability of accepting an observation from $Y$ as an observation of the random variable $X$. </div>
Note:
```{block2, type='rmdnote'}
# Installation and loading the package The `AcceptReject` package is available on CRAN and can be installed using the following command: ```r install.packages("AcceptReject") # or install.packages("remotes") remotes::install_github("prdm0/AcceptReject", force = TRUE) # Load the package library(AcceptReject)
accept_reject
FunctionAmong various functions provided by the AcceptReject library, the acceptance_rejection function implements the acceptance-rejection method. The accept_reject()
function has the following signature:
accept_reject( n = 1L, continuous = TRUE, f = NULL, args_f = NULL, f_base = NULL, random_base = NULL, args_f_base = NULL, xlim = NULL, c = NULL, parallel = FALSE, cores = NULL, warning = TRUE, ... )
Many of the arguments the user will not need to change, as the AcceptReject::accept_reject()
function already has default values for them. However, it is important to note that the f argument is the probability density function (pdf) or probability function (pf) of the random variable $X$ from which observations are desired to be generated. The args_f
argument is a list of arguments that will be passed to the f function. The c
argument is the value of the constant c
that will be used in the acceptance-rejection method. If the user does not provide a value for c
, the accept_reject()
function will calculate the value of c
that maximizes the probability of accepting observations from $Y$ as observations from $X$.
Note:
``{block2, type='rmdnote'}
An important observation is that the
accept_reject()function can work in a parallelized manner on Unix-based operating systems. If you use operating systems such as Linux or MacOS, you can benefit from parallelization of the
accept_reject()function. To do this, simply set the
parallel = TRUEargument. In Windows, parallelization is not supported, and setting the
parallel = TRUE` argument will have no effect.
You do not need to define the `c` argument when using the `accept_reject()` function. By default, if `c = NULL`, the `accept_reject()` function will calculate the value of `c` that maximizes the probability of accepting observations from $Y$ as observations from $X$. However, if you want to set a value for `c`, simply pass a value to the `c` argument. **Details of the optimization of `c`**: ```{block2, type='rmdnote'} Depending on how complicated the probability density function (pdf) or probability function (pf) of the random variable $X$ from which observations are desired to be generated is, optimizing the value of `c` can be a difficult optimization process, but generally, it is not. Therefore, unless you have reasons to set a value for `c`, it is recommended to use the default value `c = NULL`. For very complicated functions, you may choose a sufficiently large `c` to ensure that the method works well.
The ...
argument allows changing the desired precision in the optimize()
function, used to optimize the value of the constant c
. By default, tol = .Machine$double.eps^0.25 is used.
Below are some examples of using the accept_reject()
function to generate pseudo-random observations of discrete and continuous random variables. It should be noted that in the case of $X$ being a discrete random variable, it is necessary to provide the argument continuous = FALSE
, whereas in the case of $X$ being continuous (the default), you must consider continuous = TRUE
.
As an example, let $X \sim Poisson(\lambda = 0.7)$. We will generate $n = 1000$ observations of $X$ using the acceptance-rejection method, using the accept_reject()
function. Note that it is necessary to provide the xlim
argument. Try to set an upper limit value for which the probability of $X$ assuming that value is zero or very close to zero. In this case, we choose xlim = c(0, 20)
, where dpois(x = 20, lambda = 0.7)
is very close to zero (r dpois(x = 20, lambda = 0.7)
).
library(AcceptReject) # Ensuring Reproducibility set.seed(0) # Generating observations data <- AcceptReject::accept_reject( n = 1000L, f = dpois, continuous = FALSE, args_f = list(lambda = 0.7), xlim = c(0, 20), parallel = FALSE ) # Viewing organized output with useful information print(data) # Calculating the true probability function for each observed value values <- unique(data) true_prob <- dpois(values, lambda = 0.7) # Calculating the observed probability for each value in the observations vector obs_prob <- table(data) / length(data) # Plotting the probabilities and observations plot(values, true_prob, type = "p", pch = 16, col = "blue", xlab = "x", ylab = "Probability", main = "Probability Function") # Adding the observed probabilities points(as.numeric(names(obs_prob)), obs_prob, pch = 16L, col = "red") legend("topright", legend = c("True probability", "Observed probability"), col = c("blue", "red"), pch = 16L, cex = 0.8) grid()
Note that it is necessary to specify the nature of the random variable from which observations are desired to be generated. In the case of discrete variables, the argument continuous = FALSE
must be passed.
Now, consider that we want to generate observations from a random variable $X \sim Binomial(n = 5, p = 0.7)$. Below, we will generate $n = 2000$ observations of $X$.
library(AcceptReject) # Ensuring reproducibility set.seed(0) # Generating observations data <- AcceptReject::accept_reject( n = 2000L, f = dbinom, continuous = FALSE, args_f = list(size = 5, prob = 0.5), xlim = c(0, 20), parallel = FALSE ) # Viewing organized output with useful information print(data) # Calculating the true probability function for each observed value values <- unique(data) true_prob <- dbinom(values, size = 5, prob = 0.5) # Calculating the observed probability for each value in the observations vector obs_prob <- table(data) / length(data) # Plotting the probabilities and observations plot(values, true_prob, type = "p", pch = 16, col = "blue", xlab = "x", ylab = "Probability", main = "Probability Function") # Adding the observed probabilities points(as.numeric(names(obs_prob)), obs_prob, pch = 16L, col = "red") legend("topright", legend = c("True probability", "Observed probability"), col = c("blue", "red"), pch = 16L, cex = 0.8) grid()
To expand beyond examples of generating pseudo-random observations of discrete random variables, consider now that we want to generate observations from a random variable $X \sim \mathcal{N}(\mu = 0, \sigma^2 = 1)$. We chose the normal distribution because we are familiar with its form, but you can choose another distribution if desired. Below, we will generate n = 2000
observations using the acceptance-rejection method. Note that continuous = TRUE
.
library(AcceptReject) # Ensuring reproducibility set.seed(0) # Generating observations data <- AcceptReject::accept_reject( n = 2000L, f = dnorm, continuous = TRUE, args_f = list(mean = 0, sd = 1), xlim = c(-4, 4), parallel = FALSE ) # Viewing organized output with useful information print(data) hist( data, main = "Generating Gaussian observations", xlab = "x", probability = TRUE, ylim = c(0, 0.4) ) x <- seq(-4, 4, length.out = 500L) y <- dnorm(x, mean = 0, sd = 1) lines(x, y, col = "red", lwd = 2) legend("topright", legend = "True density", col = "red", lwd = 2)
In the examples above, the graphs were built without using the plot.accept_reject()
function. This is just to show that you can manipulate the returning object using the accept_reject()
function, that is, the class object
accept_reject
.
However, the plot.accept_reject()
function can be used to generate graphs in a simpler way. Below, an example of how to use the plot.accept_reject()
function to generate the probability density plot of the normal distribution. However, note that the plot.accept_reject()
function makes the plotting task simpler and more direct. See the following example:
library(AcceptReject) library(cowplot) # install.packages("cowplot") # Ensuring reproducibility set.seed(0) simulation <- function(n){ AcceptReject::accept_reject( n = n, f = dnorm, continuous = TRUE, args_f = list(mean = 0, sd = 1), xlim = c(-4, 4), parallel = FALSE ) } # Inspecting a <- plot(simulation(n = 250L)) b <- plot(simulation(n = 2500L)) c <- plot(simulation(n = 25000L)) d <- plot(simulation(n = 250000L)) plot_grid(a, b, c, d, nrow = 2L, labels = c("a", "b", "c", "d"))
See another example, in the discrete case:
library(AcceptReject) library(cowplot) # install.packages("cowplot") # Ensuring Reproducibility set.seed(0) simulation <- function(n){ AcceptReject::accept_reject( n = n, f = dpois, continuous = FALSE, args_f = list(lambda = 0.7), xlim = c(0, 20), parallel = FALSE ) } a <- plot(simulation(25L)) b <- plot(simulation(250L)) c <- plot(simulation(2500L)) d <- plot(simulation(25000L)) plot_grid(a, b, c, d, nrow = 2L, labels = c("a", "b", "c", "d"))
The accept_reject()
function returns an object of class accept_reject
. When executing the print()
function on an object of this class, an organized output will be shown. However, you can operate on this instance of the accept_reject
class as any atomic vector. In the example below, notice that you can obtain a histogram with the hist()
function or check the size of the vector of observations generated using the length()
function.
library(AcceptReject) data <- accept_reject( n = 1000L, f = dnorm, continuous = TRUE, args_f = list(mean = 0, sd = 1), xlim = c(-4, 4) ) # Creating a histogram hist(data) # Checking the size of the vector of observations length(x)
If you want to access some metadata, use the attr()
function. Check the list of attributes by doing:
library(AcceptReject) data <- accept_reject( n = 100L, f = dnorm, continuous = TRUE, args_f = list(mean = 0, sd = 1), xlim = c(-4, 4) ) attributes(data) # Accessing the value c attr(data, "c")
In any case, it is important to highlight that, in general, you will not need to access
these attributes. The greatest interest will be in having access to the vector of observations generated. If you want to access the observation values directly in an atomic vector in R without attributes, without an organized printout, simply coerce the object using the as.vector()
function, as shown in the following example:
library(AcceptReject) data <- accept_reject( n = 100L, f = dnorm, continuous = TRUE, args_f = list(mean = 0, sd = 1), xlim = c(-4, 4) ) class(data) print(data) # Coercing the object into an atomic vector without attributes data <- as.vector(data) print(data)
Important:
``{block2, type='rmdnote'}
You will not need to coerce the object of the
accept_rejectclass into an atomic vector with no attributes unless you have a specific reason to do so. The object of the
accept_rejectclass is an atomic vector with attributes, and you can operate on it like any atomic vector. Everything you can do with an atomic vector, you can do with an object of the
accept_reject` class.
Using unnecessary coercion may impose a certain computational cost, depending on the size of the vector of observations generated, for example, in a simulation study. ```
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.