knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(dplyr)
library(ggplot2)
library(MASS)
library(plotly)
library(samplr)

Background

The samplr package provides a number of functions for sampling from continuous distributions.

Rejection Sampling

Rejection sampling is a type of exact simulation method in numerical analysis and works for any distribution in $\rm I!R^m$ with a density. The idea of rejection sampling is to uniformly randomly sample the support and keep samples in the region under the graph of the density function.

Specifically, the steps of rejection sampling are:

  1. select uniformly at random a quantile candidate in the support of the function
  2. compute the probability density of the quantile candidate
  3. select a critical value via uniform sampling between 0 and the global maximum density
  4. retain the quantile candidate as a sample if the quantile density is less than the critical value; otherwise reject the candidate and start over.

Rejection sampling has two main assumptions:

  1. $f(x) \le C$ (global maximum density)
  2. $\exists$ a, b such that $P(a \le X \le b) = 1$ (finite)

Markov Chain Monte Carlo

Another class of methods to sample from a probability distribution is called Markov chain Monte Carlo. A Markov chain constructed such that its equilibrium distribution matches the desired probability distribution allows for an approximation to random sampling from that distribution by observing the chain. The longer the chain is observed, the closer the approximation will be true random sampling.

Function projectq2a

The projectq2a function generates random deviates from a continuous random variable. The user supplies the probability density function of the continuous random variable, and the function utilizing rejection sampling to generate the deviates.

Uniform

Here is an example of using projectq2a to sample from the standard uniform distribution.

The uniform distribution has probability density function:

$$\text{pdf = }\begin{cases}\frac{1}{b - a} \text{for } x \in [a, b]\0 \text{ otherwise}\end{cases}$$ We first generate a bunch of deviates and then plot their density in teal Then the true density is overlayed as a yellow line.

df_uniform_samples <- 
  data.frame(u = projectq2a(n = 10000, 
                       pdf = dunif, 
                       a = 0, 
                       b = 1, 
                       C = 1, 
                       min = 0, 
                       max = 1))

ggplot(df_uniform_samples, aes(x = u)) + 
  geom_density(color = NA, fill = "#35a09c") + 
  stat_function(aes(x = 0), fun = dunif, args = list(min = 0, max = 1), color = "#fff735", size = 1) + 
  labs(x = "X", 
       y = "Density") + 
  theme_minimal()

Beta

Here is an example of using projectq2a to sample from a beta distribution with both shape parameters equal to 2.

The beta distribution has probability density function:

$$\text{pdf = }\begin{cases}\frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)} \text{ for }0 \le x \le 1\0 \text{ otherwise}\end{cases}$$

$$\text{where }B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}\text{ with }\alpha, \beta \gt 0$$

We first generate a bunch of deviates and then plot their density in teal. Then the true density is overlayed as a yellow line.

df_beta_samples <- 
  data.frame(b = projectq2a(n = 10000, 
                       pdf = dbeta, 
                       a = 0, 
                       b = 1, 
                       C = 1.5, 
                       shape1 = 2, 
                       shape2 = 2))

ggplot(df_beta_samples, aes(x = b)) + 
  geom_density(color = NA, fill = "#35a09c") + 
  stat_function(aes(x = 0), fun = dbeta, args = list(shape1 = 2, shape2 = 2), color = "#fff735", size = 1) + 
  labs(x = "X", 
       y = "Density") + 
  theme_minimal()

Custom

Here is an example of using projectq2a to sample from a custom distribution.

In this case, the custom density function is that of a beta with shape parameters 2 and 5.

$$\text{pdf = }\begin{cases}30x(1 - x)^4 \text{ for }0 \le x \le 1\0 \text{ otherwise}\end{cases}$$

dcustom <- function (x) {
  sapply(X = x, 
         FUN = function(x) {
           ifelse(0 < x & x < 1, 
                  (gamma(2+5)/(gamma(2)*gamma(5)))*(x^(2-1)*(1-x)^(5-1)), 
                  0)
         })
}

We first generate a bunch of deviates and then plot their density in teal. Then the true density is overlayed as a yellow line.

df_custom_samples <- 
  data.frame(b = projectq2a(n = 10000, 
                       pdf = dcustom, 
                       a = 0, 
                       b = 1, 
                       C = 2.5))

ggplot(df_custom_samples, aes(x = b)) + 
  geom_density(color = NA, fill = "#35a09c") + 
  stat_function(aes(x = 0), fun = dcustom, color = "#fff735", size = 1) + 
  labs(x = "X", 
       y = "Density") + 
  theme_minimal()

Function projectq3a

The projectq3a function generates random deviates from a continuous 2D distribution defined on a square. The user supplies the probability density function, and the function utilizes rejection sampling to generate the deviate pairs.

2D Uniform on unit square

We first demo sampling from a uniform distribution defined on the unit square. In order to do so, we use jdunif(), an example joint probability density function from samplr which takes x and y as arguments for the two variables.

$$\text{pdf = }\begin{cases}\frac{1}{(b - a)^2} \text{for } x, y \in [a, b]\0 \text{ otherwise}\end{cases}$$

Then we could examine what the distribution looks like in 3D by computing the probability densities for each x and y pair in a systematic sampling across the support.

l <- list(x = seq(0, 1, 0.5), 
          y = seq(0, 1, 0.5))

z <- matrix(rep(NA, length(l$x) * length(l$y)), 
            nrow = length(l$x))
for(r in 1:nrow(z)) {
  for(c in 1:ncol(z)) {
    z[r, c] <- jdunif(x = l$x[r], y = l$y[c])
  }
}
l$z <- z

plot_ly(x = l$x, y = l$y, z = l$z) %>% 
  add_surface() %>% 
  layout(scene = list(xaxis = list(title = "X"), 
                      yaxis = list(title = "Y"), 
                      zaxis = list(title = "Density")))

Finally, we can sample from the joint distribution utilizing rejection sampling and estimate the kernel density.

df <- projectq3a(n = 10000, jpdf = jdunif, a = 0, b = 1, C = 1)
kd <- with(df, kde2d(x, y, n = 100))
plot_ly(x = kd$x, y = kd$y, z = kd$z) %>% 
  add_surface() %>% 
  layout(scene = list(xaxis = list(title = "X"), 
                      yaxis = list(title = "Y"), 
                      zaxis = list(title = "Density")))

2D Beta on unit square

We first demo sampling from a beta distribution defined on the unit square. In order to do so, we use jdbeta(), an example joint probability density function from samplr which takes x and y as arguments for the two variables.

$$\text{pdf = }\begin{cases}\frac{x^{\alpha - 1}(1 - x)^{\beta - 1} + y^{\alpha - 1}(1 - y)^{\beta - 1}}{2B(\alpha, \beta)}\text{ for }0 \le x, y \le 1\0\text{ otherwise}\end{cases}$$

$$\text{where }B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}, \text{ with }\alpha, \beta \gt 0$$

Then we could examine what the distribution looks like in 3D by computing the probability densities for each x and y pair in a systematic sampling across the support.

l <- list(x = seq(0, 1, 0.01), 
          y = seq(0, 1, 0.01))

z <- matrix(rep(NA, length(l$x) * length(l$y)), 
            nrow = length(l$x))
for(r in 1:nrow(z)) {
  for(c in 1:ncol(z)) {
    z[r, c] <- jdbeta(x = l$x[r], y = l$y[c], shape1 = 2, shape2 = 2)
  }
}
l$z <- z

plot_ly(x = l$x, y = l$y, z = l$z) %>% 
  add_surface() %>% 
  layout(scene = list(xaxis = list(title = "X"), 
                      yaxis = list(title = "Y"), 
                      zaxis = list(title = "Density")))

Finally, we can sample from the joint distribution utilizing rejection sampling and estimate the kernel density.

df <- projectq3a(n = 10000, jpdf = jdbeta, a = 0, b = 1, C = 1.5, shape1 = 2, shape2 = 2)
kd <- with(df, kde2d(x, y, n = 100))
plot_ly(x = kd$x, y = kd$y, z = kd$z) %>% 
  add_surface() %>% 
  layout(scene = list(xaxis = list(title = "X"), 
                      yaxis = list(title = "Y"), 
                      zaxis = list(title = "Density")))

2D Custom on a square

We next demo sampling from a custom joint distribution defined on the square from -1 to +1. Again, we use an example joint probability density function from samplr, this time called jdcirclecontour(), which takes x and y as arguments for the two variables.

$$\text{pdf = }\begin{cases}\frac{3}{8}(x^2 + y^2) \text{for } x, y \in [-1, +1]\0 \text{ otherwise}\end{cases}$$

Then we could examine what the distribution looks like in 3D by computing the probability densities for each x and y pair in a systematic sampling across the support.

l <- list(x = seq(-1, 1, 0.1), 
          y = seq(-1, 1, 0.1))

z <- matrix(rep(NA, length(l$x) * length(l$y)), 
            nrow = length(l$x))
for(r in 1:nrow(z)) {
  for(c in 1:ncol(z)) {
    z[r, c] <- jdcirclecontour(x = l$x[r], y = l$y[c])
  }
}
l$z <- z

plot_ly(x = l$x, y = l$y, z = l$z) %>% 
  add_surface() %>% 
  layout(scene = list(xaxis = list(title = "X"), 
                      yaxis = list(title = "Y"), 
                      zaxis = list(title = "Density")))

Finally, we can sample from the joint distribution utilizing rejection sampling and estimate the kernel density.

df <- projectq3a(n = 10000, jpdf = jdcirclecontour, a = -1, b = 1, C = 0.75)
kd <- with(df, kde2d(x, y, n = 100))
plot_ly(x = kd$x, y = kd$y, z = kd$z) %>% 
  add_surface() %>% 
  layout(scene = list(xaxis = list(title = "X"), 
                      yaxis = list(title = "Y"), 
                      zaxis = list(title = "Density")))

Function projectq3b

The projectq3b function generates semi-random deviates from a continuous random variable. The user supplies the probability density function of the continuous random variable, and the function utilizing Markov chain Monte Carlo sampling to generate the deviates.

Uniform

Here is an example of using projectq3b to sample from the standard uniform distribution.

The uniform distribution has probability density function:

$$\text{pdf = }\begin{cases}\frac{1}{b - a} \text{for } x \in [a, b]\0 \text{ otherwise}\end{cases}$$

We first generate a bunch of deviates and then plot their density in teal Then the true density is overlayed as a yellow line.

df_uniform_samples <- 
  data.frame(u = projectq3b(n = 10000, 
                       pdf = dunif, 
                       a = 0, 
                       b = 1, 
                       min = 0, 
                       max = 1))

ggplot(df_uniform_samples, aes(x = u)) + 
  geom_density(color = NA, fill = "#35a09c") + 
  stat_function(aes(x = 0), 
                fun = dunif, 
                args = list(min = 0, max = 1), 
                color = "#fff735", 
                size = 1) + 
  labs(x = "X", 
       y = "Density") + 
  theme_minimal()

Beta

Here is an example of using projectq3b to sample from a beta distribution with both shape parameters equal to 2.

The beta distribution has probability density function:

$$\text{pdf = }\begin{cases}\frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)} \text{ for }0 \le x \le 1\0 \text{ otherwise}\end{cases}$$

$$\text{where }B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}\text{ with }\alpha, \beta \gt 0$$

We first generate a bunch of deviates and then plot their density in teal. Then the true density is overlayed as a yellow line.

df_beta_samples <- 
  data.frame(b = projectq3b(n = 10000, 
                       pdf = dbeta, 
                       a = 0, 
                       b = 1, 
                       shape1 = 5, 
                       shape2 = 2))

ggplot(df_beta_samples, aes(x = b)) + 
  geom_density(color = NA, fill = "#35a09c") + 
  stat_function(aes(x = 0), 
                fun = dbeta, 
                args = list(shape1 = 5, shape2 = 2), 
                color = "#fff735", 
                size = 1) + 
  labs(x = "X", 
       y = "Density") + 
  theme_minimal()

Custom

Here is an example of using projectq3b to sample from a custom distribution.

$$\text{pdf = }\begin{cases}\frac{x}{2} \text{ for }0 \le x \le 2\0 \text{ otherwise}\end{cases}$$

dcustom <- function (x) {
  sapply(X = x, 
         FUN = function(x) { ifelse(0 < x && x < 2, x / 2, 0) })
}

We first generate a bunch of deviates and then plot their density in teal. Then the true density is overlayed as a yellow line.

df_custom_samples <- 
  data.frame(b = projectq3b(n = 10000, 
                       pdf = dcustom, 
                       a = 0, 
                       b = 2))

ggplot(df_custom_samples, aes(x = b)) + 
  geom_density(color = NA, fill = "#35a09c") + 
  stat_function(aes(x = 0), 
                fun = dcustom, 
                color = "#fff735", 
                size = 1) + 
  labs(x = "X", 
       y = "Density") + 
  theme_minimal()

Exponential

Markov chain Monte Carlo has an advantage over rejection sampling in that we do not have to limit ourselves to finite support such as that of uniform or beta.

Here is an example of using projectq3b to sample from an exponential distribution with rate = 2.

The exponential distribution has probability density function:

$$\text{pdf = }\begin{cases}\lambda e^{-\lambda x} \text{for } x \in [0, \infty)\0 \text{ otherwise}\end{cases}$$

We first generate a bunch of deviates and then plot their density in teal Then the true density is overlayed as a yellow line.

df_exponential_samples <- 
  data.frame(e = projectq3b(n = 10000, 
                       pdf = dexp, 
                       rate = 2))

ggplot(df_exponential_samples, aes(x = e)) + 
  geom_density(color = NA, fill = "#35a09c") + 
  stat_function(aes(x = 0), 
                fun = dexp, 
                args = list(rate = 2), 
                color = "#fff735", 
                size = 1) + 
  labs(x = "X", 
       y = "Density") + 
  theme_minimal()

Function projectq3c

The projectq3c function generates semi-random deviates from a bivariate continuous distribution. The user supplies the joint probability density function, and Markov chain Monte Carlo sampling is used to generate the deviates.

2D Uniform on unit square

We first demo sampling from a uniform distribution defined on the unit square. In order to do so, we use jdunif(), an example joint probability density function from samplr which takes x and y as arguments for the two variables.

$$\text{pdf = }\begin{cases}\frac{1}{(b - a)^2} \text{for } x, y \in [a, b]\0 \text{ otherwise}\end{cases}$$

Then we could examine what the distribution looks like in 3D by computing the probability densities for each x and y pair in a systematic sampling across the support.

l <- list(x = seq(0, 1, 0.5), 
          y = seq(0, 1, 0.5))

z <- matrix(rep(NA, length(l$x) * length(l$y)), 
            nrow = length(l$x))
for(r in 1:nrow(z)) {
  for(c in 1:ncol(z)) {
    z[r, c] <- jdunif(x = l$x[r], y = l$y[c])
  }
}
l$z <- z

plot_ly(x = l$x, y = l$y, z = l$z) %>% 
  add_surface() %>% 
  layout(scene = list(xaxis = list(title = "X"), 
                      yaxis = list(title = "Y"), 
                      zaxis = list(title = "Density")))

Finally, we can sample from the joint distribution utilizing Markov chain Monte Carlo sampling and estimate the kernel density.

df <- projectq3c(n = 10000, jpdf = jdunif)
kd <- with(df, kde2d(x, y, n = 100))
plot_ly(x = kd$x, y = kd$y, z = kd$z) %>% 
  add_surface() %>% 
  layout(scene = list(xaxis = list(title = "X"), 
                      yaxis = list(title = "Y"), 
                      zaxis = list(title = "Density")))

Custom

We next demo sampling from a custom joint distribution defined on the square from -1 to +1. Again, we use an example joint probability density function from samplr, this time called jdcirclecontour(), which takes x and y as arguments for the two variables.

$$\text{pdf = }\begin{cases}\frac{3}{8}(x^2 + y^2) \text{for } x, y \in [-1, +1]\0 \text{ otherwise}\end{cases}$$

Then we could examine what the distribution looks like in 3D by computing the probability densities for each x and y pair in a systematic sampling across the support.

l <- list(x = seq(-1, 1, 0.1), 
          y = seq(-1, 1, 0.1))

z <- matrix(rep(NA, length(l$x) * length(l$y)), 
            nrow = length(l$x))
for(r in 1:nrow(z)) {
  for(c in 1:ncol(z)) {
    z[r, c] <- jdcirclecontour(x = l$x[r], y = l$y[c])
  }
}
l$z <- z

plot_ly(x = l$x, y = l$y, z = l$z) %>% 
  add_surface() %>% 
  layout(scene = list(xaxis = list(title = "X"), 
                      yaxis = list(title = "Y"), 
                      zaxis = list(title = "Density")))

Finally, we can sample from the joint distribution utilizing Markov chain Monte Carlo sampling and estimate the kernel density.

df <- projectq3c(n = 10000, jpdf = jdcirclecontour)
kd <- with(df, kde2d(x, y, n = 100))
plot_ly(x = kd$x, y = kd$y, z = kd$z) %>% 
  add_surface() %>% 
  layout(scene = list(xaxis = list(title = "X"), 
                      yaxis = list(title = "Y"), 
                      zaxis = list(title = "Density")))

Bivariate Normal

The most common bivariate distribution is probably the bivariate normal with probability density function:

$$\text{f(x, y) = }\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\text{ exp }\bigg[-\frac{z}{2(1-\rho^2)}\bigg] \text{ for }x, y \in (-\infty, +\infty)$$

where

$$z \equiv \frac{(x_1 - \mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2}$$

and

$$\rho \equiv cor(x_1, x_2) = \frac{V_{12}}{\sigma_1\sigma_2}$$

l <- list(x = seq(-3, 3, 0.1), 
          y = seq(-3, 3, 0.1))

z <- matrix(rep(NA, length(l$x) * length(l$y)), 
            nrow = length(l$x))
for(r in 1:nrow(z)) {
  for(c in 1:ncol(z)) {
    z[r, c] <- jdnorm(x = l$x[r], y = l$y[c])
  }
}
l$z <- z

plot_ly(x = l$x, y = l$y, z = l$z) %>% 
  add_surface() %>% 
  layout(scene = list(xaxis = list(title = "X"), 
                      yaxis = list(title = "Y"), 
                      zaxis = list(title = "Density")))

Finally, we can sample from the joint distribution utilizing Markov chain Monte Carlo sampling and estimate the kernel density.

df <- projectq3c(n = 10000, jpdf = jdnorm)
kd <- with(df, kde2d(x, y, n = 100))
plot_ly(x = kd$x, y = kd$y, z = kd$z) %>% 
  add_surface() %>% 
  layout(scene = list(xaxis = list(title = "X"), 
                      yaxis = list(title = "Y"), 
                      zaxis = list(title = "Density")))


schuelkem/samplr documentation built on May 6, 2019, 7:19 a.m.