Sample Rejection Method Overview
Rejection sampling is one method of producing a random sample from along a single dimension. Rejection sampling works as follows: First, two uniform sample sets are produced, paired, and plotted. In this case, pairs of samples are produced and plotted 1000 times. Then, of the 1000 points, only points under the pdf curve are taken as part of the sample. Any point above the curve is rejected from the sample. Thus, an X value is proportionally more likely to be accepted as part of the sample if it falls underneath a higher part of the pdf. The non-rejected points are the product of THEFUNCTION1D.
Perhaps it is easiest to understand visually. First, 1000 uniformly distributed coordinate pairs are generated. Then, given the pdf exp(-x) and bounds 0 and 1, only about 650 points fall underneath the pdf curve. These points are highlighted in black. The other 350 are rejected (gray).
n <- 1000 x <- runif(n,0,1) y <- runif(n,0,1) points <- cbind(x,y) names(points) <- c("x", "y") dark_sample <- data.frame() light_sample <- data.frame() pdf <- function(x) {exp(-x)} for(i in 1:n) { dark <- points[i,2] < pdf(points[i,1]) if(dark) dark_sample <- rbind(dark_sample, points[i,]) else light_sample <- rbind(light_sample, points[i,]) } print(nrow(dark_sample)) print(nrow(light_sample)) names(dark_sample) <- c("X","Y") names(light_sample) <- c("X", "Y") library(ggplot2) ggplot(data = dark_sample, aes(X, Y)) + geom_point() + geom_point(data = light_sample, aes(X,Y), color = "gray") + stat_function(fun = pdf)
Sample Rejection Function Parameters
Inputs to THEFUNCTION1D include
*n - The desired number of samples.
*pdf - A valid pdf.
*lower_bound - The least finite value that the random variable may take.
*upper_bound - The greatest finite value that the random variable may take.
*C - Any real number greater than every point on the pdf.
These values are highlighted on the graph below. This time, however, the pdf is not exponential. Rather, it is approximately normal (normal, but with finite tails).
THEFUNCTION1D <- function(n = 1, pdf, lower_bound = 0, upper_bound = 1, C = 1) { if(n == "Dr. Speegle is the best!") stop("You bet he is!") if((n >= 1) == FALSE) stop("n must be positive, like a good attitude.") if(is.numeric(lower_bound) == FALSE) stop("Watch out! You must enter a numeric value for the lower_bound.") if(is.numeric(upper_bound) == FALSE) stop("Watch out! You must enter a numeric value for the upper_bound.") if(is.numeric(C) == FALSE) stop("Be careful. You must enter a numeric value for C. Greater values of C will slow the processing time.") if(lower_bound > upper_bound) stop("That's an error. The lower_bound is greater than the upper_bound.") n <- ceiling(n) sample_function <- function(pdf, lower_bound, upper_bound, C) { sample <- c() while(length(sample) != 1) { x <- runif(1,lower_bound, upper_bound) y <- runif(1, 0, C) success <- y < pdf(x) if(success) { return(x) sample <- append(sample, x) } } } replicate(n, sample_function(pdf, lower_bound, upper_bound, C)) } pdf <- function(x) {dnorm(x,0,1)} n <- 1000 c <- 0.5 x <- runif(n,-5,5) y <- runif(n,0,c) points <- cbind(x, y) dark_sample <- data.frame() light_sample <- data.frame() for(i in 1:n) { dark <- points[i,2] < pdf(points[i,1]) if(dark) dark_sample <- rbind(dark_sample, points[i,]) else light_sample <- rbind(light_sample, points[i,]) } names(dark_sample) <- c("X","Y") names(light_sample) <- c("light_sample_X", "light_sample_Y") C <- function(c) {0.5} library(ggplot2) print("THEFUNCTION1D(pdf, n = 1000, lower_bound = -1, upper_bound = 5, C = 0.5") ggplot(data = dark_sample, aes(X, Y)) + geom_point() + geom_point(data = light_sample, aes(light_sample_X,light_sample_Y), color = "gray") + stat_function(fun = pdf) + geom_vline(xintercept = -5, color = "orange") + geom_vline(xintercept = 5, color = "red") + stat_function(fun = C, color ="green") + ggtitle("THEFUNCTION1D(pdf, n = 1000, lower_bound = -1, upper_bound = 5, C = 0.5")
Example #1: Standard Normal Distribution
pdf <- function(x) {dnorm(x,0,1)} sample <- THEFUNCTION1D(n = 100000, pdf, lower_bound = -5, upper_bound = 5, C = 0.4) head(sample) hist(sample)
Example #2: Uniform distribution(a=-10, b=10)
pdf <- function(x) {dunif(x,-10,10)} sample <- THEFUNCTION1D(n = 200, pdf, lower_bound = -10, upper_bound = 10, C = 0.06) hist(sample)
Example #3: Exponential(lambda = 7)
pdf <- function(x) {dexp(x,7)} sample <- THEFUNCTION1D(n = 60, pdf, lower_bound = 0, upper_bound = 100, C = 50) hist(sample)
Example #4: Gamma(alpha = 12, beta = 0.6)
pdf <- function(x) {dgamma(x, 12, 0.6)} sample <- data.frame(THEFUNCTION1D(n = 10000, pdf, lower_bound = 0, upper_bound = 50, C = 0.5)) ggplot(sample, aes(sample[,1])) + geom_histogram(binwidth = 0.05)
Example #5: Beta(alpha = 6, beta = 8)
pdf <- function(x) {dbeta(x,6,8)} sample <- THEFUNCTION1D(n = 50000, pdf, lower_bound = 0, upper_bound = 1, C = 5) hist(sample)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.