gold: Bayesian Density Estimation Using 'gold' (Gaussian process On...
In reykp/BEDr: What the Package Does in One 'Title Case' Line

Description Usage Arguments Details Value References Examples

Estimates a density from Bayesian techniques using a Gaussian process on a log density.

1	gold(y, s1, c1, s2, c2, N = 20000, graves = FALSE, scale_l = 0.00001, scale_u = 0.00001, poi = NA)

`y`	A numeric vector. `gold` estimates the density on this data.
`s1`	The standard deviation of the Gaussian prior.
`c1`	The correlation parameter of the Gaussian prior that controls the correlation in the covariance structure and smoothness in the density estimate.
`s2`	The standard deviation of the Gaussian proposal distribution.
`c2`	The correlation parameter that controls the covariance structure of the Gaussian proposal distribution.
`N`	The number of iterations to run in the algorithm. The default is 20,000.
`graves`	An option to have the standard deviation of the proposal distribution (s1) chosen using an automatic step size selection (See Graves (2011)) (DEFAULT is FALSE).
`scale_l`	A value >= 0 controlling the scaling based on the minimum data value. The default is 0.00001 (see details).
`scale_u`	value >= 0 controlling the scaling based on the maximum data value. The default is 0.00001 (see details).
`poi`	Points of interest at which to estimate the density. `gold_pdf` will produce estimates and credible intervals of the density at any point, but `mat` returned by `gold_pdf` and `gold_cdf` only return the posterior draws if the density is estimated at that point.

The density being estimated takes the form below:

f(x) =

exp(g(x)) / (\int_{0}^{1} exp(g(u)) du)

where g(x) is an unknown log density.

Given that g(x) is unknown, the integral in the normalizing constant is estimated using a weighted average, and the set of unknown parameters that receive a prior is g(x) at a finite set of points. This finite set of points includes each data point, as well as a grid to ensure good coverage.

This density operates on data between 0 and 1. Thus, if the input is not between 0 and 1, it is standardized. The formula for scaling is below:

(y-(min(y)-scale_l))/(max(y)+scale_u-(min(y)-scale_l))

Values of 0 for the scale parameters indicates the density will only be the edges of the data in y. The default for scale_l and scale_u is 0.00001 since some densities cannot be estimated at 0 and 1.

gold returns a list containing the density estimate results.

`G`	A matrix with the posterior draws for g(x) at a finite number of points. The number of rows is the number of iterations. The number of columns is the number of data points plus the grid points.
`widths`	A vector with the widths around each x used for g(x). These are the widths used in the weighted average to estimate the integral in the normalizing constant.
`x`	A vector with the points at which g(x) was estimated.
`y`	A vector containing the data introduced to `gold` for density estimation.
`ar`	The acceptance rate from the random walk proposals.
`prior`	A vector of all prior and proposal parameter values.
`poi`	The points of interest if `poi` is non-missing.

Graves, T. (2011). Automatic Step Size Selection in Random Walk Metropolis Algorithms.

## --------------------------------------------------------------------------------
## Beta Distribution
## --------------------------------------------------------------------------------

# First run 'gold' on data sampled from a Beta(5, 1) distribution with 200 data points.
y <- rbeta(200, 5, 1)
# Specify all prior parameters and tune s2 until a reasonable acceptance rate is achieved
gold_beta <- gold(y = y, s1 = 1, c1 = 2, s2 = 0.2, c2 = 0.5, N = 20000)

# Check trace plots
gold_mcmcplots(gold_beta)

# Examine the estimate of the PDF
gold_plot(gold_beta, data = TRUE)

# Examine the estimate of the CDF with the empirical CDF overlaied
gold_plot(gold_beta, type = "cdf", data = TRUE)

## --------------------------------------------------------------------------------
## Normal Distribution
## --------------------------------------------------------------------------------

# First run 'gold' on data sampled from a Normal(0, 1) distribution with 100 data points.
y <- rnorm(100, 0, 1)
# Specify all prior parameters except s2. Use graves instead to choose s2 to achieve a reasonable acceptance rate
# Set the upper and lower scaling parameters to the standard deviation of the data
gold_norm <- gold(y = y, s1 = 1, c1 = 1, c2 = 0.5, N = 20000, graves = TRUE, scale_l = sd(y), scale_u = sd(y))

# Check what s2 was chosen
gold_norm$prior

# Check acf plots
gold_mcmcplots(gold_norm, type = "acf")

# Examine the estimate of the PDF with credible intervals
gold_plot(gold_norm, cri = TRUE)

# Examine the estimate of the CDF using an interactive graph
gold_plot(gold_norm, type = "cdf", interact = TRUE)

## --------------------------------------------------------------------------------
## Bimodal Distribution
## --------------------------------------------------------------------------------

# Sample 400 random uniforms
u  <- runif(400)
y <- rep(NA,400)
# Sampling from the mixture
for(i in 1:400){
  if(u[i] < 0.3){
   y[i]  <-  rnorm(1, 0, 1)
  }else {
   y[i] <- rnorm(1, 4, 1)
  }
}

# First run 'gold' on data sampled from a bimodal distribution with 200 data points.
# Specify several points of interest
gold_bimodal <- gold(y = y, s1 = 1, c1 = 15, c2 = 7, graves = TRUE, poi = c(0, 6.5))

# Check the running mean plots
gold_mcmcplots(gold_bimodal, type = "rm")

# Plot the PDF with the data and credible intervals
gold_plot(gold_bimodal, data = TRUE, cri = TRUE)

# Plot the CDF
gold_plot(gold_bimodal, type = "cdf")

# Plot histograms at estimates of the CDF at the 'poi'
bimodal_cdf <- gold_cdf(x = c(0, 6.5), gold_bimodal)
bimodal_cdf$cdf
# x = 0
hist(bimodal_cdf$mat[,1])
# x = 6.5
hist(bimodal_cdf$mat[,2])