posterior_predict: Sample from the posterior predictive distribution
In geostan: Bayesian Spatial Analysis

posterior_predict

R Documentation

Description

Draw samples from the posterior predictive distribution of a fitted geostan model.

Usage

posterior_predict(
  object,
  S,
  summary = FALSE,
  width = 0.95,
  approx = TRUE,
  K = 20,
  preserve_order = FALSE,
  seed
)

Arguments

`object`	A `geostan_fit` object.
`S`	Optional; number of samples to take from the posterior distribution. The default, and maximum, is the total number of samples stored in the model.
`summary`	Should the predictive distribution be summarized by its means and central quantile intervals? If `summary = FALSE`, an `S` x `N` matrix of samples will be returned. If `summary = TRUE`, then a `data.frame` with the means and `100 * width` percent credible intervals is returned.
`width`	Only used if `summary = TRUE`, to set the quantiles for the credible intervals. Defaults to `width = 0.95`.
`approx`	For SAR models only; `approx = TRUE` uses an approximation method for the inverse of matrix `(I - rho * W)`.
`K`	For SAR models only; number of matrix powers to use for the matrix inverse approximation (used when `approx = TRUE`). High values of rho (especially > 0.9) require larger `K` for an accurate approximation.
`preserve_order`	If `TRUE`, the order of posterior draws will remain fixed; the default is to permute the MCMC samples so that (with small sample size `S`) each successive call to `posterior_predict` will return a different sample from the posterior probability distribution.
`seed`	A single integer value to be used in a call to `set.seed` before taking samples from the posterior distribution.

Details

This method returns samples from the posterior predictive distribution of the model (at the observed values of covariates, etc.). The predictions incorporate uncertainty of all parameter values (used to calculate the expected value of the model, for example) plus the error term (the model's description of the amount of variability of observations around the expected value). If the model includes measurement error in the covariates, this source of uncertainty (about X) is passed into the posterior predictive distribution as well.
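As a rough illustration of the two sources of uncertainty described above, the following base R sketch builds a matrix of predictive draws by hand for a Poisson model. The parameter draws are simulated stand-ins and the names `alpha_draws`, `log_E`, and `yrep` are hypothetical; this is not geostan's internal code.

```r
set.seed(1)
S <- 200   # number of posterior draws (hypothetical)
N <- 10    # number of observations (hypothetical)

# Stand-in for posterior draws of an intercept; in practice these come from MCMC
alpha_draws <- rnorm(S, mean = 0.1, sd = 0.05)
# Stand-in offset (log expected counts), one value per observation
log_E <- log(seq(50, 140, length.out = N))

# Parameter uncertainty: one expected value per draw and observation (S x N)
mu <- exp(outer(alpha_draws, log_E, FUN = "+"))

# Sampling variability: draw an outcome around each expected value
yrep <- matrix(rpois(S * N, lambda = mu), nrow = S, ncol = N)

dim(yrep)  # S rows (draws) by N columns (observations)
```

Each row of `yrep` plays the role of one replicated data set, which is the layout that `posterior_predict` returns when `summary = FALSE`.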

For SAR models (and all other models), the observed outcomes are not used to formulate the posterior predictive distribution. The posterior predictive distribution for the SLM (see stan_sar) is given by

(I - \rho W)^{-1} (\mu + \epsilon).

The SDLM is the same but includes spatially lagged covariates in \mu. The approx = FALSE method for SAR models requires a call to Matrix::solve(I - rho * W) for each MCMC sample; the approx = TRUE method uses an approximation based on matrix powers (LeSage and Pace 2009). The approximation deteriorates when \rho^K is not near zero, so check that K is large enough for the values of \rho in the posterior.
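The matrix-power idea can be sketched in base R as a truncated series, I + \rho W + \rho^2 W^2 + ... + \rho^K W^K, compared against the exact inverse. This is a generic illustration under stated assumptions: the small weights matrix and the helper names `approx_inverse` and `max_err` are hypothetical, and the exact way geostan counts and applies the K powers is not documented here.

```r
# Row-standardized weights for 5 sites on a line (hypothetical example matrix)
N <- 5
W <- matrix(0, N, N)
for (i in 1:N) {
  nb <- intersect(c(i - 1, i + 1), 1:N)  # neighbors to the left and right
  W[i, nb] <- 1 / length(nb)
}
I <- diag(N)

# Truncated power series: I + rho W + rho^2 W^2 + ... + rho^K W^K
approx_inverse <- function(rho, W, K) {
  term <- diag(nrow(W))         # current term, starts at rho^0 W^0 = I
  total <- term
  for (k in seq_len(K)) {
    term <- rho * (term %*% W)  # now equal to rho^k W^k
    total <- total + term
  }
  total
}

# Largest absolute error relative to the exact inverse
max_err <- function(rho, K) {
  max(abs(approx_inverse(rho, W, K) - solve(I - rho * W)))
}

max_err(rho = 0.5, K = 20)    # small: 0.5^20 is close to zero
max_err(rho = 0.95, K = 20)   # much larger: 0.95^20 is about 0.36
max_err(rho = 0.95, K = 300)  # small again once enough powers are included
```

The comparison shows why larger values of rho call for a larger K: the terms dropped from the series shrink only as fast as \rho^K does.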

Value

A matrix of size S x N containing samples from the posterior predictive distribution, where S is the number of samples drawn and N is the number of observations. If summary = TRUE, a data.frame with N rows and 3 columns is returned (with column names mu, lwr, and upr).
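To make the relationship between the two return types concrete, the sketch below reduces an S x N matrix of draws to a data.frame with columns mu, lwr, and upr. The draws are simulated stand-ins, and the assumption that the summary uses column means with central quantiles at (1 - width) / 2 and 1 - (1 - width) / 2 follows the argument descriptions rather than the package source.

```r
set.seed(2)
S <- 100
N <- 4
# Stand-in for an S x N matrix of predictive draws (hypothetical values)
yrep <- matrix(rpois(S * N, lambda = 20), nrow = S, ncol = N)

width <- 0.95
probs <- c((1 - width) / 2, 1 - (1 - width) / 2)  # lower and upper tail cut points

# Column-wise mean and central quantile interval, one row per observation
qs <- apply(yrep, 2, quantile, probs = probs)
res <- data.frame(mu = colMeans(yrep), lwr = qs[1, ], upr = qs[2, ])
res
```

With real output, `yrep` would be the matrix returned by `posterior_predict(fit)` and `res` would be comparable to the result of `posterior_predict(fit, summary = TRUE)`.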

Source

LeSage, James, & Robert Kelley Pace (2009). Introduction to Spatial Econometrics. Chapman and Hall/CRC.

Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, & D. B. Rubin (2014). Bayesian Data Analysis (3rd ed.). CRC Press.

McElreath, Richard (2016). Statistical Rethinking: A Bayesian Course with Examples in R and Stan. CRC Press, Ch. 3.

Examples

library(geostan)

# Poisson model for sentencing counts, with expected counts as an offset
E <- sentencing$expected_sents
sentencing$log_E <- log(E)
fit <- stan_glm(sents ~ offset(log_E),
                re = ~ name,
                data = sentencing,
                family = poisson(),
                chains = 2, iter = 600) # for speed only

# Compare densities of replicated and observed rates
yrep <- posterior_predict(fit, S = 65)
plot(density(yrep[1, ] / E))
for (i in 2:nrow(yrep)) lines(density(yrep[i, ] / E), col = 'gray30')
lines(density(sentencing$sents / E), col = 'darkred', lwd = 2)

# SAR model fit to simulated data on a 9 x 9 grid
sars <- prep_sar_data2(row = 9, col = 9)
W <- sars$W
y <- sim_sar(rho = .9, w = W)
fit <- stan_sar(y ~ 1, data = data.frame(y = y), sar = sars,
                iter = 650, quiet = TRUE)
yrep <- posterior_predict(fit, S = 15)

geostan documentation built on April 3, 2025, 10:04 p.m.