star_pred_dist: Compute a predictive distribution for the integer-valued...

View source: R/source_EM.R

star_pred_distR Documentation

Compute a predictive distribution for the integer-valued response

Description

A Monte Carlo approach for estimating the (plug-in) predictive distribution for the STAR linear model. The algorithm iteratively samples (i) the latent data given the observed data, (ii) the latent predictive data given the latent data from (i), and (iii) (inverse) transforms and rounds the latent predictive data to obtain a draw from the integer-valued predictive distribution.

Usage

star_pred_dist(
  y,
  X,
  X.test = NULL,
  transformation = "np",
  y_max = Inf,
  sd_init = 10,
  tol = 10^-10,
  max_iters = 1000,
  N = 1000
)

Arguments

y

n x 1 vector of observed counts

X

n x p matrix of predictors

X.test

m x p matrix of out-of-sample predictors

transformation

transformation to use for the latent data; must be one of

  • "identity" (identity transformation)

  • "log" (log transformation)

  • "sqrt" (square root transformation)

  • "np" (nonparametric transformation estimated from empirical CDF)

  • "pois" (transformation for moment-matched marginal Poisson CDF)

  • "neg-bin" (transformation for moment-matched marginal Negative Binomial CDF)

  • "box-cox" (box-cox transformation with learned parameter)

y_max

a fixed and known upper bound for all observations; default is Inf

sd_init

add random noise for initialization scaled by sd_init times the Gaussian MLE standard deviation

tol

tolerance for stopping the EM algorithm; default is 10^-10;

max_iters

maximum number of EM iterations before stopping; default is 1000

N

number of Monte Carlo samples from the posterior predictive distribution

Value

N x m samples from the posterior predictive distribution at the m test points

Note

The “plug-in" predictive distribution is a crude approximation. Better approaches are available using the Bayesian models, which provide samples from the posterior predictive distribution.

Examples

# Simulate data with count-valued response y:
x = seq(0, 1, length.out = 100)
y = rpois(n = length(x), lambda = exp(1.5 + 5*(x -.5)^2))

# Assume a quadratic effect (better for illustration purposes):
X = cbind(1,x, x^2)

# Compute the predictive draws for the test points (same as observed points here)
y_pred = star_pred_dist(y, X, transformation = 'sqrt')

# Using these draws, compute prediction intervals for STAR:
PI_y = t(apply(y_pred, 2, quantile, c(0.05, 1 - 0.05)))

# Plot the results: PIs and CIs
plot(x, y, ylim = range(y, PI_y), main = 'STAR: 90% Prediction Intervals')
lines(x, PI_y[,1], col='darkgray', type='s', lwd=4);
lines(x, PI_y[,2], col='darkgray', type='s', lwd=4)

drkowal/rSTAR documentation built on July 5, 2023, 2:18 p.m.