STAR_sparse_means | R Documentation |
Compute Gibbs samples from the posterior distribution of the inclusion indicators for the sparse means model. The inclusion probability is assigned a Beta(a_pi, b_pi) prior and is learned as well.
STAR_sparse_means(
y,
transformation = "identity",
y_min = -Inf,
y_max = Inf,
psi = NULL,
a_pi = 1,
b_pi = 1,
approx_Fz = FALSE,
approx_Fy = FALSE,
nsave = 1000,
nburn = 1000,
nskip = 0,
verbose = TRUE
)
y |
|
transformation |
transformation to use for the latent data; the options described in the details below include "identity", "sqrt", "np", "pois", "neg-bin", and "bnp" |
y_min |
a fixed and known lower bound for all observations; default is -Inf |
y_max |
a fixed and known upper bound for all observations; default is Inf |
psi |
prior variance for the slab component; if NULL, assume a Unif(0, n) prior |
a_pi |
prior shape1 parameter for the inclusion probability; default is 1 for uniform |
b_pi |
prior shape2 parameter for the inclusion probability; default is 1 for uniform |
approx_Fz |
logical; in BNP transformation, apply a (fast and stable) normal approximation for the marginal CDF of the latent data |
approx_Fy |
logical; in BNP transformation, approximate the marginal CDF of y |
nsave |
number of MCMC iterations to save |
nburn |
number of MCMC iterations to discard |
nskip |
number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw |
verbose |
logical; if TRUE, print time remaining |
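The interaction of nsave, nburn, and nskip can be illustrated with a small base R sketch. It assumes the sampler discards nburn iterations first and then keeps every (nskip + 1)th draw; the variable names ntotal and keep are hypothetical, and this is not the package's internal code.

```r
# Hypothetical illustration of a burn-in and thinning schedule (not package code)
nsave <- 5
nburn <- 10
nskip <- 2

# Total iterations needed to retain nsave draws
# (assumption: burn-in first, then keep every (nskip + 1)th draw)
ntotal <- nburn + (nskip + 1) * nsave

# Iterations whose draws would be kept under that assumption
iters <- seq_len(ntotal)
keep <- iters[iters > nburn & (iters - nburn) %% (nskip + 1) == 0]

print(ntotal)
print(keep)
```

Under this assumed schedule, increasing nskip lengthens the run without changing the number of saved draws.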
STAR defines an integer-valued probability model by (1) specifying a Gaussian model for continuous *latent* data and (2) connecting the latent data to the observed data via a *transformation and rounding* operation. Here, the continuous latent data model is a sparse normal means model of the form z_i = theta_i + epsilon_i with a spike-and-slab prior on theta_i.
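The two-step construction above can be sketched as a forward simulation in base R. This is only an illustration under stated assumptions: an identity transformation, floor rounding, unit error variance, and hypothetical values for the inclusion probability (pi_incl) and slab variance (psi). It does not reproduce the package's sampler.

```r
# Forward simulation from a sparse-means latent model (illustrative sketch)
set.seed(123)
n <- 200
pi_incl <- 0.25  # inclusion probability (hypothetical value)
psi <- 9         # slab variance (hypothetical value)

# Spike-and-slab prior: theta_i is exactly 0 unless gamma_i = 1
gamma <- rbinom(n, size = 1, prob = pi_incl)
theta <- gamma * rnorm(n, mean = 0, sd = sqrt(psi))

# (1) Gaussian model for the continuous latent data
#     (assumption: unit error variance)
z <- theta + rnorm(n)

# (2) Map latent data to integers
#     (assumption: identity transformation and floor rounding)
y <- floor(z)

table(gamma)
```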
There are several options for the transformation. First, the transformation can belong to the signed *Box-Cox* family, which includes the known transformations 'identity' and 'sqrt'. Second, the transformation can be estimated (before model fitting) using the empirical distribution of the data y. Options in this case include the empirical cumulative distribution function (CDF), which is fully nonparametric ('np'), or the parametric alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin') distributions. For the parametric distributions, the parameters of the distribution are estimated using moments (means and variances) of y. The distribution-based transformations approximately preserve the mean and variance of the count data y on the latent data scale, which lends interpretability to the model parameters. Lastly, the transformation can be modeled using the Bayesian bootstrap ('bnp'), which is a Bayesian nonparametric model and incorporates the uncertainty about the transformation into posterior and predictive inference.
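As a rough illustration of the first option, one common form of a signed Box-Cox transformation is sketched below. The helper signed_box_cox is hypothetical and written only for this example; the exact parameterization used by the package is an assumption here. In this form, lambda = 1 gives the identity up to a shift and lambda = 1/2 gives the square root up to shift and scale.

```r
# Hypothetical helper: one common form of the signed Box-Cox transformation.
# The package's own parameterization may differ.
signed_box_cox <- function(t, lambda) {
  stopifnot(length(lambda) == 1, lambda >= 0)
  if (lambda == 0) {
    log(t)  # limiting case; only defined for t > 0
  } else {
    (sign(t) * abs(t)^lambda - 1) / lambda
  }
}

# lambda = 1: identity up to a shift (t - 1)
g_identity <- signed_box_cox(c(-2, 0, 3), lambda = 1)

# lambda = 1/2: square root up to shift and scale (2 * (sqrt(t) - 1) for t >= 0)
g_sqrt <- signed_box_cox(c(0, 1, 4), lambda = 0.5)

print(g_identity)
print(g_sqrt)
```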
There are two options for the prior variance psi. First, it can be specified directly. Second, it can be assigned a Uniform(0,n) prior and sampled within the MCMC (psi = NULL, the default).
a list with the following elements:
post_gamma: nsave x n samples from the posterior distribution of the inclusion indicators
post_pi: nsave samples from the posterior distribution of the inclusion probability
post_psi: nsave samples from the posterior distribution of the prior variance
post_theta: nsave samples from the posterior distribution of the regression coefficients
post_g: nsave posterior samples of the transformation evaluated at the unique y values (only applies for 'bnp' transformations)
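The returned elements can be summarized with base R alone. The sketch below uses a stand-in list, fit_mock, filled with random numbers shaped like the documented output (it is not a real model fit), and the 0.5 cutoff for selecting components is a common convention assumed here rather than something the function prescribes.

```r
# Stand-in for a fitted object: random numbers shaped like the documented
# output, NOT real posterior draws
set.seed(42)
nsave <- 100
n <- 20
fit_mock <- list(
  post_gamma = matrix(rbinom(nsave * n, size = 1, prob = 0.3),
                      nrow = nsave, ncol = n),
  post_pi = rbeta(nsave, 2, 5),
  post_psi = runif(nsave, min = 0, max = n)
)

# Posterior inclusion probability for each observation:
# the column means of the nsave x n indicator matrix
pip <- colMeans(fit_mock$post_gamma)

# Components flagged as nonzero (assumption: 0.5 threshold)
selected <- which(pip > 0.5)

# Posterior mean and 95% interval for the inclusion probability
pi_summary <- c(mean = mean(fit_mock$post_pi),
                quantile(fit_mock$post_pi, probs = c(0.025, 0.975)))

print(round(pi_summary, 3))
```

With a real fit, the same summaries would be applied to the list returned by STAR_sparse_means in place of fit_mock.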
# Simulate some data:
y = round(c(rnorm(n = 100, mean = 0),
rnorm(n = 100, mean = 2)))
# Fit the model:
fit = STAR_sparse_means(y, nsave = 100, nburn = 100) # for a quick example
names(fit)
# Posterior inclusion probabilities:
pip = colMeans(fit$post_gamma)
plot(pip, y)
# Check the MCMC efficiency:
getEffSize(fit$post_theta) # coefficients