fast_spike_slab_beta: Compute marginal posterior estimates for beta-spike-and-slab...

View source: R/SequenceSpikeSlab.R

fast_spike_slab_betaR Documentation

Compute marginal posterior estimates for beta-spike-and-slab prior

Description

Computes marginal posterior probabilities (slab probabilities) that data points have non-zero mean for the spike-and-slab prior with a Beta(beta_kappa,beta_lambda) prior on the mixing parameter. The posterior mean is also provided.

Usage

fast_spike_slab_beta(
  x,
  sigma = 1,
  m = 20,
  slab = "Laplace",
  Laplace_lambda = 0.5,
  Cauchy_gamma = 1,
  beta_kappa = 1,
  beta_lambda,
  show_progress = TRUE
)

Arguments

x

Vector of n data points

sigma

Standard deviation of the Gaussian noise in the data. May also be set to "auto", in which case sigma is estimated using the estimateSigma function from the selectiveInference package

m

The number of discretization points used is proportional to m*sqrt(n). The larger m, the better the approximation, but the runtime also increases linearly with m. The default m=20 usually gives sufficient numerical precision.

slab

Slab distribution. Must be either "Laplace" or "Cauchy".

Laplace_lambda

Parameter of the Laplace slab

Cauchy_gamma

Parameter of the Cauchy slab

beta_kappa

Parameter of the beta-distribution

beta_lambda

Parameter of the beta-distribution. Default value=n+1

show_progress

Boolean that indicates whether to show a progress bar

Details

The run-time is O(m*n^(3/2)) on n data points, which means that doubling the size of the data leads to an increase in computation time by approximately a factor of 2*sqrt(2)=2.8. Data sets of size n=100,000 should be feasible within approximately 30 minutes.

Value

list (postprobs, postmean, sigma), where postprobs is a vector of marginal posterior slab probabilities that x[i] has non-zero mean for i=1,...,n; postmean is a vector with the posterior mean for the x[i]; and sigma is the value of sigma (this may be of interest when the sigma="auto" option is used)

Examples

# Illustrate that fast_spike_slab_beta is a faster way to compute the same results as
# general_sequence_model on the beta-binomial prior

# Generate data
n <- 500          # sample size
n_signal <- 25    # number of non-zero theta
A <- 5            # signal strength
theta <- c(rep(A,n_signal), rep(0,n-n_signal))
x <- theta + rnorm(n, sd=1)

# Choose slab
slab <- "Cauchy"
Cauchy_gamma <- 1

cat("Running fast_spike_slab_beta (fast for very large n)...\n")
res_fss <- fast_spike_slab_beta(x, sigma=1, slab=slab, Cauchy_gamma=Cauchy_gamma)

cat("Running general_sequence_model (slower for very large n)...\n")
res_gsm <- general_sequence_model(x, sigma=1, slab=slab,
                                  log_prior="beta-binomial", Cauchy_gamma=Cauchy_gamma)

cat("Maximum difference in marginal posterior slab probabilities:",
    max(abs(res_gsm$postprobs - res_fss$postprobs)))
cat("\nMaximum difference in posterior means:",
    max(abs(res_gsm$postmean - res_fss$postmean)), "\n")

# Plot means
M=max(abs(x))+1
plot(1:n, x, pch=20, ylim=c(-M,M), col='green', xlab="", ylab="",
     main="Posterior Means (Same for Both Methods)")
points(1:n, theta, pch=20, col='blue')
points(1:n, res_gsm$postmean, pch=20, col='black', cex=0.6)
points(1:n, res_fss$postmean, pch=20, col='magenta', cex=0.6)
legend("topright", legend=c("general_sequence_model", "fast_spike_slab_beta",
                            "data", "truth"),
       col=c("black", "magenta", "green", "blue"), pch=20, cex=0.7)

SequenceSpikeSlab documentation built on Sept. 8, 2023, 6:06 p.m.