lm.spike: Spike and slab regression

View source: R/lm.spike.R

lm.spikeR Documentation

Spike and slab regression

Description

MCMC algorithm for linear regression models with a 'spike-and-slab' prior that places some amount of posterior probability at zero for a subset of the regression coefficients.

The model admits either Gaussian or student T errors; the latter are useful in the presence of outliers.

Usage

lm.spike(formula,
         niter,
         data,
         subset,
         prior = NULL,
         error.distribution = c("gaussian", "student"),
         contrasts = NULL,
         drop.unused.levels = TRUE,
         model.options = SsvsOptions(),
         ping = niter / 10,
         seed = NULL,
         ...)

SsvsOptions(adaptive.cutoff = 100,
            adaptive.step.size = .001,
            target.acceptance.rate = .345,
            correlation.swap.threshold = .8)

OdaOptions(fallback.probability = 0.0,
           eigenvalue.fudge.factor = 0.01)

Arguments

formula

formula for the maximal model (with all variables included), this is parsed the same way as a call to lm.

niter

The number of MCMC iterations to run. Be sure to include enough so you can throw away a burn-in set.

data

An optional data frame, list or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If not found in 'data', the variables are taken from 'environment(formula)', typically the environment from which 'lm.spike' is called.

subset

An optional vector specifying a subset of observations to be used in the fitting process.

prior

An optional list returned by SpikeSlabPrior. If prior is missing then a default prior will be used. See SpikeSlabPrior.

error.distribution

Specify either Gaussian or Student T errors. If the error distribution is student then the prior must be a StudentSpikeSlabPrior and the SSVS method must be used.

contrasts

An optional list. See the contrasts.arg argument of model.matrix.default.

drop.unused.levels

Logical indicating whether unobserved factor levels should be dropped from the model.

model.options

A list containing the tuning parameters for the desired MCMC method.

ping

The frequency with which to print status update messages to the screen. For example, if ping == 10 then an update will be printed every 10 MCMC iterations.

seed

An integer to use as the random seed for the underlying C++ code. If NULL then the seed will be set using the clock.

...

Extra arguments to be passed to SpikeSlabPrior (if method == "SSVS") or IndependentSpikeSlabPrior (if method == "ODA").

fallback.probability

When using the ODA method, each MCMC iteration will use SSVS instead of ODA with this probability. In cases where the latent data have high leverage, ODA mixing can suffer. Mixing in a few SSVS steps can help keep an errant algorithm on track.

eigenvalue.fudge.factor

When using the ODA method, the latent X's will be chosen so that the complete data X'X matrix (after scaling) is a constant diagonal matrix equal to the largest eigenvalue of the observed (scaled) X'X times (1 + eigenvalue.fudge.factor). This should be a small positive number.

adaptive.cutoff

The traditional SSVS method (sample every predictor at every iteration) will be used when there are fewer than this many predictors. The adaptive method of Benson and Fried will be used if there are more.

adaptive.step.size

The step size scaling factor to use in the adaptive SSVS algorithm.

target.acceptance.rate

The target acceptance rate for the adaptive SSVS algorithm.

correlation.swap.threshold

The minimal absolute correlation required for two variables to be considered for a swap move. Swap moves are currently only supported for less than adaptive.cutoff variables.

Details

There are two MCMC methods available. SSVS is the stochastic search variable selection algorithm from George and McCulloch (1998). ODA is the orthogonal data augmentation method from Clyde and Ghosh (2011). Both sampling methods ("ODA" and "SSVS") draw each variable inclusion indicator given all others, in a Gibbs sampler. The ODA method includes an extra data augmentation step that renders each indicator conditionally independent of the others given the latent data. There is residual dependence between successive MCMC steps introduced by the latent data, but the paper by Ghosh and Clyde suggested that on balance mixing should be improved.

SSVS offers a choice between to implementations. Classic SSVS attempts to flip each coefficient in or out of the model every iteration. The adaptive method attempts to learn which coefficients are likely to be included or excluded. It then biases its 'birth' and 'death' moves towards candidates that are likely to succeed.

Regarding the overall compute time, the DA method decomposes the (potentially very large) model matrix one time, at the start of the algorithm. But it then works with independent scalar updates. The SSVS algorithm does not have the upfront cost, but it works with many small matrix decompositions each MCMC iteration. The DA algorithm is very likely to be faster in terms of time per iteration.

Finally, note that the two algorithms require slightly different priors. The DA algorithm requires a priori independence, while the SSVS algorithm can work with arbitrary conjugate priors.

Value

Returns an object of class lm.spike, which is a list with the following elements

beta

A niter by ncol(x) matrix of regression coefficients, many of which may be zero. Each row corresponds to an MCMC iteration.

sigma

A vector of length niter containing the MCMC draws of the residual standard deviation parameter.

prior

The prior used to fit the model. If a prior was supplied as an argument it will be returned. Otherwise this will be the automatically generated prior based on the other function arguments.

Author(s)

Steven L. Scott

References

George and McCulloch (1997), "Approaches to Bayesian Variable Selection", Statistica Sinica, 7, 339 – 373. https://www3.stat.sinica.edu.tw/statistica/oldpdf/A7n26.pdf

Ghosh and Clyde (2011) "Rao-Blackwellization for Bayesian variable selection and model averaging in linear and binary regression: A novel data augmentation approach", Journal of the American Statistical Association, 106 1041-1052. https://homepage.stat.uiowa.edu/~jghsh/ghosh_clyde_2011_jasa.pdf

See Also

SpikeSlabPrior, plot.lm.spike, summary.lm.spike, predict.lm.spike.

Examples

  n <- 100
  p <- 10
  ngood <- 3
  niter <- 1000
  sigma <- .8

  x <- cbind(1, matrix(rnorm(n * (p-1)), nrow=n))
  beta <- c(rnorm(ngood), rep(0, p - ngood))
  y <- rnorm(n, x %*% beta, sigma)
  x <- x[,-1]
  model <- lm.spike(y ~ x, niter=niter)
  plot.ts(model$beta)
  hist(model$sigma)  ## should be near 8
  plot(model)
  summary(model)
  plot(model, "residuals")

  ## Now replace the first observation with a big outlier.
  y[1] <- 50
  model <- lm.spike(y ~ x, niter = niter)
  model2 <- lm.spike(y ~ x, niter = niter, error.distribution = "student")
  pred <- predict(model, newdata = x)
  pred2 <- predict(model2, newdata = x)

  ## Maximize the plot window before making these box plots.  They show
  ## the posterior predictive distribution of all 100 data points, so
  ## make sure your screen is 100 boxes wide!
  par(mfrow = c(2,1))
  BoxplotTrue(t(pred), truth = y, ylim = range(pred), pch = ".",
     main = "Posterior predictive distribution assuming Gaussian errors.")
  BoxplotTrue(t(pred2), truth = y, ylim  = range(pred), pch = ",",
     main = "Posterior predictive distribution assuming Student errors.")

  ## The posterior predictive distributions are much tighter in the
  ## student case than in the Gaussian case, even though the student
  ## model has heavier tails, because the "sigma" parameter is smaller.
  par(mfrow = c(1,1))
  CompareDensities(list(gaussian = model$sigma, student = model2$sigma),
                        xlab = "sigma")

BoomSpikeSlab documentation built on May 29, 2024, 5:07 a.m.