blasso: Bayesian Lasso/NG, Horseshoe, and Ridge Regression


Bayesian Lasso/NG, Horseshoe, and Ridge Regression

Description

Inference for ordinary least squares, lasso/NG, horseshoe, and ridge regression models by (Gibbs) sampling from the Bayesian posterior distribution, augmented with Reversible Jump (RJ) for model selection.

Usage

bhs(X, y, T = 1000, thin = NULL, RJ = TRUE, M = NULL, beta = NULL,
    lambda2 = 1, s2 = var(y-mean(y)), mprior = 0, ab = NULL,
    theta = 0, rao.s2 = TRUE, icept = TRUE, normalize = TRUE, verb = 1)
bridge(X, y, T = 1000, thin = NULL, RJ = TRUE, M = NULL,
       beta = NULL, lambda2 = 1, s2 = var(y-mean(y)), mprior = 0,
       rd = NULL, ab = NULL, theta=0, rao.s2 = TRUE, icept = TRUE,
       normalize = TRUE, verb = 1)
blasso(X, y, T = 1000, thin = NULL, RJ = TRUE, M = NULL,
       beta = NULL, lambda2 = 1, s2 = var(y-mean(y)),
       case = c("default", "ridge", "hs", "ng"), mprior = 0, rd = NULL,
       ab = NULL, theta = 0, rao.s2 = TRUE, icept = TRUE, 
       normalize = TRUE, verb = 1)

Arguments

X

data.frame, matrix, or vector of inputs X

y

vector of output responses y of length equal to the leading dimension (rows) of X, i.e., length(y) == nrow(X)

T

total number of MCMC samples to be collected

thin

number of MCMC samples to skip before a sample is collected (via thinning). If NULL (default), then thin is determined based on the regression model implied by RJ, lambda2, and ncol(X); and also on the errors model implied by theta and nrow(X)

RJ

if TRUE then model selection on the columns of the design matrix (and thus the parameter beta in the model) is performed by Reversible Jump (RJ) MCMC. The initial model is specified by the beta input, described below, and the maximal number of covariates in the model is specified by M

M

the maximal number of allowed covariates (columns of X) in the model. If input lambda2 > 0 then any M <= ncol(X) is allowed. Otherwise it must be that M <= min(ncol(X), length(y)-1), which is the default value when a NULL argument is given

beta

initial setting of the regression coefficients. Any zero-components will imply that the corresponding covariate (column of X) is not in the initial model. When input RJ = FALSE (no RJ) and lambda2 > 0 (use lasso) then no components are allowed to be exactly zero. The default setting is therefore contextual; see below for details

lambda2

square of the initial lasso penalty parameter. If zero, then least squares regressions are used

s2

initial variance parameter

case

specifies whether ridge regression, the Normal-Gamma, or the horseshoe prior should be used instead of the lasso; only meaningful when lambda2 > 0

mprior

prior on the number of non-zero regression coefficients (and therefore covariates) m in the model. The default (mprior = 0) encodes the uniform prior on 0 <= m <= M. A scalar value 0 < mprior < 1 implies a Binomial prior Bin(m|n=M,p=mprior). A 2-vector mprior=c(g,h) of positive values g and h gives a Bin(m|n=M,p) prior where p~Beta(g,h)

rd

=c(r, delta), the \alpha (shape) parameter and \beta (rate) parameter to the gamma distribution prior G(r,delta) for the \lambda^2 parameter under the lasso model; or, the \alpha (shape) parameter and \beta (scale) parameter to the inverse-gamma distribution IG(r/2, delta/2) prior for the \lambda^2 parameter under the ridge regression model. A default of NULL generates appropriate non-informative values depending on the nature of the regression. Specifying rd=FALSE causes lambda2 values to be fixed at their starting value, i.e., not sampled. See the details below for information on the special settings for ridge regression

ab

=c(a, b), the \alpha (shape) parameter and the \beta (scale) parameter for the inverse-gamma distribution prior IG(a,b) for the variance parameter s2. A default of NULL generates appropriate non-informative values depending on the nature of the regression

theta

the rate parameter (> 0) to the exponential prior on the degrees of freedom parameter nu under a model with Student-t errors implemented by a scale-mixture prior. The default setting of theta = 0 turns off this prior, yielding the normal errors model

rao.s2

indicates whether Rao-Blackwellized samples for \sigma^2 should be used (default TRUE); see below for more details

icept

if TRUE, an implicit intercept term is fit in the model, otherwise the intercept is zero; default is TRUE

normalize

if TRUE, each variable is standardized to have unit L2-norm, otherwise it is left alone; default is TRUE

verb

verbosity level; currently only verb = 0 and verb = 1 are supported

Details

The Bayesian lasso model and Gibbs Sampling algorithm are described in detail in Park & Casella (2008). The algorithm implemented by this function is identical to that described therein, with the exception of an added “option” to use a Rao-Blackwellized sample of \sigma^2 (with \beta integrated out) for improved mixing, and the model selection by RJ described below. When input argument lambda2 = 0 is supplied, the model is a simple hierarchical linear model where (\beta,\sigma^2) is given a Jeffreys prior.
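
For instance, a minimal sketch of such an un-penalized fit, on small synthetic X and y that stand in for real inputs, might look like:

library(monomvn)
X <- matrix(rnorm(100*5), ncol=5)       ## synthetic inputs
y <- drop(X %*% rnorm(5)) + rnorm(100)  ## synthetic responses
## lambda2=0: hierarchical linear model with Jeffreys prior,
## using Rao-Blackwellized sigma^2 samples for improved mixing
fit.lm <- blasso(X, y, lambda2=0, RJ=FALSE, rao.s2=TRUE)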

Specifying RJ = TRUE causes Bayesian model selection and averaging to commence for choosing which of the columns of the design matrix X (and thus parameters beta) should be included in the model. The non-zero components of the beta input specify which columns are in the initial model, and M specifies the maximal number of columns.
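
A hedged sketch of such an RJ call (X and y as in the sketch above; the initial model and size cap are purely illustrative):

## start with only the first covariate active; allow at most 3
b0 <- c(1, rep(0, ncol(X) - 1))
fit.rj <- blasso(X, y, RJ=TRUE, beta=b0, M=3)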

The RJ mechanism implemented here for Bayesian lasso model selection differs from the one described by Hans (2009), which is based on an idea from Geweke (1996). Those methods require departing from the Park & Casella (2008) latent-variable model and sampling from each conditional \beta_i | \beta_{(-i)}, \dots for all i, since a mixture prior with a point-mass at zero is placed on each \beta_i. Our implementation here requires no such special prior and retains the joint sampling of the full \beta vector of non-zero entries, which we believe yields better mixing in the Markov chain. RJ proposals to increase/decrease the number of non-zero entries do proceed component-wise, but the acceptance rates are high due to marginalized between-model moves (Troughton & Godsill, 1997).

When the lasso prior or RJ is used, the automatic thinning level (unless thin != NULL) is determined by the number of columns of X, since this many latent variables are introduced.
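
That default can also be set explicitly; e.g., the following sketch fixes the thinning level at ncol(X), matching the automatic choice under the lasso:

## skip ncol(X) MCMC samples between each collected one
fit.thin <- blasso(X, y, thin=ncol(X))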

Bayesian ridge regression is implemented as a special case via the bridge function. This essentially calls blasso with case = "ridge". A default setting of rd = c(0,0) is implied by rd = NULL, giving the Jeffreys prior for the penalty parameter \lambda^2, unless ncol(X) >= length(y) in which case the proper specification of rd = c(5,10) is used instead.
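
So, under the same assumed X and y, the following two sketched calls specify the same ridge model:

fit.r1 <- bridge(X, y)
fit.r2 <- blasso(X, y, case="ridge")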

The Normal–Gamma prior (Griffin & Brown, 2009) is implemented as an extension to the Bayesian lasso with case = "ng". Many thanks to James Scott for providing the code needed to extend the method(s) to use the horseshoe prior (Carvalho, Polson, Scott, 2010).
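
Hedged sketches of both variants (X and y as above; bhs is presumably the horseshoe analogue of bridge):

fit.ng <- blasso(X, y, case="ng")   ## Normal-Gamma prior
fit.hs <- bhs(X, y)                 ## horseshoe prior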

When theta > 0, Student-t errors via the scale mixtures (and thereby extra latent variables omega2) of Geweke (1993) are applied as an extension to the Bayesian lasso/ridge model. If Student-t errors are used, the automatic thinning level is augmented (unless thin != NULL) by the number of rows in X, since this many latent variables are introduced.
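
A sketch of a Student-t fit, again on the assumed X and y:

## exponential prior with rate 0.1 on the degrees of freedom nu
fit.t <- blasso(X, y, theta=0.1)
dim(fit.t$omega2)   ## T x nrow(X) latent scale samples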

Value

blasso returns an object of class "blasso", which is a list containing a copy of all of the input arguments as well as of the components listed below.

call

a copy of the function call as used

mu

a vector of T samples of the (un-penalized) “intercept” parameter

beta

a T*ncol(X) matrix of T samples from the (penalized) regression coefficients

m

a vector of length T recording the number of non-zero entries in each sample of beta

s2

a vector of T samples of the variance parameter

lambda2

a vector of T samples of the penalty parameter

gamma

a vector of T samples of the gamma parameter when case = "ng"

tau2i

a T*ncol(X) matrix of T samples from the (latent) inverse diagonal of the prior covariance matrix for beta, obtained for Lasso regressions

omega2

a T*nrow(X) matrix of T samples from the (latent) diagonal of the covariance matrix of the response providing a scale-mixture implementation of Student-t errors with degrees of freedom nu when active (input theta > 0)

nu

a vector of T samples of the degrees of freedom parameter to the Student-t errors model when active (input theta > 0)

pi

a vector of T samples of the Binomial proportion p that was given a Beta prior, as described above for the 2-vector version of the mprior input

lpost

the log posterior probability of each (saved) sample of the joint parameters

llik

the log likelihood of each (saved) sample of the parameters

llik.norm

the log likelihood of each (saved) sample of the parameters under the Normal errors model when sampling under the Student-t model; i.e., it is not present unless theta > 0
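
Posterior summaries can be computed directly from these components; a sketch, assuming a fitted object fit and an illustrative burn-in of 100 samples:

burn <- 1:100
colMeans(fit$beta[-burn, ])        ## posterior mean coefficients
mean(fit$s2[-burn])                ## posterior mean variance
colMeans(fit$beta[-burn, ] != 0)   ## RJ inclusion probabilities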

Note

Whenever ncol(X) >= nrow(X) it must be that either RJ = TRUE with M <= nrow(X)-1 (the default) or that the lasso is turned on with lambda2 > 0. Otherwise the regression problem is ill-posed.
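
A sketch of the two valid options for such a setting (X and y stand for generic inputs with ncol(X) >= nrow(X)):

fit1 <- blasso(X, y, RJ=TRUE, M=nrow(X) - 1)   ## restrict model size
fit2 <- blasso(X, y, RJ=FALSE, lambda2=1)      ## rely on the lasso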

Since the starting values are considered to be the first sample (of T), the total number of (new) samples obtained by Gibbs Sampling will be T-1.

Author(s)

Robert B. Gramacy rbg@vt.edu

References

Park, T., Casella, G. (2008). The Bayesian Lasso.
Journal of the American Statistical Association, 103(482), June 2008, pp. 681-686
\Sexpr[results=rd]{tools:::Rd_expr_doi("10.1198/016214508000000337")}

Griffin, J.E. and Brown, P.J. (2009). Inference with Normal-Gamma prior distributions in regression problems. Bayesian Analysis, 5, pp. 171-188.
\Sexpr[results=rd]{tools:::Rd_expr_doi("10.1214/10-BA507")}

Hans, C. (2009). Bayesian Lasso regression. Biometrika 96, pp. 835-845.
\Sexpr[results=rd]{tools:::Rd_expr_doi("10.1093/biomet/asp047")}

Carvalho, C.M., Polson, N.G., and Scott, J.G. (2010) The horseshoe estimator for sparse signals. Biometrika 97(2): pp. 465-480.
https://faculty.chicagobooth.edu/nicholas.polson/research/papers/Horse.pdf

Geweke, J. (1996). Variable selection and model comparison in regression. In Bayesian Statistics 5. Editors: J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith, 609-620. Oxford Press.

Paul T. Troughton and Simon J. Godsill (1997). A reversible jump sampler for autoregressive time series, employing full conditionals to achieve efficient model space moves. Technical Report CUED/F-INFENG/TR.304, Cambridge University Engineering Department.

Geweke, J. (1993) Bayesian treatment of the independent Student-t linear model. Journal of Applied Econometrics, Vol. 8, S19-S40

https://bobby.gramacy.com/r_packages/monomvn/

See Also

lm, lars in the lars package, regress, lm.ridge in the MASS package

Examples

## following the lars diabetes example
data(diabetes)
attach(diabetes)

## Ordinary Least Squares regression
reg.ols <- regress(x, y)

## Lasso regression
reg.las <- regress(x, y, method="lasso")

## Bayesian Lasso regression
reg.blas <- blasso(x, y)

## summarize the beta (regression coefficients) estimates
plot(reg.blas, burnin=200)
points(drop(reg.las$b), col=2, pch=20)
points(drop(reg.ols$b), col=3, pch=18)
legend("topleft", c("blasso-map", "lasso", "lsr"),
       col=c(2,2,3), pch=c(21,20,18))

## plot the size of different models visited
plot(reg.blas, burnin=200, which="m")

## get the summary
s <- summary(reg.blas, burnin=200)

## calculate the probability that each beta coef != zero
s$bn0

## summarize s2
plot(reg.blas, burnin=200, which="s2")
s$s2

## summarize lambda2
plot(reg.blas, burnin=200, which="lambda2")
s$lambda2
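
## (sketch) the ridge and horseshoe variants take the same
## arguments, so their fits can be compared directly
reg.ridge <- bridge(x, y)
reg.hs <- bhs(x, y)
plot(reg.ridge, burnin=200)
plot(reg.hs, burnin=200)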


## Not run: 
## fit with Student-t errors
## (~400-times slower due to automatic thinning level)
regt.blas <- blasso(x, y, theta=0.1)

## plotting some information about nu, and quantiles
plot(regt.blas, "nu", burnin=200)
quantile(regt.blas$nu[-(1:200)], c(0.05, 0.95))

## Bayes Factor shows strong evidence for Student-t model
mean(exp(regt.blas$llik[-(1:200)] - regt.blas$llik.norm[-(1:200)]))

## End(Not run)

## clean up
detach(diabetes)
