bayesreg | R Documentation |
Fit a linear or logistic regression model using Bayesian continuous shrinkage prior distributions. Handles ridge, lasso, horseshoe and horseshoe+ regression with logistic,
Gaussian, Laplace, Student-t, Poisson or geometric distributed targets. See bayesreg-package
for more details on the features available in this package.
bayesreg(
formula,
data,
model = "normal",
prior = "ridge",
n.samples = 1000,
burnin = 1000,
thin = 5,
t.dof = 5,
n.cores = Inf
)
formula |
An object of class " |
data |
A data frame containing the variables in the model. |
model |
The distribution of the target (y) variable. Continuous or numeric variables can be distributed as per a Gaussian distribution ( |
prior |
Which continuous shrinkage prior distribution over the regression coefficients to use. Options include ridge regression
( |
n.samples |
Number of posterior samples to generate. |
burnin |
Number of burn-in samples. |
thin |
Desired level of thinning. |
t.dof |
Degrees of freedom for the Student-t distribution. |
n.cores |
Maximum number of cores to use; if |
An object with S3 class "bayesreg"
containing the results of the sampling process, plus some additional information.
beta |
Posterior samples the regression model coefficients. |
beta0 |
Posterior samples of the intercept parameter. |
sigma2 |
Posterior samples of the square of the scale parameter; for Gaussian distributed targets this is equal to the variance. For binary targets this is empty. |
mu.beta |
The mean of the posterior samples for the regression coefficients. |
mu.beta0 |
The mean of the posterior samples for the intercept parameter. |
mu.sigma2 |
The mean of the posterior samples for squared scale parameter. |
tau2 |
Posterior samples of the global shrinkage parameter. |
t.stat |
Posterior t-statistics for each regression coefficient. |
var.ranks |
Ranking of the covariates by their importance, with "1" denoting the most important covariate. |
log.l |
The log-likelihood at the posterior means of the model parameters |
waic |
The Widely Applicable Information Criterion (WAIC) score for the model |
waic.dof |
The effective degrees-of-freedom of the model, as estimated by the WAIC. |
The returned object also stores the parameters/options used to run bayesreg
:
formula |
The object of type " |
model |
The distribution of the target (y) variable. |
prior |
The shrinkage prior used to fit the model. |
n.samples |
The number of samples generated from the posterior distribution. |
burnin |
The number of burnin samples that were generated. |
thin |
The level of thinning. |
n |
The sample size of the data used to fit the model. |
p |
The number of covariates in the fitted model. |
n.cores |
The number of cores actually used during sampling. |
Draws a series of samples from the posterior distribution of a linear (Gaussian, Laplace or Student-t) or generalized linear (logistic binary, Poisson, geometric) regression model with specified continuous shrinkage prior distribution (ridge regression, lasso, horseshoe and horseshoe+) using Gibbs sampling. The intercept parameter is always included, and is never penalised.
While only n.samples
are returned, the total number of samples generated is equal to burnin
+n.samples
*thin
. To generate the samples
of the regression coefficients, the code will use either Rue's algorithm (when the number of samples is twice the number of covariates) or the algorithm of
Bhattacharya et al. as appropriate. Factor variables are automatically grouped together and
additional shrinkage is applied to the set of indicator variables to which they expand.
If n.cores > 1
then Bayesreg will divide the total sampling process between a number of independent sampling chains, with each chain being run on
a seperate core. If n.cores=Inf
then Bayesreg will use one less than the total number of available cores.
If n.cores
is a positive integer, then this is the maximum number of cores Bayesreg will use, though Bayesreg will cap this at the total number of available cores minus one.
The total number of samples are split evenly between the number of cores being utilised, and after sampling
the different chains are recombined into a single body of samples. Each chain will perform
burnin
burn-in sampling iterations before collecting samples, so the improvement in speed from
using from multiple cores will be greater if n.samples*thin
is substantially greater than burnin
.
To cite this toolbox please reference:
Makalic, E. & Schmidt, D. F. High-Dimensional Bayesian Regularised Regression with the BayesReg Package arXiv:1611.06649 [stat.CO], 2016 http://arxiv.org/pdf/1611.06649
A MATLAB implementation of the bayesreg function is also available from:
Copyright (C) Daniel F. Schmidt and Enes Makalic, 2016-2021
Makalic, E. & Schmidt, D. F. High-Dimensional Bayesian Regularised Regression with the BayesReg Package arXiv:1611.06649 [stat.CO], 2016 http://arxiv.org/pdf/1611.06649
Park, T. & Casella, G. The Bayesian Lasso Journal of the American Statistical Association, Vol. 103, pp. 681-686, 2008
Carvalho, C. M.; Polson, N. G. & Scott, J. G. The horseshoe estimator for sparse signals Biometrika, Vol. 97, 465-480, 2010
Makalic, E. & Schmidt, D. F. A Simple Sampler for the Horseshoe Estimator IEEE Signal Processing Letters, Vol. 23, pp. 179-182, 2016 http://arxiv.org/pdf/1508.03884v4
Bhadra, A.; Datta, J.; Polson, N. G. & Willard, B. The Horseshoe+ Estimator of Ultra-Sparse Signals Bayesian Analysis, 2016
Polson, N. G.; Scott, J. G. & Windle, J. Bayesian inference for logistic models using Polya-Gamma latent variables Journal of the American Statistical Association, Vol. 108, 1339-1349, 2013
Rue, H. Fast sampling of Gaussian Markov random fields Journal of the Royal Statistical Society (Series B), Vol. 63, 325-338, 2001
Bhattacharya, A.; Chakraborty, A. & Mallick, B. K. Fast sampling with Gaussian scale-mixture priors in high-dimensional regression arXiv:1506.04778, 2016
Schmidt, D.F. & Makalic, E. Bayesian Generalized Horseshoe Estimation of Generalized Linear Models ECML PKDD 2019: Machine Learning and Knowledge Discovery in Databases. pp 598-613, 2019
Stan Development Team, Stan Reference Manual (Version 2.26), Section 15.4, "Effective Sample Size", https://mc-stan.org/docs/2_18/reference-manual/effective-sample-size-section.html
The prediction function predict.bayesreg
and summary function summary.bayesreg
# -----------------------------------------------------------------
# By default Bayesreg now utilizes multiple cores if you have them
# available. If you do not want to use multiple cores you can set
# n.cores=1 when calling Bayesreg.
#
# In most realistic/practical settings, i.e., when a
# even a moderate number of samples is being requested, parallelization
# will usually result in substantial speedups, and can dramatically reduce
# run-time for large numbers of samples.
#
# However, if your design matrix and desired number of samples
# are small (i.e, small nrow(X), ncol(X) and n.samples)
# then sometimes no parallelization, or parallelization with a small
# number of cores, can be quicker due to the overhead of setting up
# multiple threads. This is likely only relevant if you are calling Bayesreg
# hundreds/thousands of times with *small* numbers of samples being requested
# for each call, e.g., if you are doing some sort of one-at-a-time testing
# across many predictors such as a genome-wide association test, or similar.
# In this case you may be better off parallelizing the calls to
# Bayesreg across the tests and disabling parallelization when calling
# Bayesreg (i.e., n.cores=1).
#
# -----------------------------------------------------------------
# Example 1: Gaussian regression
X = matrix(rnorm(100*20),100,20)
b = matrix(0,20,1)
b[1:5] = c(5,4,3,2,1)
y = X %*% b + rnorm(100, 0, 1)
df <- data.frame(X,y)
rv.lm <- lm(y~.,df) # Regular least-squares
summary(rv.lm)
# Horseshoe regression -- here we show how to explicitly control the maximum
# number of cores being used for sampling to 4
rv.hs <- bayesreg(y~., df, prior="hs", n.cores=4)
rv.hs$n.cores # actual number of cores used (will be <= 4, depending on machine)
rv.hs.s <- summary(rv.hs)
# Expected squared prediction error for least-squares
coef_ls = coef(rv.lm)
as.numeric(sum( (as.matrix(coef_ls[-1]) - b)^2 ) + coef_ls[1]^2)
# Expected squared prediction error for horseshoe
as.numeric(sum( (rv.hs$mu.beta - b)^2 ) + rv.hs$mu.beta0^2)
# -----------------------------------------------------------------
# Example 2: Gaussian v Student-t robust regression
X = 1:10;
y = c(-0.6867, 1.7258, 1.9117, 6.1832, 5.3636, 7.1139, 9.5668, 10.0593, 11.4044, 6.1677);
df = data.frame(X,y)
# Gaussian ridge
rv.G <- bayesreg(y~., df, model = "gaussian", prior = "ridge", n.samples = 1e3)
# Student-t ridge
rv.t <- bayesreg(y~., df, model = "t", prior = "ridge", t.dof = 5, n.samples = 1e3)
# Plot the different estimates with credible intervals
plot(df$X, df$y, xlab="x", ylab="y")
yhat_G <- predict(rv.G, df, bayes.avg=TRUE)
lines(df$X, yhat_G[,1], col="blue", lwd=2.5)
lines(df$X, yhat_G[,3], col="blue", lwd=1, lty="dashed")
lines(df$X, yhat_G[,4], col="blue", lwd=1, lty="dashed")
yhat_t <- predict(rv.t, df, bayes.avg=TRUE)
lines(df$X, yhat_t[,1], col="darkred", lwd=2.5)
lines(df$X, yhat_t[,3], col="darkred", lwd=1, lty="dashed")
lines(df$X, yhat_t[,4], col="darkred", lwd=1, lty="dashed")
legend(1,11,c("Gaussian","Student-t (dof=5)"),lty=c(1,1),col=c("blue","darkred"),
lwd=c(2.5,2.5), cex=0.7)
## Not run:
# -----------------------------------------------------------------
# Example 3: Poisson/geometric regression example
X = matrix(rnorm(100*5),100,5)
b = c(0.5,-1,0,0,1)
nu = X%*%b + 1
y = rpois(lambda=exp(nu),n=length(nu))
df <- data.frame(X,y)
# Fit a Poisson regression
rv.pois=bayesreg(y~.,data=df,model="poisson",prior="hs", burnin=1e4, n.samples=5e4)
summary(rv.pois)
# Fit a geometric regression
rv.geo=bayesreg(y~.,data=df,model="geometric",prior="hs", burnin=1e4, n.samples=5e4)
summary(rv.geo)
# Compare the two models in terms of their WAIC scores
cat(sprintf("Poisson regression WAIC=%g vs geometric regression WAIC=%g",
rv.pois$waic, rv.geo$waic))
# Poisson is clearly preferred to geometric, which is good as data is generated from a Poisson!
# -----------------------------------------------------------------
# Example 4: Logistic regression on spambase
data(spambase)
# bayesreg expects binary targets to be factors
spambase$is.spam <- factor(spambase$is.spam)
# First take a subset of the data (1/10th) for training, reserve the rest for testing
spambase.tr = spambase[seq(1,nrow(spambase),10),]
spambase.tst = spambase[-seq(1,nrow(spambase),10),]
# Fit a model using logistic horseshoe for 2,000 samples
# In practice, > 10,000 samples would be a more realistic amount to draw
rv <- bayesreg(is.spam ~ ., spambase.tr, model = "logistic", prior = "horseshoe", n.samples = 2e3)
# Summarise, sorting variables by their ranking importance
rv.s <- summary(rv,sort.rank=TRUE)
# Make predictions about testing data -- get class predictions and class probabilities
y_pred <- predict(rv, spambase.tst, type='class')
# Check how well did our predictions did by generating confusion matrix
table(y_pred, spambase.tst$is.spam)
# Calculate logarithmic loss on test data
y_prob <- predict(rv, spambase.tst, type='prob')
cat('Neg Log-Like for no Bayes average, posterior mean estimates: ', sum(-log(y_prob[,1])), '\n')
y_prob <- predict(rv, spambase.tst, type='prob', sum.stat="median")
cat('Neg Log-Like for no Bayes average, posterior median estimates: ', sum(-log(y_prob[,1])), '\n')
y_prob <- predict(rv, spambase.tst, type='prob', bayes.avg=TRUE)
cat('Neg Log-Like for Bayes average: ', sum(-log(y_prob[,1])), '\n')
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.