pmomLM: Bayesian variable selection and model averaging for linear...

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/pmomLM.R

Description

Variable selection for linear and probit models, providing a sample from the joint posterior of the model and regression coefficients. pmomLM and pmomPM implement product Normal MOM and heavy-tailed product MOM as prior distribution for linear and probit model coefficients (respectively). emomLM and emomPM set an eMOM prior.

pplPM finds the value of the prior dispersion parameter tau minimizing posterior expected predictive loss (Gelfand and Ghosh, 1998) for the Probit model, i.e. can be used to automatically set up tau.

ppmodel returns the proportion of visits to each model.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
pmomLM(y, x, xadj, center=FALSE, scale=FALSE, niter=10^4, thinning=1, 
burnin=round(niter/10), priorCoef, priorDelta, priorVar, initSearch='greedy', 
verbose=TRUE)
pmomPM(y, x, xadj, niter=10^4, thinning=1, burnin=round(niter/10),
priorCoef, priorDelta, initSearch='greedy', verbose=TRUE)

emomLM(y, x, xadj, center=FALSE, scale=FALSE, niter=10^4, thinning=1, 
burnin=round(niter/10), priorCoef, priorDelta, priorVar, initSearch='greedy', 
verbose=TRUE)
emomPM(y, x, xadj, niter=10^4, thinning=1, burnin =round(niter/10),
priorCoef, priorDelta, initSearch='greedy', verbose=TRUE)

pplPM(tauseq=exp(seq(log(.01),log(2),length=20)), kPen=1, y, x, xadj, niter=10^4,
thinning=1, burnin=round(niter/10), priorCoef, priorDelta, priorVar,
initSearch='greedy', mc.cores=1)

ppmodel(nlpfit)

Arguments

y

Vector with observed responses. For pmomLM this must be a numeric vector. For pmomPM it can either be a logical vector, a factor with 2 levels or a numeric vector taking only two distinct values.

x

Design matrix with all potential predictors which are to undergo variable selection.

xadj

Design matrix for adjustment covariates, i.e. variables which are included in the model with probability 1. For instance, xadj can be used to force the inclusion of an intercept in the model.

center

If center==TRUE, y and x are centered to have zero mean, therefore eliminating the need to include an intercept term in x

scale

If scale==TRUE, y and columns in x are scaled to have standard deviation 1

niter

Number of MCMC sampling iterations

thinning

MCMC thinning factor, i.e. only one out of each thinning iterations are reported. Defaults to thinning=1, i.e. no thinning

burnin

Number of burn-in MCMC iterations. Defaults to .1*niter. Set to 0 for no burn-in

priorCoef

Prior distribution for the coefficients. Must be object of class msPriorSpec with slot priorType set to 'coefficients'. Possible values for slot priorDistr are 'pMOM', 'piMOM' and 'peMOM'

priorDelta

Prior on model indicator space. Must be object of class msPriorSpec with slot priorType set to 'modelIndicator'. Possible values for slot priorDistr are 'uniform' and 'binomial'. For 'binomial', you can either set the prior probability 'p' or specify a Beta-binomial prior by specifying the parameters 'alpha.p','beta.p'.

priorVar

Prior on residual variance. Must be object of class msPriorSpec with slot priorType set to 'nuisancePars'. Slot priorDistr must be equal to 'invgamma'

initSearch

Algorithm to refine deltaini. initSearch=='greedy' uses a greedy Gibbs sampling search. initSearch=='SCAD' sets deltaini to the non-zero elements in a SCAD fit with cross-validated regularization parameter. initSearch=='none' initializes to the null model with no variables in x included.

verbose

Set verbose==TRUE to print iteration progress

tauseq

Grid of tau values for which the posterior predictive loss should be evaluated.

kPen

Penalty term specifying the relative importance of deviations from the observed data vs deviation from the posterior predictive. kPen can be set either to a numeric value or to 'msize' to set penalty equal to the average model size. Loss is Dev(yp,yhat) + kPen*Dev(yhat,yobs), where yp: draw from post predictive, yobs: observed data and yhat is E(yp|yobs).

mc.cores

Allows for parallel computing. mc.cores is the number of processors to use. Setting mc.cores>1 requires the parallel package.

nlpfit

Non-local prior model fit, as returned by pmomLM, pmomPM, emomLM or emomPM.

Details

The implemented MCMC scheme makes proposals from the joint posterior of (delta[i],theta[i]) given all other parameters and the data, where delta[i] is the indicator for inclusion/exclusion of covariate i and theta[i] is the coefficient value. In contrast with some model fitting options implemented in modelSelection, here the scheme is exact. However, sampling the coefficients can adversely affect the mixing when covariates are very highly correlated. In practice, the mixing seems to be reasonably good for correlations up to 0.9.

pmomPM uses the scheme of Albert & Chib (1993) for probit models.

Value

pmomLM and pmomPM returns a list with elements

postModel

matrix with posterior samples for the model indicator. postModel[i,j]==1 indicates that variable j was included in the model in the MCMC iteration i

postCoef1

matrix with posterior samples for coefficients associated to x

postCoef2

matrix with posterior samples for coefficients associated to xadj

postPhi

vector with posterior samples for residual variance

postOther

postOther returns posterior samples for other parameters, i.e. basically hyper-parameters. Currently the prior precision parameter tau

margpp

Marginal posterior probability for inclusion of each covariate. This is computed by averaging marginal post prob for inclusion in each MCMC iteration, which is much more accurate than simply taking colMeans(postModel)

.

pplPM returns a list with elements

optfit

Probit model fit using tauopt. It is the result of a call to pmomPM.

PPL

data.frame indicating for each value in tauseq the posterior predictive loss (PPL=G+P), the goodness-of-fit (G) and penalty terms (P)

, the average number of covariates in the model (msize) including xadj and the smoothed sPPL obtained via a gam fit.

tauopt

Value of tau minimizing the PPL

Author(s)

David Rossell, Donatello Telesca

References

Johnson V.E., Rossell D. Non-Local Prior Densities for Default Bayesian Hypothesis Tests. Journal of the Royal Statistical Society B, 2010, 72, 143-170.

Johnson V.E., Rossell D. Bayesian model selection in high-dimensional settings. Technical report. 2011 See http://rosselldavid.googlepages.com for technical reports.

Albert, J. and Chib, S. (1993) Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88, p669-679

Gelfand, A. and Ghosh, S. (1998) Model choice: A minimum posterior predictive loss approach. Biometrika, 85, p1-11.

See Also

For more details on the prior specification see msPriorSpec-class To compute marginal densities for a given model see pmomMarginalK, pmomMarginalU, pimomMarginalK, pimomMarginalU.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
#Simulate data
x <- matrix(rnorm(100*3),nrow=100,ncol=3)
xadj <- rep(1,nrow(x))
theta <- matrix(c(1,1,0),ncol=1)
y <- 10*xadj + x %*% theta + rnorm(100)

#Beta-binomial prior on model space
priorDelta <- modelbbprior(alpha.p=1,beta.p=1)

#Non-informative prior on residual variance
priorVar <- igprior(alpha=.01,lambda=.01)

#Product MOM prior with tau=0.3 on x coefficients
#Non-informative prior on xadj coefficients
priorCoef <- momprior(tau=0.3, tau.adj=10^6)

mom0 <- pmomLM(y=y,x=x,xadj=xadj,center=FALSE,scale=FALSE,niter=1000,
priorCoef=priorCoef,priorDelta=priorDelta,priorVar=priorVar)
round(colMeans(mom0$postModel),2)
round(colMeans(mom0$postCoef1),2)
round(colMeans(mom0$postCoef2),2)

#Alternative prior: hyper-prior on tau
priorCoef <- new("msPriorSpec",priorType='coefficients',priorDistr='pMOM',
priorPars=c(a.tau=1,b.tau=.135,tau.adj=10^6,r=1)) #hyper-prior
mom1 <-
pmomLM(y=y,x=x,xadj=xadj,center=FALSE,scale=FALSE,niter=1000,
priorCoef=priorCoef,priorDelta=priorDelta,priorVar=priorVar)
mean(mom1$postOther)  #posterior mean for tau

#Probit model
n <- 500; rho <- .25; niter <- 1000
theta <- c(.4,.6,0); theta.adj <- 0
V <- diag(length(theta)); V[upper.tri(V)] <- V[lower.tri(V)] <- rho
x <- rmvnorm(n,rep(0,length(theta)),V); xadj <- matrix(1,nrow=nrow(x),ncol=1)
lpred <- as.vector(x %*% matrix(theta,ncol=1) + xadj %*% matrix(theta.adj,ncol=1))
p <- pnorm(lpred)
y <- runif(n)<p

mom2 <- pmomPM(y=y,x=x,xadj=xadj,niter=1000,priorCoef=priorCoef,
priorDelta=priorDelta,initSearch='greedy')
colMeans(mom2$postCoef1)
coef(glm(y ~ x + xadj -1, family=binomial(link='probit')))

mombf documentation built on Dec. 6, 2019, 3:01 a.m.