comparepriors.lm: Selected models under different choices of prior on the model...


comparepriors.lm    R Documentation

Selected models under different choices of prior on the model parameters and the model space

Description

Given a formula and a data frame, computes the maximum a posteriori (MAP) model and the median probability model (MPM) for different choices of prior on the model parameters and the model space. Normal linear models are assumed for the data, with the prior distribution on the model parameters being one or more of the following: PEP, intrinsic, Zellner’s g–prior, Zellner and Siow, benchmark, robust, hyper–g and the related hyper–g–n. The prior distribution on the model space can be either uniform on models or uniform on the model dimension (a special case of the beta–binomial prior). The model space consists of all possible models that include an intercept term. Model selection is performed using either full enumeration and evaluation of all models (for model spaces of small–to–moderate dimension) or a Markov chain Monte Carlo (MCMC) scheme (for model spaces of large dimension).

Usage

comparepriors.lm(
  formula,
  data,
  algorithmic.choice = "automatic",
  priorbetacoeff = c("PEP", "intrinsic", "Robust", "gZellner", "ZellnerSiow", "FLS",
    "hyper-g", "hyper-g-n"),
  reference.prior = c(TRUE, FALSE),
  priormodels = c("beta-binomial", "uniform"),
  burnin = 1000,
  itermcmc = 11000
)

Arguments

formula

A formula, defining the full model.

data

A data frame (of numeric values), containing the data.

algorithmic.choice

A character string, the type of algorithm to be used for selection: full enumeration and evaluation of all models or an MCMC scheme. One of “automatic” (the choice is made automatically based on the number of explanatory variables in the full model), “full enumeration” or “MCMC”. Default value="automatic".

priorbetacoeff

A character vector containing the different priors on the model parameters. Each element can be one of “PEP”, “intrinsic”, “Robust”, “gZellner”, “ZellnerSiow”, “FLS”, “hyper–g” and “hyper–g–n”.
Default value=c("PEP","intrinsic","Robust","gZellner","ZellnerSiow","FLS","hyper-g","hyper-g-n"), i.e., all supported priors are tested.

reference.prior

A logical vector indicating the baseline prior that is used for PEP/intrinsic. It can be TRUE (the reference prior is used), FALSE (the dependence Jeffreys prior is used) or both. Default value=c(TRUE,FALSE), i.e., both baseline priors are tested.

priormodels

A character vector containing the different priors on the model space. Each element can be one of “beta–binomial” and “uniform”.
Default value=c("beta-binomial","uniform"), i.e., both supported priors are tested.

burnin

Non–negative integer, the burn-in period for the MCMC scheme. Default value=1000.

itermcmc

Positive integer (larger than burnin), the (total) number of iterations for the MCMC scheme. Default value=11000.

Details

The different priors on the model parameters are implemented using different packages. For PEP and intrinsic, the current package is used. For the hyper–g and related hyper–g–n priors, the R package BAS is used. Finally, for Zellner’s g–prior (“gZellner”), the Zellner and Siow prior (“ZellnerSiow”), the robust prior and the benchmark prior (“FLS”), the results are obtained using BayesVarSel.

The prior distribution on the model space can be either the uniform on models or the beta–binomial. For the beta–binomial prior, the following special case is used: uniform prior on model dimension.
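The difference between the two model-space priors can be illustrated with a small base R sketch. It assumes the standard forms of the two priors: the uniform prior gives each of the 2^p models probability 1/2^p, while the uniform prior on model dimension (beta–binomial with both shape parameters equal to 1) gives each dimension k = 0, ..., p total mass 1/(p + 1), shared equally among the choose(p, k) models of that dimension. The function names are hypothetical and are not part of the package.

```r
# Prior probability of ONE specific model containing k of the p candidate
# covariates (the intercept is always included and is not counted in k).
# Both helpers are illustrative; they are not functions of PEPBVS.
prior_uniform_models <- function(k, p) {
  rep(1 / 2^p, length(k))
}

# Uniform prior on model dimension: beta-binomial with shape parameters (1, 1).
prior_uniform_dimension <- function(k, p) {
  1 / ((p + 1) * choose(p, k))
}

p <- 5
k <- 0:p

# Total prior mass assigned to each model dimension under the two priors.
mass_by_dim <- data.frame(
  dimension         = k,
  uniform_models    = choose(p, k) * prior_uniform_models(k, p),
  uniform_dimension = choose(p, k) * prior_uniform_dimension(k, p)
)
print(mass_by_dim)
```

Under the uniform prior on models, mid-sized dimensions receive most of the mass simply because there are more models of that size; the uniform prior on dimension removes this effect.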

When an MCMC scheme is used, the R package BAS uses the birth/death random walk in Raftery et al. (1997) combined with a random swap move; BayesVarSel uses Gibbs sampling; and PEPBVS implements the MC3 algorithm described in the Appendix of Fouskakis and Ntzoufras (2022).

To assess MCMC convergence, the Monte Carlo (MC) standard error is computed using the batch means estimator (implemented in the R package mcmcse). For a standard error to be computed, the number of retained iterations (itermcmc-burnin) needs to be larger than 100. This quantity cannot be computed for the cases treated by BAS, since the full set of ‘visited’ models is not available in that function's output, and thus NA is shown in the relevant column for those cases.
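For readers unfamiliar with the batch means estimator, the following base R sketch shows the general idea on an arbitrary numeric chain. It is not the mcmcse implementation, and the text above does not state which chain summary the package applies the estimator to; the function name, the square-root batch size and the simulated AR(1) chain are all assumptions made for illustration.

```r
# Batch means estimate of the Monte Carlo standard error of mean(x).
# Hypothetical helper: splits the chain into consecutive batches, averages
# each batch, and uses the variability of the batch averages.
batch_means_se <- function(x, batch_size = floor(sqrt(length(x)))) {
  n_batches <- floor(length(x) / batch_size)
  stopifnot(batch_size >= 1, n_batches >= 2)
  n_used <- n_batches * batch_size
  x <- x[seq_len(n_used)]                    # drop any incomplete last batch
  # matrix() fills column by column, so each column is one consecutive batch
  batch_avgs <- colMeans(matrix(x, nrow = batch_size, ncol = n_batches))
  sigma2_hat <- batch_size * var(batch_avgs) # asymptotic variance estimate
  sqrt(sigma2_hat / n_used)
}

set.seed(1)
itermcmc <- 11000
burnin   <- 1000
# Simulated autocorrelated chain standing in for MCMC output (assumption).
chain <- as.numeric(arima.sim(list(ar = 0.5), n = itermcmc))
kept  <- chain[(burnin + 1):itermcmc]        # discard the burn-in period
se    <- batch_means_se(kept)
print(se)
```

With the default settings, 10000 iterations are retained after burn-in, comfortably above the minimum of 100 mentioned above.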

Similar to pep.lm, if algorithmic.choice equals “automatic” then model selection is implemented as follows: if p < 20 (where p is the number of explanatory variables in the full model, excluding the intercept), full enumeration and evaluation of all models is performed; otherwise an MCMC scheme is used. To avoid potential memory or time constraints, if algorithmic.choice equals “full enumeration” but p ≥ 20, a warning message is issued and an MCMC scheme is used instead.
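The selection rule just described can be written as a short decision function. This is only a sketch of the documented behaviour, not the package's internal code; the function name choose_algorithm and the wording of the warning are hypothetical.

```r
# Hypothetical helper mirroring the documented rule for algorithmic.choice.
# p is the number of explanatory variables, excluding the intercept.
choose_algorithm <- function(p, algorithmic.choice = "automatic") {
  algorithmic.choice <- match.arg(
    algorithmic.choice, c("automatic", "full enumeration", "MCMC")
  )
  if (algorithmic.choice == "MCMC") return("MCMC")
  if (p < 20) return("full enumeration")
  # From here on p >= 20: full enumeration is not carried out.
  if (algorithmic.choice == "full enumeration") {
    warning("p >= 20: using an MCMC scheme instead of full enumeration")
  }
  "MCMC"
}

print(choose_algorithm(15))  # small model space
print(choose_algorithm(25))  # large model space
```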

Similar constraints to pep.lm hold for the data, i.e., the case of missing data is not currently supported, the explanatory variables need to be quantitative and cannot have an exact linear relationship, and p ≤ n − 2 (n being the sample size).
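Before calling the function, the data constraints listed above can be checked with a few lines of base R. The helper below is a hypothetical pre-check, not part of the package, and the package's own validation may differ; it is demonstrated on the built-in mtcars data purely for illustration.

```r
# Hypothetical pre-check of the documented data constraints.
check_design <- function(formula, data) {
  if (any(is.na(data))) stop("missing values are not supported")
  mf <- model.frame(formula, data)
  if (!all(vapply(mf, is.numeric, logical(1)))) {
    stop("all variables need to be quantitative")
  }
  X <- model.matrix(attr(mf, "terms"), mf)  # design matrix with intercept
  n <- nrow(X)
  p <- ncol(X) - 1                          # explanatory variables only
  if (qr(X)$rank < ncol(X)) {
    stop("explanatory variables have an exact linear relationship")
  }
  if (p > n - 2) stop("p must be at most n - 2")
  list(n = n, p = p)
}

ok <- check_design(mpg ~ wt + hp, mtcars)
print(ok)
```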

Value

comparepriors.lm returns a list with two elements:

MAPmodels

A data frame containing the MAP models for all combinations of prior on the model parameters and prior on the model space. Each row presents the following information: prior on the model parameters, prior on the model space, hyperparameter value, MAP model (for that specific combination of priors) represented with variable inclusion indicators, and the R package used. When an MCMC scheme has been used, there are two additional columns: one giving the specific algorithm that has been used and one with the MC standard error (to assess convergence). With an MCMC scheme, the MAP model reported is the most frequently ‘visited’ model.

MPMmodels

Same as the first element, but containing the MPM models instead.
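The MAP and MPM models need not coincide. The sketch below shows how each is read off a set of posterior model probabilities, assuming the usual definition of the MPM as the model containing every variable whose posterior inclusion probability exceeds 0.5. The posterior probabilities are made up for illustration and do not come from any fitted model.

```r
# All 2^3 models over three hypothetical covariates, coded with
# variable inclusion indicators (1 = included, 0 = excluded).
gamma <- as.matrix(expand.grid(x1 = 0:1, x2 = 0:1, x3 = 0:1))

# Made-up posterior model probabilities, one per row of gamma (sum to 1).
post <- c(0.04, 0.30, 0.20, 0.25, 0.03, 0.03, 0.05, 0.10)

# MAP: the single model with the highest posterior probability.
map_model <- gamma[which.max(post), ]

# MPM: include a variable when its posterior inclusion probability,
# i.e. the total probability of the models containing it, exceeds 0.5.
incl_prob <- colSums(gamma * post)
mpm_model <- as.integer(incl_prob > 0.5)
names(mpm_model) <- colnames(gamma)

print(rbind(MAP = map_model, MPM = mpm_model))
```

In this made-up example the MAP model contains x1 only, whereas the MPM contains x1 and x2, because x2 appears in several moderately probable models.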

References

Bayarri, M., Berger, J., Forte, A. and Garcia–Donato, G. (2012) Criteria for Bayesian Model Choice with Application to Variable Selection. The Annals of Statistics, 40(3): 1550–1577. doi:10.1214/12-AOS1013

Fouskakis, D. and Ntzoufras, I. (2022) Power–Expected–Posterior Priors as Mixtures of g–Priors in Normal Linear Models. Bayesian Analysis, 17(4): 1073–1099. doi:10.1214/21-BA1288

Ley, E. and Steel, M. (2012) Mixtures of g–Priors for Bayesian Model Averaging with Economic Applications. Journal of Econometrics, 171(2): 251–266. doi:10.1016/j.jeconom.2012.06.009

Liang, F., Paulo, R., Molina, G., Clyde, M. and Berger, J. (2008) Mixtures of g Priors for Bayesian Variable Selection. Journal of the American Statistical Association, 103(481): 410–423. doi:10.1198/016214507000001337

Raftery, A., Madigan, D. and Hoeting, J. (1997) Bayesian Model Averaging for Linear Regression Models. Journal of the American Statistical Association, 92(437): 179–191. doi:10.1080/01621459.1997.10473615

Zellner, A. (1976) Bayesian and Non–Bayesian Analysis of the Regression Model with Multivariate Student–t Error Terms. Journal of the American Statistical Association, 71(354): 400–405. doi:10.1080/01621459.1976.10480357

Zellner, A. and Siow, A. (1980) Posterior Odds Ratios for Selected Regression Hypotheses. Trabajos de Estadistica Y de Investigacion Operativa, 31: 585–603. doi:10.1007/BF02888369

Examples

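library(PEPBVS)  # package providing comparepriors.lm and UScrime_data
data(UScrime_data)
# Compare the PEP and hyper-g-n priors on the model parameters, using the
# reference baseline prior and the beta-binomial prior on the model space
resc <- comparepriors.lm(y ~ ., UScrime_data,
                         priorbetacoeff = c("PEP", "hyper-g-n"),
                         reference.prior = TRUE, priormodels = "beta-binomial")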


PEPBVS documentation built on April 3, 2025, 6:12 p.m.