modelSampler: A unique tool for variable selection and model exploration in...


View source: R/modelSampler.R

Description

Core function of the package. A call to modelSampler initiates a Gibbs sampler that draws values from the posterior of a rescaled spike and slab model. The Gibbs sampler output is used to derive the optimal AIC, BIC and highest posterior models from a restricted model search. The core function can also be called through its bootstrap wrapper, boot.modelSampler, which can be used to assess the stability of AIC and BIC model selection and to provide a more stable set of variables.
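The selection step described above (scoring models from a restricted search by AIC and BIC) can be sketched in base R. This is a hypothetical illustration, not modelSampler's code: the candidate variable subsets stand in for models visited by the sampler, the built-in mtcars data replace Pollute, and R's own AIC() and BIC() (Gaussian log-likelihood) are used, which need not match the package's internal FPE-based formulas. Within each model size the minimum-RSS model is kept first, mirroring the size-stratified summaries listed under Value.

```r
# Hypothetical sketch: score a restricted set of candidate models by AIC/BIC.
# The candidate subsets are stand-ins for sampler-visited models; they are
# NOT produced by modelSampler.
data(mtcars)
response <- "mpg"
candidates <- list(
  c("wt"),
  c("hp"),
  c("wt", "hp"),
  c("wt", "qsec"),
  c("wt", "hp", "qsec"),
  c("wt", "hp", "qsec", "drat")
)

# Fit each candidate with lm() and record its size, RSS, AIC and BIC.
scores <- do.call(rbind, lapply(candidates, function(vars) {
  fit <- lm(reformulate(vars, response = response), data = mtcars)
  data.frame(model = paste(vars, collapse = "+"),
             size  = length(vars),
             rss   = sum(residuals(fit)^2),
             aic   = AIC(fit),
             bic   = BIC(fit),
             stringsAsFactors = FALSE)
}))

# Stratify by model size: keep the minimum-RSS model of each size.
best.by.size <- do.call(rbind, lapply(split(scores, scores$size), function(s) {
  s[which.min(s$rss), ]
}))

# Pick the AIC- and BIC-optimal models among the size-stratified winners.
aic.model <- best.by.size$model[which.min(best.by.size$aic)]
bic.model <- best.by.size$model[which.min(best.by.size$bic)]
print(best.by.size)
cat("AIC choice:", aic.model, "\nBIC choice:", bic.model, "\n")
```

Because all models of the same size share the same number of parameters, ranking them by RSS, AIC or BIC gives the same winner; the criteria only disagree when comparing across sizes.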

Usage

    modelSampler(formula,
                 data,
                 V.small = 0.05,
                 V.big = NULL,
                 n.iter1 = 2500,
                 n.iter2 = 2500,
                 fast = FALSE,
                 beta.blocks = 100,
                 complexity = NULL,
                 verbose = TRUE,
                 seed = NULL,
                 ...)
   

Arguments

formula

A symbolic description of the model that is to be fit.

data

Data frame containing the variables in the model (the response and the predictors).

V.small

Small hypervariance value used to implement selective shrinkage.

V.big

Big hypervariance. If NULL (the default), V.big is set to n, the sample size.

n.iter1

Number of burn-in iterations.

n.iter2

Number of iterations sampled after burn-in.

fast

If TRUE, break the beta update into blocks (see beta.blocks). Typically set to FALSE.

beta.blocks

Size of the blocks used for the beta update (only used when fast = TRUE).

complexity

Model complexity parameter, which is estimated by the Gibbs sampler.

verbose

If TRUE, print iteration progress and other user-friendly output.

seed

Seed for the random number generator.

...

Further arguments passed to or from other methods.

Details

The Bayesian rescaled spike and slab model is designed to induce a type of regularization called selective shrinkage (for details, see the References). Selective shrinkage results from the two-point prior placed on the hypervariances, together with the choice of V.big, which by default is set to the sample size n.
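To see how a two-point hypervariance prior produces selective shrinkage, the following R sketch simulates coefficients under a deliberately simplified hierarchy. The assumptions are ours, not the package's: each hypervariance gamma_k equals V.big with probability w (a stand-in for the complexity parameter) and V.small otherwise, and beta_k | gamma_k ~ N(0, gamma_k). The full rescaled spike and slab model of Ishwaran and Rao (2005) has additional layers, and this is not modelSampler's Gibbs sampler.

```r
# Hypothetical sketch of a two-point prior on the hypervariance.
# Assumption: gamma_k = V.big with probability w, otherwise V.small,
# and beta_k | gamma_k ~ N(0, gamma_k). Not modelSampler's sampler.
set.seed(1)
n       <- 100      # sample size; mirrors the default V.big = n
V.small <- 0.05     # default V.small from the usage above
V.big   <- n
w       <- 0.2      # illustrative complexity (prior weight on V.big)
p       <- 10000    # number of simulated coefficients

# Draw hypervariances from the two-point prior, then the coefficients.
gamma <- ifelse(runif(p) < w, V.big, V.small)
beta  <- rnorm(p, mean = 0, sd = sqrt(gamma))

# Coefficients drawn under V.small sit close to zero (strong shrinkage),
# while those drawn under V.big are essentially unrestricted.
median.abs.beta <- tapply(abs(beta), gamma, median)
print(median.abs.beta)
print(mean(gamma == V.big))   # should be close to w
```

With these settings the median absolute coefficient under V.small is roughly 0.15, against roughly 6.7 under V.big, which is the separation that selective shrinkage exploits.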

Value

An object of class modelSampler, which is a list with the following components:

formula

The original formula used in calling modelSampler.

modelTracker

Total models visited after burn-in sampling.

beta.all

Sampled beta values after burn-in sampling.

FPE

Lists of variables selected by AIC and BIC, together with the posterior inclusion probability of each variable.

FPEstrat

Returns the top models stratified by model size. Within each size, the selection criterion is minimum residual sum of squares (RSS).

FPEstart.pen

Returns FPE values of the models, stratified by model size, together with the frequencies of the models visited by modelSampler.

hpm

Returns the posterior inclusion probability of each variable.

mss

Returns the minimum RSS value of each model visited by modelSampler.

aic

Returns AIC values of each model visited by modelSampler.

bic

Returns BIC values of each model visited by modelSampler.

coverage

Returns a vector giving the probability of visiting a new model at each iteration of modelSampler.

complexity

Returns a vector of the estimated complexity parameter at each iteration of modelSampler.
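Two of the summaries above, posterior inclusion probabilities (hpm) and the coverage trace, can be illustrated on a matrix of sampled inclusion indicators. This R sketch is hypothetical: the matrix draws is simulated rather than taken from a modelSampler object, and reading coverage as the running fraction of iterations that reach a previously unseen model is one plausible interpretation, not the package's documented formula.

```r
# Hypothetical sketch: summarising sampler output.
# Assumption: 'draws' is an iterations-by-variables 0/1 matrix of sampled
# inclusion indicators. Here it is simulated, not taken from modelSampler.
set.seed(2)
n.iter    <- 500
true.prob <- c(x1 = 0.95, x2 = 0.60, x3 = 0.10, x4 = 0.05)
draws     <- sapply(true.prob, function(pr) rbinom(n.iter, 1, pr))

# Posterior inclusion probability: fraction of iterations including each variable.
incl.prob <- colMeans(draws)

# Encode each visited model as a string such as "1100", then flag the
# iterations that visit a model not seen earlier in the chain.
model.id <- apply(draws, 1, paste, collapse = "")
is.new   <- !duplicated(model.id)

# Running proportion of iterations that reached a new model
# (one plausible reading of a "coverage" trace).
coverage <- cumsum(is.new) / seq_len(n.iter)

print(round(incl.prob, 3))
print(length(unique(model.id)))   # number of distinct models visited
print(tail(coverage, 1))
```

The coverage trace starts at 1 (the first model is always new) and decays as the chain revisits models, so a flat, low tail suggests the sampler has stopped discovering new models.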

Author(s)

Tanujit Dey tanujit.dey@gmail.com

References

Ishwaran, H. and Rao, J. S. (2003). Detecting differentially expressed genes in microarrays using Bayesian model selection. J. Amer. Statist. Assoc., 98, 438–455.

Ishwaran, H. and Rao, J. S. (2005). Spike and slab gene selection for multigroup microarray data. J. Amer. Statist. Assoc., 100, 764–780.

Ishwaran, H. and Rao, J. S. (2005). Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Statist., 33, 730–773.

Dey, T. (2013). modelSampler: An R Tool for Variable Selection and Model Exploration in Linear Regression. Journal of Data Science, 11(2), 371–387.

See Also

boot.modelSampler, print.boot.modelSampler, print.modelSampler, plot.modelSampler, plot.icicle, plot.FPE, plot.var.stability, plot.ooberror.

Examples

 # Example 1:

  data(Pollute, package = "modelSampler")
  ms.out <- modelSampler(MortRate ~ ., Pollute, n.iter1 = 2500,
                         n.iter2 = 2500, verbose = TRUE)

  # Print several outputs from modelSampler.

  print(ms.out)

  # Produce a collection of graphics: a complexity plot; a penalization
  # plot depicting the estimated minimum residual sum of squares, AIC and
  # BIC values by model size; a dimensionality plot of the model sizes
  # visited by modelSampler; an image plot visualizing variable importance;
  # and a coverage plot depicting the probability that the Gibbs sampler
  # visits a new model. For details of each plot, see plot.modelSampler.

  plot.modelSampler(ms.out)

  # Building on the preliminary analysis, an out-of-bag technique is used
  # to estimate prediction error (PE). Based on the estimated PE,
  # the best model of size "k" is selected.

  ms.boot <- boot.modelSampler(MortRate ~ ., Pollute, n.iter1 = 2500,
                               n.iter2 = 2500, B = 20, verbose = TRUE)

  # Print the selected subset of variables, based on estimated prediction error.

  print(ms.boot)

  # Illustrate the instability of the FPE model selection criteria.

  plot.FPE(ms.boot)

  # Depict the model space.

  plot.icicle(ms.boot, main = "The Pollute data")

  # Graphical tool for selecting "the" best model based on the estimated
  # prediction error of hard-shrunk predictors.

  plot.ooberror(ms.boot, main = "The Pollute data")

tanujitdey/modelSampler documentation built on May 5, 2019, 11:01 p.m.