boot.modelSampler: An ensemble technique to select the best subset of variables

View source: R/boot.modelSampler.R

Description

Finds the best subset of variables using out-of-bag (OOB) prediction error estimated over bootstrap draws of the data.

Usage

    boot.modelSampler(formula,
                      data,
                      n.iter1 = 2500,
                      n.iter2 = 2500,
                      B = 20,
                      verbose = TRUE,
                      ...)

Arguments

formula

A symbolic description of the model that is to be fit. This argument is required for modelSampler.

data

Data frame containing the predictors (variables) in the model. This argument is required for modelSampler.

n.iter1

Number of burn-in samples. This argument is required for modelSampler.

n.iter2

Number of samples after burn-in. This argument is required for modelSampler.

B

Total number of bootstrap iterations.

verbose

Logical. If TRUE, prints a formatted summary for each bootstrap iteration.

...

Further arguments passed to or from other methods.

Details

boot.modelSampler is a bootstrap wrapper around the primary function modelSampler. The user specifies B, the number of bootstrap draws, and the wrapper makes B calls to modelSampler, each on a bootstrap draw of the original data.

For each bootstrap draw, a hard shrunk posterior mean is computed for each model size visited by modelSampler. The hard shrunk estimators are then combined over the B draws to form an ensemble for each model size. Out-of-bagging is used to estimate the prediction error of each ensemble hard shrunk predictor, and the predictor with the smallest prediction error is identified. The dimension of this predictor, k, is defined to be the optimal model size.

The optimal model consists of the first k variables in an ordering based on an ensemble BMA predictor, which is formed by averaging the Bayesian model averaged (BMA) estimator over the B bootstrap draws.
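The bootstrap and out-of-bag steps described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: lm() fits on nested models stand in for the hard shrunk posterior means, the variable ordering `vars` is fixed by hand instead of coming from an ensemble BMA predictor, and the simulated data and all object names (`pred.sum`, `oob.n`, `k.opt`) are hypothetical.

```r
# Illustrative sketch only: lm() replaces the hard shrunk posterior mean
# that modelSampler would compute, and the variable ordering is assumed.
set.seed(1)
n <- 100
p <- 5
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", 1:p)
y <- 2 * X[, 1] - 1.5 * X[, 2] + rnorm(n)   # only x1 and x2 carry signal
dat <- data.frame(y = y, X)

B <- 20                 # number of bootstrap draws
vars <- colnames(X)     # assumed ordering of variables (hypothetical)

# pred.sum[i, k]: summed OOB predictions for observation i, model size k
pred.sum <- matrix(0, n, p)
oob.n <- numeric(n)     # number of draws in which observation i was OOB

for (b in seq_len(B)) {
  in.bag <- sample(n, replace = TRUE)        # bootstrap draw
  oob <- setdiff(seq_len(n), in.bag)         # observations left out
  for (k in seq_len(p)) {
    fit <- lm(reformulate(vars[1:k], response = "y"), data = dat[in.bag, ])
    pred.sum[oob, k] <- pred.sum[oob, k] + predict(fit, newdata = dat[oob, ])
  }
  oob.n[oob] <- oob.n[oob] + 1
}

# Ensemble OOB prediction: average over the draws where i was out-of-bag
seen <- oob.n > 0
oob.pred <- pred.sum[seen, , drop = FALSE] / oob.n[seen]

# OOB prediction error for each model size, and the minimizing size
oob.pe <- colMeans((dat$y[seen] - oob.pred)^2)
k.opt <- which.min(oob.pe)
vars[seq_len(k.opt)]    # first k.opt variables in the assumed ordering
```

In this sketch each observation is predicted only by fits from draws that did not include it, so `oob.pe` estimates prediction error without a separate test set.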

Value

An object of class boot.modelSampler, which is a list with the following components:

beta.count

A matrix in which each row corresponds to a model size and each column to a variable name. Each entry is the number of times that variable was identified as the most significant variable for that model size.

beta.ensemble

Bagged ensemble estimators.

oob.pe.hard

Out-of-bag estimated prediction error for hard shrunk predictors for each model.

oob.pe.ensemble

Out-of-bag estimated prediction error for the ensemble predictors.

track.aic

A matrix of the AIC models from each bootstrap iteration.

track.bic

A matrix of the BIC models from each bootstrap iteration.

aicbic.full

AIC-BIC models from full data.

oob.se

Standard error of out-of-bag estimated prediction error for hard shrunk predictors for each model.

Author(s)

Tanujit Dey <tanujit.dey@gmail.com>

References

Dey, T. (2013). modelSampler: An R Tool for Variable Selection and Model Exploration in Linear Regression. Journal of Data Science, 11(2), 371-387.

See Also

modelSampler, print.boot.modelSampler, print.modelSampler, plot.modelSampler, plot.FPE, plot.icicle, plot.var.stability, plot.ooberror.

Examples

    data(Pollute, package = "modelSampler")
    ms.boot <- boot.modelSampler(MortRate ~ ., Pollute, n.iter1 = 2500,
                                 n.iter2 = 2500, B = 20, verbose = TRUE)
    print(ms.boot)

tanujitdey/modelSampler documentation built on May 5, 2019, 11:01 p.m.