Bootstrapping is often used to obtain a robust estimate of the uncertainty of a parameter due to sampling error.
In the nonparametric bootstrap, new datasets are repeatedly created by resampling with replacement from the original dataset. Resampling from the available sample data approximates sampling new datasets from the original population. The model of interest is re-estimated on each newly constructed dataset, and the distribution of the resulting parameter estimates approximates the sampling distribution of the estimator. This yields a robust numerical confidence interval for the sampling uncertainty of the parameter, and can recover a confidence interval even when none is analytically tractable.
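The resampling procedure described above can be sketched in a few lines of base R. This is only an illustration of the general technique, not Zelig's internal code; the built-in mtcars data, the model formula, and the names B, boot_coefs, boot_se, and boot_ci are assumptions chosen for the example.

```r
# Nonparametric bootstrap sketch using base R and the built-in mtcars data.
# The dataset, formula, and object names are illustrative assumptions.
set.seed(1234)

B <- 500            # number of bootstrapped datasets
n <- nrow(mtcars)   # size of the original sample

# Refit the model on each resampled dataset and keep the coefficients.
boot_coefs <- t(replicate(B, {
  idx <- sample.int(n, size = n, replace = TRUE)  # resample rows with replacement
  coef(lm(mpg ~ wt + hp, data = mtcars[idx, ]))
}))

# Bootstrap standard errors and 95% percentile intervals, one per coefficient.
boot_se <- apply(boot_coefs, 2, sd)
boot_ci <- apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975))
```

Each row of boot_coefs holds the estimates from one resampled dataset, so the spread of each column describes the sampling uncertainty of that coefficient.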
In the parametric bootstrap, we instead use the model estimates from the original sample data to generate these new datasets, or some function of those datasets.
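As a rough sketch of the parametric idea, one can simulate new response vectors from the fitted model and re-estimate the model on each. The example below uses the base R simulate() method for lm objects, which draws responses assuming normally distributed errors; the mtcars data, the formula, and the names sims, par_coefs, and par_se are assumptions for illustration and do not describe Zelig's own simulation code.

```r
# Parametric bootstrap sketch: generate new responses from the fitted model.
# The dataset, formula, and object names are illustrative assumptions.
set.seed(1234)

B <- 500
fit <- lm(mpg ~ wt + hp, data = mtcars)

# simulate() returns a data frame with one simulated response per column.
sims <- simulate(fit, nsim = B)

# Re-estimate the model on each simulated response vector.
par_coefs <- t(sapply(sims, function(y_star) {
  d <- mtcars
  d$mpg <- y_star
  coef(lm(mpg ~ wt + hp, data = d))
}))

# Standard deviation of the estimates across the simulated datasets.
par_se <- apply(par_coefs, 2, sd)
```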
The default algorithm that Zelig uses to simulate quantities of interest is a form of the parametric bootstrap. Zelig has an argument, however, to switch to the nonparametric bootstrap. Hereafter, when we say bootstrap, we mean the nonparametric form.
The zelig() bootstrap argument has a default of FALSE, and can be set to TRUE or to a numeric value giving the number of bootstrapped datasets to run. If set to TRUE, the default is 100 bootstraps. The bootstrap works in combination with other Zelig arguments as follows:
If a dataset of $N$ observations is supplied with weights, the bootstrap will provide a new dataset of size $N$, with probability of each observation being resampled directly proportional to the weighting on that observation.
Presently, if the data is multiply imputed, the bootstrap function is disabled. We are currently researching correct interaction and implementation of the bootstrap across imputed datasets.
Presently, the bootstrap is available for all models implemented in Zelig, except MCMC models and time-series models. Models estimated by MCMC sampling are not compatible with the bootstrap, as the two are different approaches to the same goal. Serial correlation in time-series data violates the assumption that the observations in the sample are independent and identically distributed (iid) draws from the population, which makes appropriate bootstrap designs more complicated. Zelig does not presently have a bootstrap design appropriate for time series (such as the residual bootstrap or the block bootstrap), so the bootstrap is not an option for this set of models.
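For the weighted case described above, a single resampled dataset can be drawn in base R by passing the weights as sampling probabilities. This is a minimal sketch of the idea rather than Zelig's implementation; the mtcars data and the weight vector w are hypothetical and exist only for illustration.

```r
# Weighted resampling sketch: selection probability proportional to weight.
# The weights below are hypothetical, generated only for illustration.
set.seed(1234)

n <- nrow(mtcars)
w <- runif(n, min = 0.5, max = 2)   # hypothetical observation weights

# sample.int() rescales prob internally, so the weights need not sum to 1.
idx <- sample.int(n, size = n, replace = TRUE, prob = w)
boot_data <- mtcars[idx, ]          # new dataset with the same N rows
```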
rm(list = ls(pattern = "\\.out"))
suppressWarnings(suppressMessages(library(zeligverse)))
set.seed(1234)
Attach sample data:
data(macro)
Estimate the model, setting the number of bootstrapped datasets to construct:
z.out <- zelig(unem ~ gdp + capmob + trade, model = "ls", data = macro, bootstrap = 500)
Summary by default shows the point parameter estimates, with the standard errors generated by the bootstrap.
summary(z.out)
We can instead choose to show the bagging estimator, that is, the average parameter value across all the bootstrapped datasets. Bagging generally trades bias for a reduction in variance that results in lower mean squared error (notably in non-linear models). The bagging estimator can be obtained as:
summary(z.out, bagging = TRUE)
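To make the bagging estimator concrete, the sketch below computes it by hand in base R as the column means of the bootstrapped coefficient estimates, and sets it beside the estimates from the original data. The mtcars data, the formula, and the names boot_coefs, bagged, and original are assumptions for illustration; this is not the code Zelig runs internally.

```r
# Bagging sketch: average each coefficient across the bootstrapped fits.
# The dataset, formula, and object names are illustrative assumptions.
set.seed(1234)

B <- 500
n <- nrow(mtcars)

boot_coefs <- t(replicate(B, {
  idx <- sample.int(n, size = n, replace = TRUE)
  coef(lm(mpg ~ wt + hp, data = mtcars[idx, ]))
}))

bagged   <- colMeans(boot_coefs)                    # bagging estimator
original <- coef(lm(mpg ~ wt + hp, data = mtcars))  # fit on the original data

# Side-by-side comparison of the two sets of estimates.
comparison <- cbind(original, bagged)
print(comparison)
```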
If we want to inspect particular individual results, the subset argument is available:
summary(z.out, subset = 13:15)
If $B$ bootstraps were obtained, the first $B$ results are the models estimated on the bootstrapped data, and the $(B+1)$-th result is the model estimated on the original data. The value of $B$ is stored in a field of the Zelig object named bootstrap.num. In our running example this was 500:
summary(z.out, subset = (z.out$bootstrap.num + 1))