knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%", fig.align = "center", fig.ext = "svg", dev = "svg" ) set.seed(32)
blocklength
is an R package used to automatically select the
block-length parameter for a block-bootstrap. It is meant for use with
dependent data such as stationary time series.
Regular bootstrap methods rely on assumptions that observations are independent and identically distributed (i.i.d.), but this assumption fails for many types of time series because we would expect the observation in the previous period to have some explanatory power over the current observation. This could occur in any time series from unemployment rates, stock prices, biological data, etc. A time series that is i.i.d. would look like white noise, since the following observation would be totally independent of the previous one (random).
To get around this problem, we can retain some of this time-dependence by breaking-up a time series into a number of blocks with length l. Instead of sampling each observation randomly (with replacement) like a regular bootstrap, we can resample these blocks at random. This way within each block the time-dependence is preserved.
The problem with the block bootstrap is the high sensitivity to the choice of block-length, or the number of blocks to break the time series into.
The goal of blocklength
is to simplify and automate the process of
selecting a block-length to perform a bootstrap on dependent data.
blocklength
has several functions that take their name from the
authors who have proposed them. Currently, there are three methods
available:
hhj()
takes its name from the Hall, Horowitz, and Jing
(1995)
"HHJ" method to select the optimal block-length using a
cross-validation algorithm which minimizes the mean squared error
(MSE) incurred by the bootstrap at various block-lengths.
pwsd()
takes its name from the Politis and White
(2004) Spectral Density
"PWSD" Plug-in method to automatically select the optimal
block-length using spectral density estimation via "flat-top" lag
windows of Politis and Romano
(1995).
nppi()
takes its name from the Lahiri, Furukawa, and Lee
(2007) Nonparametric
Plug-In "NPPI" method to select the optimal block-length for block
bootstrap procedures. The NPPI method estimates the leading term in
the first-order expansion of the theoretically optimal block length
by using resampling methods to construct consistent bias and
variance estimators for the block-bootstrap. Specifically, this
package implements the Moving Block Bootstrap (MBB) method of
Künsch (1989) and
the Moving Blocks Jackknife (MBJ) of Liu and Singh
(1992) as the bias and
variance estimators, respectively.
Under the hood, hhj()
uses the moving block bootstrap (MBB) procedure
according to Künsch
(1989) which resamples
blocks from a set of overlapping sub-samples with a fixed block-length.
However, the results of hhj()
may be generalized to other block
bootstrap procedures such as the stationary bootstrap of Politis and
Romano
(1994).
Compared to pwsd()
, hhj()
is more computationally intensive as it
relies on iterative sub-sampling processes that optimize the MSE
function over each possible block-length (or a select grid of
block-lengths), while pwsd()
is a simpler "plug-in" rule that uses
auto-correlations, auto-covariance, and the spectral density of the
series to optimize the choice of block-length. Similarly, nppi()
is
another "plug-in" rule, however, due to its heavy reliance on
resampling, it can also be computationally intensive compared to pwsd()
.
For a detailed comparison, see the table below:
# Comparison table table <- data.frame( rows = c( "**Method Type**", "**Computational Cost**", "**Primary Goal**", "**Variance Estimation**", "**Bias Estimation**", "**Best for**", "**Estimation Capacity**", "**Dependency\\***" ), NPPI = c( "Nonparametric resampling", "Medium (bootstrap resampling & jackknife)", "Minimize MSE of bootstrap estimator", "Moving Blocks Jackknife-After-Bootstrap (JAB)", "Directly estimates bias from bootstrap", "General-purpose estimators, small sample sizes, and quantile estimation", "Bootstrap bias, variance, distribution function, and quantile estimation", "User-defined parameters for initial block-length `l` and number of deletion blocks `m`" ), PWSD = c( "Spectral density estimation", "Low (direct ACF computation)", "Estimate block length using spectral density", "Implicitly estimated via spectral density", "Indirectly accounts for bias via ACF decay", "Block-length selection for circular and stationary bootstrap, time series with strong autocorrelation", "Bootstrap sample mean only", "User-defined parameters for autocorrelation lag and implied hypothesis tests (4 total)" ), HHJ = c( "Subsampling-based cross-validation", "High (subsampling & cross-validation)", "Minimize MSE via cross-validation", "Uses subsample-based variance estimation", "Uses subsample-based bias estimation", "Estimating functionals with strong dependencies", "Bootstrap variance and distribution function estimation", "Requires user-defined parameters for `pilot_block_length` (*l\\**) and `sub_sample` size (*m*)" ) ) # Render table knitr::kable( table, format = "markdown", col.names = c( "", "NPPI (Lahiri et al., 2007)", "PWSD (Politis & White, 2004)", "HHJ (Hall, Horowitz & Jing, 1995)" ), caption = "* All algorithms have default user-defined parameters recomended by the respective authors." )
You can install the released version from CRAN with:
install.packages("blocklength")
You can install the development version from GitHub with:
# install.packages("devtools") devtools::install_github("Alec-Stashevsky/blocklength")
We want to select the optimal block-length to perform a block bootstrap on a simulated autoregressive AR(1) time series.
First we will generate the time series:
library(blocklength) # Simulate AR(1) time series series <- stats::arima.sim(model = list(order = c(1, 0, 0), ar = 0.5), n = 500, rand.gen = rnorm) # Coerce time series to data.frame (not necessary) data <- data.frame("AR1" = series)
Now, we can find the optimal block-length to perform a block-bootstrap. We do this using the three available methods.
## Using the HHJ Algorithm with overlapping subsamples of width 10 hhj(series, sub_sample = 10, k = "bias/variance")
# Using Politis and White (2004) Spectral Density Estimation pwsd(data)
We can see that both methods produce similar results for a block-length of 9 or 11 depending on the type of bootstrap method used.
# Using Lahiri, Furukawa, and Lee (2007) Nonparametric Plug-In nppi(data, m = 8)
A big shoutout to Malina Cheeneebash for designing the blocklength
hex
sticker! Also to Sergio Armella and Simon P.
Couch for their help and feedback!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.