lrv: Long Run Variance
In robcp: Robust Change-Point Tests

lrv	R Documentation

Long Run Variance

Description

Estimates the long run variance or, for multivariate data, the long run covariance matrix of the supplied time series.

Usage

lrv(x, method = c("kernel", "subsampling", "bootstrap", "none"), control = list())

Arguments

`x`	vector or matrix with each column representing a time series (numeric).
`method`	method of estimation. Options are `kernel`, `subsampling`, `bootstrap` and `none`.
`control`	a list of control parameters. See 'Details'.

Details

The long run variance equals the limit of n times the variance of the arithmetic mean of a short range dependent time series, where n is the length of the time series. It is used to standardize tests concerning the mean of dependent data.
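
As a numerical illustration of this limit (a sketch, not part of the package), consider a stationary AR(1) process with coefficient phi and unit innovation variance. This process is an assumption chosen here because its autocovariances phi^|h| / (1 - phi^2) are known in closed form; the helper n_var_mean is hypothetical. It evaluates n times the variance of the arithmetic mean exactly, which should approach the sum of all autocovariances, 1 / (1 - phi)^2.

```r
# Exact value of n * Var(mean) for a stationary AR(1) process with
# coefficient phi and unit innovation variance (illustrative assumption).
n_var_mean <- function(n, phi) {
  h <- -(n - 1):(n - 1)
  gamma_h <- phi^abs(h) / (1 - phi^2)   # autocovariance at lag h
  sum((1 - abs(h) / n) * gamma_h)
}

phi <- 0.5
lrv_limit <- 1 / (1 - phi)^2            # sum of all autocovariances

# n * Var(mean) for growing n, to be compared with lrv_limit
sapply(c(10, 100, 1000, 10000), n_var_mean, phi = phi)
lrv_limit
```

For phi = 0.5 the values increase towards the limit 4 as n grows.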

If method = "none", no long run variance estimation is performed and the value 1 is returned (i.e. it does not alter the test statistic).

The control argument is a list that can supply any of the following components:

kFun: Kernel function (character string). See 'Note' for the available kernels.
b_n: Bandwidth (numeric > 0 and smaller than sample size).
gamma0: Replace a negative long run variance estimate with the estimated ordinary variance (autocovariance at lag 0)? Boolean. Default is TRUE.
l: Block length (numeric > 0 and smaller than sample size).
overlapping: Overlapping subsampling estimation? Boolean.
distr: Transform observations by their empirical distribution function? Boolean. Default is FALSE.
B: Bootstrap repetitions (integer).
seed: RNG seed (numeric).
version: What property does the CUSUM test test for? Character string, details below.
loc: Estimated location corresponding to version. Numeric value, details below.
scale: Estimated scale corresponding to version. Numeric value, details below.

Kernel-based estimation

The kernel-based long run variance estimation is available for various testing scenarios (set by control$version) and for both one- and multi-dimensional data. It uses the bandwidth b_n = control$b_n and the kernel function k(x) = control$kFun (denoted W in the formulas for "empVar" through "tau" and K in the formula for "rho"). For tests on certain properties, a corresponding location estimate control$loc (m_n) and scale estimate control$scale (v_n) also need to be supplied. Supported testing scenarios are:

"mean"
- 1-dim. data:
  
  \hat{\sigma}^2 = \frac{1}{n} \sum_{i = 1}^n (x_i - \bar{x})^2 + \frac{2}{n} \sum_{h = 1}^{b_n} \sum_{i = 1}^{n - h} (x_i - \bar{x}) (x_{i + h} - \bar{x}) k(h / b_n).
  
  If control$distr = TRUE, the long run variance is estimated from the observations transformed by their empirical distribution function. The resulting value is then multiplied by \sqrt{\pi} / 2.
  
  Default values: b_n = 0.9 n^{1/3}, kFun = "bartlett".
- multivariate time series: The k,l-element of \Sigma is estimated by
  
  \hat{\Sigma}^{(k,l)} = \frac{1}{n} \sum_{i,j = 1}^{n}(x_i^{(k)} - \bar{x}^{(k)}) (x_j^{(l)} - \bar{x}^{(l)}) k((i-j) / b_n),
  
  k, l = 1, ..., m.
  
  Default values: b_n = \log_{1.8 + m / 40}(n / 50), kFun = "bartlett".
"empVar" for tests on changes in the empirical variance.

\hat{\sigma}^2 = \sum_{h = -(n-1)}^{n-1} W \left( \frac{|h|}{b_n} \right) \frac{1}{n} \sum_{i = 1}^{n - |h|} ((x_i - m_n)^2 - v_n)((x_{i+|h|} - m_n)^2 - v_n).

Default values: m_n = mean(x), v_n = var(x).
"MD" for tests on a change in the median deviation.

\hat{\sigma}^2 = \sum_{h = -(n-1)}^{n-1} W \left( \frac{|h|}{b_n} \right) \frac{1}{n} \sum_{i = 1}^{n - |h|} (|x_i - m_n| - v_n)(|x_{i+|h|} - m_n| - v_n).

Default values: m_n = median(x), v_n = \frac{1}{n-1} \sum_{i = 1}^n |x_i - m_n|.
"GMD" for tests on changes in Gini's mean difference.

\hat{\sigma}^2 = 4 \sum_{h = -(n-1)}^{n-1} W \left( \frac{|h|}{b_n} \right) \frac{1}{n} \sum_{i = 1}^{n - |h|} \hat{\phi}_n(x_i)\hat{\phi}_n(x_{i+|h|})

with \hat{\phi}_n(x) = n^{-1} \sum_{i = 1}^n |x - x_i| - v_n.

Default value: v_n = \frac{2}{n(n-1)} \sum_{1 \leq i < j \leq n} |x_i - x_j|.
"Qalpha" for tests on changes in Qalpha.

\hat{\sigma}^2 = \frac{4}{\hat{u}(v_n)} \sum_{h = -(n-1)}^{n-1} W \left( \frac{|h|}{b_n} \right) \frac{1}{n} \sum_{i = 1}^{n - |h|} \hat{\phi}_n(x_i)\hat{\phi}_n(x_{i+|h|}),

where \hat{\phi}_n(x) = n^{-1} \sum_{i = 1}^n 1_{\{|x - x_i| \leq v_n\}} - m_n and

\hat{u}(t) = \frac{2}{n(n-1)h_n} \sum_{1 \leq i < j \leq n} K\left(\frac{|x_i - x_j| - t}{h_n}\right)

the kernel density estimate of the density u corresponding to the distribution function U(t) = P(|X-Y| \leq t), h_n = IQR(x)n^{-\frac{1}{3}} and K is the quadratic kernel function (listed as "quatratic" in the Note section).

Default values: m_n = \alpha = 0.5, v_n = Qalpha(x, m_n)[n-1].
"tau" for tests on changes in Kendall's tau.

Only available for bivariate data: assume that the given data x has the format (x_i, y_i)_{i = 1, ..., n}.

\hat{\sigma}^2 = \sum_{h = -(n-1)}^{n-1} W \left( \frac{|h|}{b_n} \right) \frac{1}{n} \sum_{i = 1}^{n - |h|} \hat{\phi}_n((x_i, y_i))\hat{\phi}_n((x_{i+|h|}, y_{i+|h|})),

where \hat{\phi}_n((x, y)) = 4 F_n(x, y) - 2 F_{X,n}(x) - 2 F_{Y,n}(y) + 1 - v_n and F_n, F_{X,n} and F_{Y,n} are the empirical distribution functions of ((X_i, Y_i))_{i = 1, ..., n}, (X_i)_{i = 1, ..., n} and (Y_i)_{i = 1, ..., n}.

Default value: v_n = \hat{\tau}_n = \frac{2}{n(n-1)} \sum_{1 \leq i < j \leq n} sign\left((x_j - x_i)(y_j - y_i)\right).
"rho" for tests on changes in Spearman's rho.

Only available for d-variate data with d > 1: assume that the given data x has the format (x_{i,j} | i = 1, ..., n; j = 1, ..., d).

\hat{\sigma}^2 = a(d)^2 2^{2d} \left\{ \sum_{h = -(n-1)}^{n-1} K\left( \frac{|h|}{b_n} \right) \left( \sum_{i = 1}^{n-|h|} n^{-1} \prod_{j = 1}^d \hat{\phi}_n(x_i, x_j) \hat{\phi}_n(x_{i+|h|}, x_j) - M^2 \right) \right\} ,

where a(d) = (d+1) / (2^d - d - 1), M = n^{-1} \sum_{i = 1}^n \prod_{j = 1}^d \hat{\phi}_n(x_i, x_j) and \hat{\phi}_n(x, y) = 1 - \hat{U}_n(x, y), \hat{U}_n(x, y) = n^{-1} (rank of x_{i,j} in x_{1,j}, ..., x_{n,j}).
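
The default scale estimates v_n given above for "GMD" and "tau" are plain U-statistics and can be checked with a few lines of base R. The following is a sketch of those two formulas only; the function names gmd and kendall_tau are hypothetical helpers, not robcp functions, and the data are made up for illustration.

```r
# Default v_n for version = "GMD": Gini's mean difference,
# 2 / (n (n - 1)) * sum over i < j of |x_i - x_j|.
gmd <- function(x) {
  n <- length(x)
  d <- abs(outer(x, x, "-"))
  sum(d[upper.tri(d)]) * 2 / (n * (n - 1))
}

# Default v_n for version = "tau": Kendall's tau,
# 2 / (n (n - 1)) * sum over i < j of sign((x_j - x_i) (y_j - y_i)).
kendall_tau <- function(x, y) {
  n <- length(x)
  s <- sign(outer(x, x, "-") * outer(y, y, "-"))
  sum(s[upper.tri(s)]) * 2 / (n * (n - 1))
}

x <- c(1, 2, 4, 7, 3)
y <- c(2, 1, 5, 6, 4)
gmd(x)
kendall_tau(x, y)
cor(x, y, method = "kendall")  # same value as kendall_tau() when there are no ties
```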

When control$gamma0 = TRUE (default) then negative estimates of the long run variance are replaced by the autocovariance at lag 0 (= ordinary variance of the data). The function will then throw a warning.
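
The univariate "mean" estimator and the gamma0 fallback described above can be sketched in base R as follows. This illustrates the formula and is not the package's implementation: the function name kernel_lrv is hypothetical, and truncating the lag sum at floor(b_n) is an assumption made here for non-integer bandwidths.

```r
# Bartlett kernel: k(x) = 1 - |x| for |x| < 1 and 0 otherwise.
bartlett <- function(x) pmax(1 - abs(x), 0)

# Kernel-based long run variance estimate for the univariate "mean" scenario.
kernel_lrv <- function(x, b_n = 0.9 * length(x)^(1 / 3), k = bartlett,
                       gamma0 = TRUE) {
  n <- length(x)
  xc <- x - mean(x)
  acov0 <- sum(xc^2) / n                       # autocovariance at lag 0
  est <- acov0
  # Lag sum up to floor(b_n) (assumption), never beyond lag n - 1.
  for (h in seq_len(min(floor(b_n), n - 1))) {
    acov_h <- sum(xc[1:(n - h)] * xc[(1 + h):n]) / n
    est <- est + 2 * acov_h * k(h / b_n)
  }
  if (gamma0 && est < 0) {
    warning("negative estimate replaced by the autocovariance at lag 0")
    est <- acov0
  }
  est
}

x <- c(1, -1, 1, -1, 1, -1)
kernel_lrv(x, b_n = 2)
```

With a kernel that does not downweight the first lag, such as the truncated kernel, this alternating series yields a negative raw estimate, which is the situation the gamma0 option is meant for.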

Subsampling estimation

For method = "subsampling" there is an overlapping and a non-overlapping version (parameter control$overlapping). It can also be specified whether the observations x are transformed by their empirical distribution function F_n (parameter control$distr). The block length l is set via control$l.

If control$overlapping = TRUE and control$distr = TRUE:

\hat{\sigma}_n = \frac{\sqrt{\pi}}{\sqrt{2l}(n - l + 1)} \sum_{i = 0}^{n-l} \left| \sum_{j = i+1}^{i+l} (F_n(x_j) - 0.5) \right|.

Otherwise, if control$distr = FALSE, the estimator is

\hat{\sigma}^2 = \frac{1}{l (n - l + 1)} \sum_{i = 0}^{n-l} \left( \sum_{j = i + 1}^{i+l} x_j - \frac{l}{n} \sum_{j = 1}^n x_j \right)^2.

If control$overlapping = FALSE and control$distr = TRUE:

\hat{\sigma} = \frac{1}{n/l} \sqrt{\pi/2} \sum_{i = 1}^{n/l} \frac{1}{\sqrt{l}} \left| \sum_{j = (i-1)l + 1}^{il} F_n(x_j) - \frac{l}{n} \sum_{j = 1}^n F_n(x_j) \right|.

Otherwise, if control$distr = FALSE, the estimator is

\hat{\sigma}^2 = \frac{1}{n/l} \sum_{i = 1}^{n/l} \frac{1}{l} \left(\sum_{j = (i-1)l + 1}^{il} x_j - \frac{l}{n} \sum_{j = 1}^n x_j\right)^2.
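
For the case control$distr = FALSE, the two estimators above differ only in which blocks are used. A minimal base-R sketch, assuming a hypothetical helper name subsampling_lrv and using floor(n / l) blocks when l does not divide n (an assumption, since the formula is written with n/l):

```r
# Subsampling estimate of the long run variance for untransformed data
# (distr = FALSE), in its overlapping and non-overlapping versions.
subsampling_lrv <- function(x, l, overlapping = TRUE) {
  n <- length(x)
  starts <- if (overlapping) {
    1:(n - l + 1)                            # every possible block start
  } else {
    seq(1, by = l, length.out = n %/% l)     # disjoint blocks only
  }
  block_sums <- sapply(starts, function(s) sum(x[s:(s + l - 1)]))
  # Mean squared deviation of the block sums from l * mean(x), divided by l.
  mean((block_sums - l * mean(x))^2) / l
}

x <- 1:6
subsampling_lrv(x, l = 2, overlapping = TRUE)
subsampling_lrv(x, l = 2, overlapping = FALSE)
```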

Default values: overlapping = TRUE, the block length is chosen adaptively:

l_n = \max{\left\{ \left\lceil n^{1/3} \left( \frac{2 \rho}{1 - \rho^2} \right)^{(2/3)} \right\rceil, 1 \right\}}

where \rho is the Spearman autocorrelation at lag 1.
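
The adaptive block length can be sketched as below. Two points are assumptions of this sketch rather than statements about the package: the Spearman autocorrelation at lag 1 is computed as the rank correlation between the series and its one-step lag, and abs(rho) is used so that the fractional power stays defined for negative rho. Both function names are hypothetical.

```r
# Spearman autocorrelation at lag 1 (assumed definition: rank correlation
# between x_1, ..., x_{n-1} and x_2, ..., x_n).
spearman_acf1 <- function(x) {
  n <- length(x)
  cor(x[-n], x[-1], method = "spearman")
}

# Adaptive block length l_n from the formula above.
block_length <- function(n, rho) {
  # abs() is an assumption: a negative base with exponent 2/3 gives NaN in R.
  max(ceiling(n^(1 / 3) * (2 * abs(rho) / (1 - rho^2))^(2 / 3)), 1)
}

x <- sin(1:50 / 5)                 # deterministic toy series
block_length(length(x), spearman_acf1(x))
```

A series without lag-1 rank correlation gives the minimum block length 1, while stronger dependence leads to longer blocks.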

Bootstrap estimation

If method = "bootstrap" a dependent wild bootstrap with the parameters B = control$B, l = control$l and k(x) = control$kFun is performed:

\hat{\sigma}^2 = \sqrt{n} Var(\bar{x^*_k} - \bar{x}), k = 1, ..., B

A single bootstrap sample x_1^*, ..., x_n^* is generated by x_i^* = \bar{x} + (x_i - \bar{x}) a_i, where the multipliers a_i are independent of the data x and are drawn from a multivariate normal distribution with E(a_i) = 0, Var(a_i) = 1 and Cov(a_i, a_j) = k\left(\frac{i - j}{l}\right), i = 1, ..., n; j \neq i. A seed can optionally be specified via control$seed (cf. set.seed). Only "bartlett", "parzen" and "QS" are supported as kernel functions. Uses the function sqrtm from package pracma.

Default values: B = 1000, kFun = "bartlett", l is the same as for subsampling.
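
The resampling step can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the function name dwb_differences is hypothetical, only the Bartlett kernel is covered, and a Cholesky factor from chol() is swapped in for the matrix square root from pracma. The sketch returns the B differences between bootstrap mean and sample mean; the estimate is then obtained from their variance as in the formula above.

```r
# Dependent wild bootstrap: differences between bootstrap means and the
# sample mean, using Bartlett-kernel multipliers with block length l.
dwb_differences <- function(x, l, B = 1000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- length(x)
  # Covariance matrix of the multipliers: k((i - j) / l), Bartlett kernel.
  Sigma <- pmax(1 - abs(outer(1:n, 1:n, "-")) / l, 0)
  R <- chol(Sigma)                       # upper triangular, Sigma = t(R) %*% R
  xbar <- mean(x)
  replicate(B, {
    a <- drop(crossprod(R, rnorm(n)))    # multipliers a ~ N(0, Sigma)
    x_star <- xbar + (x - xbar) * a      # one bootstrap sample
    mean(x_star) - xbar
  })
}

x <- sin(1:30)                           # deterministic toy series
d <- dwb_differences(x, l = 3, B = 200, seed = 1)
var(d)
```

chol() requires a numerically positive definite matrix, so for other kernels or very long series a symmetric square root may be the safer choice.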

Value

Long run variance \sigma^2 (numeric) or, for multivariate data, the long run covariance matrix \Sigma (numeric matrix).

Note

Kernel functions

bartlett:

k(x) = (1 - |x|) * 1_{\{|x| < 1\}}

FT:

k(x) = 1 * 1_{\{|x| \leq 0.5\}} + (2 - 2 * |x|) * 1_{\{0.5 < |x| < 1\}}

parzen:

k(x) = (1 - 6x^2 + 6|x|^3) * 1_{\{0 \leq |x| \leq 0.5\}} + 2(1 - |x|)^3 * 1_{\{0.5 < |x| \leq 1\}}

QS:

k(x) = \frac{25}{12 \pi ^2 x^2} \left(\frac{\sin(6\pi x / 5)}{6\pi x / 5} - \cos(6 \pi x / 5)\right)

TH:

k(x) = (1 + \cos(\pi x)) / 2 * 1_{\{|x| < 1\}}

truncated:

k(x) = 1_{\{|x| < 1\}}

SFT:

k(x) = (1 - 4(|x| - 0.5)^2)^2 * 1_{\{|x| < 1\}}

Epanechnikov:

k(x) = 3 \frac{1 - x^2}{4} * 1_{\{|x| < 1\}}

quatratic:

k(x) = (1 - x^2)^2 * 1_{\{|x| < 1\}}
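
A few of these kernels written out as vectorised R functions, as a sketch of the formulas above rather than the package's internal code. The function names are hypothetical, and setting the QS kernel to 1 at x = 0 uses the limit of its formula, which is undefined there when evaluated literally.

```r
bartlett  <- function(x) (1 - abs(x)) * (abs(x) < 1)

truncated <- function(x) as.numeric(abs(x) < 1)

parzen <- function(x) {
  ax <- abs(x)
  (1 - 6 * ax^2 + 6 * ax^3) * (ax <= 0.5) +
    2 * (1 - ax)^3 * (ax > 0.5 & ax <= 1)
}

qs <- function(x) {
  z <- 6 * pi * x / 5
  out <- 25 / (12 * pi^2 * x^2) * (sin(z) / z - cos(z))
  out[x == 0] <- 1          # limit of the expression as x -> 0
  out
}

u <- c(0, 0.25, 0.5, 0.75, 1)
rbind(bartlett = bartlett(u), truncated = truncated(u),
      parzen = parzen(u), qs = qs(u))
```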

Author(s)

Sheila Görz

References

Andrews, D.W. "Heteroskedasticity and autocorrelation consistent covariance matrix estimation." Econometrica: Journal of the Econometric Society (1991): 817-858.

Dehling, H., et al. "Change-point detection under dependence based on two-sample U-statistics." Asymptotic laws and methods in stochastics. Springer, New York, NY, (2015). 195-220.

Dehling, H., Fried, R., and Wendler, M. "A robust method for shift detection in time series." Biometrika 107.3 (2020): 647-660.

Parzen, E. "On consistent estimates of the spectrum of a stationary time series." The Annals of Mathematical Statistics (1957): 329-348.

Shao, X. "The dependent wild bootstrap." Journal of the American Statistical Association 105.489 (2010): 218-235.

Examples

Z <- c(rnorm(20), rnorm(20, 2))

## kernel-based estimation (default method)
lrv(Z)

## non-overlapping subsampling on the transformed observations, block length 5
lrv(Z, method = "subsampling", control = list(overlapping = FALSE, distr = TRUE, l = 5))

## dependent wild bootstrap estimation
lrv(Z, method = "bootstrap", control = list(l = 5, kFun = "parzen"))

robcp documentation built on April 11, 2025, 6:18 p.m.