In kolesarm/BWSnooping: Confidence invervals robust to bandwidth snooping

library("knitr")
knitr::opts_knit$set(self.contained = FALSE)
knitr::opts_chunk$set(tidy = TRUE, collapse=TRUE, comment = "#>",
                      tidy.opts=list(blank=FALSE, width.cutoff=75))

Looking up and computing snooping-adjusted critical values

The function SnoopingCV looks up appropriate critical values adjusted for bandwidth snooping, as described in @ArKo16snooping from a table of pre-computed critical values. If, for a given kernel and order of a local polynomial no critical value is found, the function computes an appropriate critical value using Monte Carlo simulation as explained in the next section.

library("BWSnooping")
SnoopingCV(bwratio=6.2, kernel="triangular", boundary=TRUE, order=1, alpha=0.01)

This gives appropriate 99% critical value for a regression discontinuity design using a triangular kernel and local linear regression, with ratio of maximum to minimum bandwidths equal to 6.2. The following call gives the critical value for Nadarya-Watson (local constant) regression. Since values of bwratio greater than 100 are not pre-computed, it will be computed by simulation:

SnoopingCV(bwratio=102, kernel="triangular", boundary=FALSE, order=0, alpha=0.05)

Calculation of critical values

This section describes the method used to tabulate the critical values based on Theorem 3.1 in @ArKo16snooping.

Let $k$ be a kernel (i.e.\ a non-negative function symmetric around zero that integrates to one) with bounded support normalized to $[-1,1]$. We want to compute the quantiles of \begin{equation}\label{eq:1} \sup_{1\leq h\leq t} \mathbb{H}(h)={d}\sup{1/t\leq s\leq 1}\mathbb{H}(s),\qquad\text{and}\quad \sup_{1\leq h\leq t} |\mathbb{H}(h)|={d}\sup{1/t\leq s\leq 1}|\mathbb{H}(s)| \end{equation} where $={d}$ means "equals in distribution", and $\mathbb{H}(h)$ is a mean-zero Gaussian process with covariance function $$ \cov(\mathbb{H}(h),\mathbb{H}(h'))=\frac{\int{-\infty}^{\infty} k(u/h)k(u/h') \, \dd u}{ \sqrt{\int_{-\infty}^{\infty} k(u/h)^{2}\, \dd u\cdot \int_{-\infty}^{\infty} k(u/h')^{2}\, \dd u} } =\frac{\int_{0}^{\infty} k(u/h)k(u/h') \, \dd u}{ \sqrt{hh'} \int_{0}^{\infty} k(u)^{2}\, \dd u}. $$

Let ${Y_{i}}{1\leq i\leq T}$ be independent samples from a standard normal distribution, where $T$ is the number of points sampled in each simulation draw. Let \begin{equation} \hat{\mathbb{H}}(s)=\frac{1/\sqrt{T} \sum_{i=1}^{T}Y_{i}k\left(i/sT\right)}{\sqrt{T^{-1} \sum_{i=1}^{T} k(i/sT)^2}}. \end{equation} Observe that $(\hat{\mathbb{H}}(s)){s\in (0,1]}$ is a centered Gaussian process with covariance function \begin{equation} \E[\hat{\mathbb{H}}(s)\hat{\mathbb{H}}(s')]= \frac{1/T\sum_{i=1}^{T}k(i/sT)k(i/s'T)}{ \sqrt{T^{-1} \sum_{i=1}^{T} k(i/sT)^2}\sqrt{T^{-1} \sum_{i=1}^{T} k(i/s'T)^2}}. \end{equation} As $T\to\infty$, this process converges to $\left(\mathbb{H}(s)\right)_{s\in (0,1]}$, since $\E[\hat{\mathbb{H}}(s)\hat{\mathbb{H}}(s')]\to \cov(\mathbb{H}(s),\mathbb{H}(s'))$.

We then approximate the quantiles of the two distributions in Equation \eqref{eq:1} by empirical quantiles based on $S$ simulation draws ${\hat{\mathbb{H}}{m}(h)}{m=1}^{S}$. The process $\hat{\mathbb{H}}_{m}(h)$ is evaluated on the log-grid \begin{equation} \exp(\log(t):\text{stepsize}:\log(1)). \end{equation}

Example: uniform kernel

If the kernel is uniform, then $\hat{\mathbb{H}}(h)$ simplifies to \begin{equation} \hat{\mathbb{H}}(h)=\frac{\sum_{i=1}^{\lfloor hT\rfloor}Y_{i}}{\sqrt{ \lfloor hT\rfloor}}. \end{equation} so that $\hat{\mathbb{H}}(h)$ is mean-zero Gaussian process with covariance function $\E[ \hat{\mathbb{H}}(h) \hat{\mathbb{H}}(h')]=\frac{\lfloor hT\rfloor\wedge \lfloor h'T\rfloor}{\sqrt{\lfloor hT\rfloor \lfloor h'T\rfloor}}$.

Note: Alternative approaches

An alternative used in an earlier version of the paper is to replace $\hat{\mathbb{H}}(h)$ with \begin{equation} \tilde{\mathbb{H}}(h)=\frac{1/\sqrt{T} \sum_{i=1}^{T}Y_{i}k\left(i/hT\right)}{\sqrt{ h\int_{0}^{\infty} k(u)^{2}\, \dd u}} \end{equation} This method, however, has the disadvantage what for any finite $T$, the variance of the process $\tilde{\mathbb{H}}(h)$ is not equal to one exactly.

A third possible approach is to replace $\hat{\mathbb{H}}(h)$ with \begin{align} \hat{\mathbb{H}}(h) =\frac{\frac{1}{\sqrt{T}} \sum_{i=1}^{T}Y_{i}k(X_i/h)}{\sqrt{\frac{1}{2}\int k(u/h)^2\, du}} =\frac{\frac{1}{\sqrt{T}} \sum_{i=1}^{T}Y_{i}k(X_i/h)}{\sqrt{\frac{h}{2}\int k(u)^2\, du}}, \end{align} where ${(X_{i},Y_{i})}_{1\le i\le T}$ be i.i.d. with $X_i$ independent of $Y_i$ and $X_i\sim U[-1,1]$ and $Y_i\sim N(0,1)$. Note that $\hat{\mathbb{H}}(h)=0$ and \begin{align} \cov(\hat{\mathbb{H}}(h),\hat{\mathbb{H}}(h')) &=\frac{EY_{i}^2k(X_i/h)k(X_i/h') }{\sqrt{\frac{1}{2}\int k(u/h)^2\, du}\sqrt{\frac{1}{2}\int k(u/h')^2\, du}} \ & =\frac{Ek(X_i/h)k(X_i/h')}{\frac{1}{2}\sqrt{\int k(u/h)^2\, du}\sqrt{\int k(u/h')^2\, du}} =\frac{\frac{1}{2}\int_{x=-1}^{1} k(x/h)k(x/h')\, dx}{\frac{1}{2}\sqrt{\int k(u/h)^2\, du}\sqrt{\int k(u/h')^2\, du}} \ & =\frac{\int_{x=-\infty}^\infty k(x/h)k(x/h')\, dx}{\sqrt{\int k(u/h)^2\, du}\sqrt{\int k(u/h')^2\, du}} \end{align} for $h\le 1$ (the last step follows since, for $h\le 1$, the integral is over $-h\le x\le h$ for both the last and second to last line). Thus, $\hat{\mathbb{H}}(h)$ and $\mathbb{H}(h)$ have the same covariance function and, for large enough $T$, have approximately the same distribution.

Regression near a boundary point

Since the critical values at a boundary and interior points differ, this may create a non-uniformity issue for estimation problems in which the point of interest lies near a boundary point. To deal with this case, it is possible to extend the method in @ArKo16snooping by modeling the point of interest as being local to boundary, as in Chapter 3.2.5 of @FaGi96 (see Section S2.1 in the supplement to @ArKo16snooping).

In particular, for local polynomial estimation of $E[Y_{i}\mid X_{i}=x_{0}]$ where $x_{0}=c\underline{h}{n}$ and the lower support point of the density of $X{i}$ is zero, the appropriate critical value can be computed using the function TableSnoopingCVNearBd. For convenience, the function reports a table of critical values. For example, to calculate critical values for local linear regression and bandwidth ratios $\overline{h}/\underline{h}$ equal to $t\in{1, 2, 3}$ and the local parameter equal to $c\in{1,2,10}$, one calls

nb <- TableSnoopingCVNearBd(bwratios=c(1, 2, 3), kernel="triangular", db=c(1, 2, 10), order=1)

knitr::kable(nb$table.twosided, row.names=FALSE, digits=2,
    caption="Local linear regression near a boundary")
knitr::kable(nb$table.onesided, row.names=FALSE, digits=2,
    caption="Local linear regression near a boundary")

Tables and graphs of critical values

The tables and graphs in the paper can be reproduced using the functions DFSnoopingCV and SnoopingTablesGraphs. DFSnoopingCV computes a data frame of critical values, and SnoopingTablesGraphs reproduces tables and graphs reported in the paper:

## The function DFSnoopingCV may long time to compute, the package has results
## stored for DFSnoopingCV(S=60000, T=10000, ngr=1000)
## under the data frame snoopingcvs

r <- SnoopingTablesGraphs(snoopingcvs)

Tables for two-sided critical values

For Nadarya-Watson regression, boundary and interior critical values coincide
In the tables, "u" stands for uniform, "e" for Epanechnikov, and "t" for triangular kernel, so that "0.9u" corresponds to 90\%-level critical value for the uniform kernel, "0.99t" corresponds to 99\%-level critical value for the triangular kernel, and so on.

t1 <- subset(r$table.twosided, boundary==TRUE & order==0)[, -c(1, 2)]
knitr::kable(t1, row.names=FALSE, digits=2,
    caption="Boundary Nadaraya-Watson regression")

t2 <- subset(r$table.twosided, boundary==FALSE & order==0)[, -c(1, 2)]
knitr::kable(t2, row.names=FALSE, digits=2,
    caption="Interior Nadaraya-Watson regression")

t3 <- subset(r$table.twosided, boundary==TRUE & order==1)[, -c(1, 2)]
knitr::kable(t3, row.names=FALSE, digits=2,
    caption="Boundary local linear regression")

t4 <- subset(r$table.twosided, boundary==FALSE & order==1)[, -c(1, 2)]
knitr::kable(t4, row.names=FALSE, digits=2,
    caption="Interior local linear regression")

t5 <- subset(r$table.twosided, boundary==TRUE & order==2)[, -c(1, 2)]
knitr::kable(t5, row.names=FALSE, digits=2,
    caption="Boundary local quadratic regression")

t6 <- subset(r$table.twosided, boundary==FALSE & order==2)[, -c(1, 2)]
knitr::kable(t6, row.names=FALSE, digits=2,
    caption="Interior local quadratic regression")

Tables for one-sided critical values

o1 <- subset(r$table.onesided, boundary==TRUE & order==0)[, -c(1, 2)]
knitr::kable(o1, row.names=FALSE, digits=2,
    caption="Boundary Nadaraya-Watson regression")

o2 <- subset(r$table.onesided, boundary==FALSE & order==0)[, -c(1, 2)]
knitr::kable(o2, row.names=FALSE, digits=2,
    caption="Interior Nadaraya-Watson regression")

o3 <- subset(r$table.onesided, boundary==TRUE & order==1)[, -c(1, 2)]
knitr::kable(o3, row.names=FALSE, digits=2,
    caption="Boundary local linear regression")

o4 <- subset(r$table.onesided, boundary==FALSE & order==1)[, -c(1, 2)]
knitr::kable(o4, row.names=FALSE, digits=2,
    caption="Interior local linear regression")

o5 <- subset(r$table.onesided, boundary==TRUE & order==2)[, -c(1, 2)]
knitr::kable(o5, row.names=FALSE, digits=2,
    caption="Boundary local quadratic regression")

o6 <- subset(r$table.onesided, boundary==FALSE & order==2)[, -c(1, 2)]
knitr::kable(o6, row.names=FALSE, digits=2,
    caption="Interior local quadratic regression")

Graphs

Critical values for local constant and local linear regression in the interior:

r$cv.interior

Critical values for local constant and local linear regression at the boundary:

r$cv.boundary

Coverage of unadjusted CIs for local constant and local linear regression in the interior:

r$cov.interior

Coverage of unadjusted CIs for local constant and local linear regression at the boundary:

r$cov.boundary

Critical values based on extreme value approximation

The package also provides a function EVSnoopingCV that calculates critical values based on a further extreme value approximation to the Gaussian process. Its use is not recommended, however. The following figure compares these critical values with those based directly on the Gaussian process.

Order "0" corresponds to Nadaraya-Watson or local linear regression in the interior, and order "1" to local linear regression at a boundary.

p <- PlotEVSnoopingCV()
p+ggplot2::ylab("Critical value")+ggplot2::xlab("$\\overline{h}/\\underline{h}$")

References

kolesarm/BWSnooping documentation built on May 20, 2019, 12:54 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

kolesarm/BWSnooping
Confidence invervals robust to bandwidth snooping

In kolesarm/BWSnooping: Confidence invervals robust to bandwidth snooping

Looking up and computing snooping-adjusted critical values

Calculation of critical values

Example: uniform kernel

Note: Alternative approaches

Regression near a boundary point

Tables and graphs of critical values

Tables for two-sided critical values

Tables for one-sided critical values

Graphs

Critical values based on extreme value approximation

References

R Package Documentation

Browse R Packages

We want your feedback!

kolesarm/BWSnooping Confidence invervals robust to bandwidth snooping

In kolesarm/BWSnooping: Confidence invervals robust to bandwidth snooping

Looking up and computing snooping-adjusted critical values

Calculation of critical values

Example: uniform kernel

Note: Alternative approaches

Regression near a boundary point

Tables and graphs of critical values

Tables for two-sided critical values

Tables for one-sided critical values

Graphs

Critical values based on extreme value approximation

References

R Package Documentation

Browse R Packages

We want your feedback!

kolesarm/BWSnooping
Confidence invervals robust to bandwidth snooping