library("knitr") knitr::opts_knit$set(self.contained = FALSE) knitr::opts_chunk$set(tidy = TRUE, collapse=TRUE, comment = "#>", tidy.opts=list(blank=FALSE, width.cutoff=75))
The function SnoopingCV
looks up appropriate critical values adjusted for
bandwidth snooping, as described in @ArKo16snooping from a table of pre-computed
critical values. If, for a given kernel and order of a local polynomial no
critical value is found, the function computes an appropriate critical value
using Monte Carlo simulation as explained in the next section.
library("BWSnooping") SnoopingCV(bwratio=6.2, kernel="triangular", boundary=TRUE, order=1, alpha=0.01)
This gives appropriate 99% critical value for a regression discontinuity design
using a triangular kernel and local linear regression, with ratio of maximum to
minimum bandwidths equal to 6.2. The following call gives the critical value for
Nadarya-Watson (local constant) regression. Since values of bwratio
greater
than 100 are not pre-computed, it will be computed by simulation:
SnoopingCV(bwratio=102, kernel="triangular", boundary=FALSE, order=0, alpha=0.05)
This section describes the method used to tabulate the critical values based on Theorem 3.1 in @ArKo16snooping.
Let $k$ be a kernel (i.e.\ a non-negative function symmetric around zero that integrates to one) with bounded support normalized to $[-1,1]$. We want to compute the quantiles of \begin{equation}\label{eq:1} \sup_{1\leq h\leq t} \mathbb{H}(h)={d}\sup{1/t\leq s\leq 1}\mathbb{H}(s),\qquad\text{and}\quad \sup_{1\leq h\leq t} |\mathbb{H}(h)|={d}\sup{1/t\leq s\leq 1}|\mathbb{H}(s)| \end{equation} where $={d}$ means "equals in distribution", and $\mathbb{H}(h)$ is a mean-zero Gaussian process with covariance function $$ \cov(\mathbb{H}(h),\mathbb{H}(h'))=\frac{\int{-\infty}^{\infty} k(u/h)k(u/h') \, \dd u}{ \sqrt{\int_{-\infty}^{\infty} k(u/h)^{2}\, \dd u\cdot \int_{-\infty}^{\infty} k(u/h')^{2}\, \dd u} } =\frac{\int_{0}^{\infty} k(u/h)k(u/h') \, \dd u}{ \sqrt{hh'} \int_{0}^{\infty} k(u)^{2}\, \dd u}. $$
Let ${Y_{i}}{1\leq i\leq T}$ be independent samples from a standard normal distribution, where $T$ is the number of points sampled in each simulation draw. Let \begin{equation} \hat{\mathbb{H}}(s)=\frac{1/\sqrt{T} \sum_{i=1}^{T}Y_{i}k\left(i/sT\right)}{\sqrt{T^{-1} \sum_{i=1}^{T} k(i/sT)^2}}. \end{equation} Observe that $(\hat{\mathbb{H}}(s)){s\in (0,1]}$ is a centered Gaussian process with covariance function \begin{equation} \E[\hat{\mathbb{H}}(s)\hat{\mathbb{H}}(s')]= \frac{1/T\sum_{i=1}^{T}k(i/sT)k(i/s'T)}{ \sqrt{T^{-1} \sum_{i=1}^{T} k(i/sT)^2}\sqrt{T^{-1} \sum_{i=1}^{T} k(i/s'T)^2}}. \end{equation} As $T\to\infty$, this process converges to $\left(\mathbb{H}(s)\right)_{s\in (0,1]}$, since $\E[\hat{\mathbb{H}}(s)\hat{\mathbb{H}}(s')]\to \cov(\mathbb{H}(s),\mathbb{H}(s'))$.
We then approximate the quantiles of the two distributions in Equation \eqref{eq:1} by empirical quantiles based on $S$ simulation draws ${\hat{\mathbb{H}}{m}(h)}{m=1}^{S}$. The process $\hat{\mathbb{H}}_{m}(h)$ is evaluated on the log-grid \begin{equation} \exp(\log(t):\text{stepsize}:\log(1)). \end{equation}
If the kernel is uniform, then $\hat{\mathbb{H}}(h)$ simplifies to \begin{equation} \hat{\mathbb{H}}(h)=\frac{\sum_{i=1}^{\lfloor hT\rfloor}Y_{i}}{\sqrt{ \lfloor hT\rfloor}}. \end{equation} so that $\hat{\mathbb{H}}(h)$ is mean-zero Gaussian process with covariance function $\E[ \hat{\mathbb{H}}(h) \hat{\mathbb{H}}(h')]=\frac{\lfloor hT\rfloor\wedge \lfloor h'T\rfloor}{\sqrt{\lfloor hT\rfloor \lfloor h'T\rfloor}}$.
An alternative used in an earlier version of the paper is to replace $\hat{\mathbb{H}}(h)$ with \begin{equation} \tilde{\mathbb{H}}(h)=\frac{1/\sqrt{T} \sum_{i=1}^{T}Y_{i}k\left(i/hT\right)}{\sqrt{ h\int_{0}^{\infty} k(u)^{2}\, \dd u}} \end{equation} This method, however, has the disadvantage what for any finite $T$, the variance of the process $\tilde{\mathbb{H}}(h)$ is not equal to one exactly.
A third possible approach is to replace $\hat{\mathbb{H}}(h)$ with \begin{align} \hat{\mathbb{H}}(h) =\frac{\frac{1}{\sqrt{T}} \sum_{i=1}^{T}Y_{i}k(X_i/h)}{\sqrt{\frac{1}{2}\int k(u/h)^2\, du}} =\frac{\frac{1}{\sqrt{T}} \sum_{i=1}^{T}Y_{i}k(X_i/h)}{\sqrt{\frac{h}{2}\int k(u)^2\, du}}, \end{align} where ${(X_{i},Y_{i})}_{1\le i\le T}$ be i.i.d. with $X_i$ independent of $Y_i$ and $X_i\sim U[-1,1]$ and $Y_i\sim N(0,1)$. Note that $\hat{\mathbb{H}}(h)=0$ and \begin{align} \cov(\hat{\mathbb{H}}(h),\hat{\mathbb{H}}(h')) &=\frac{EY_{i}^2k(X_i/h)k(X_i/h') }{\sqrt{\frac{1}{2}\int k(u/h)^2\, du}\sqrt{\frac{1}{2}\int k(u/h')^2\, du}} \ & =\frac{Ek(X_i/h)k(X_i/h')}{\frac{1}{2}\sqrt{\int k(u/h)^2\, du}\sqrt{\int k(u/h')^2\, du}} =\frac{\frac{1}{2}\int_{x=-1}^{1} k(x/h)k(x/h')\, dx}{\frac{1}{2}\sqrt{\int k(u/h)^2\, du}\sqrt{\int k(u/h')^2\, du}} \ & =\frac{\int_{x=-\infty}^\infty k(x/h)k(x/h')\, dx}{\sqrt{\int k(u/h)^2\, du}\sqrt{\int k(u/h')^2\, du}} \end{align} for $h\le 1$ (the last step follows since, for $h\le 1$, the integral is over $-h\le x\le h$ for both the last and second to last line). Thus, $\hat{\mathbb{H}}(h)$ and $\mathbb{H}(h)$ have the same covariance function and, for large enough $T$, have approximately the same distribution.
Since the critical values at a boundary and interior points differ, this may create a non-uniformity issue for estimation problems in which the point of interest lies near a boundary point. To deal with this case, it is possible to extend the method in @ArKo16snooping by modeling the point of interest as being local to boundary, as in Chapter 3.2.5 of @FaGi96 (see Section S2.1 in the supplement to @ArKo16snooping).
In particular, for local polynomial estimation of $E[Y_{i}\mid X_{i}=x_{0}]$
where $x_{0}=c\underline{h}{n}$ and the lower support point of the density of
$X{i}$ is zero, the appropriate critical value can be computed using the
function TableSnoopingCVNearBd
. For convenience, the function reports a
table of critical values. For example, to calculate critical values for local
linear regression and bandwidth ratios $\overline{h}/\underline{h}$ equal to
$t\in{1, 2, 3}$ and the local parameter equal to $c\in{1,2,10}$, one calls
nb <- TableSnoopingCVNearBd(bwratios=c(1, 2, 3), kernel="triangular", db=c(1, 2, 10), order=1) knitr::kable(nb$table.twosided, row.names=FALSE, digits=2, caption="Local linear regression near a boundary") knitr::kable(nb$table.onesided, row.names=FALSE, digits=2, caption="Local linear regression near a boundary")
The tables and graphs in the paper can be reproduced using the functions
DFSnoopingCV
and SnoopingTablesGraphs
. DFSnoopingCV
computes a data frame
of critical values, and SnoopingTablesGraphs
reproduces tables and graphs
reported in the paper:
## The function DFSnoopingCV may long time to compute, the package has results ## stored for DFSnoopingCV(S=60000, T=10000, ngr=1000) ## under the data frame snoopingcvs r <- SnoopingTablesGraphs(snoopingcvs)
t1 <- subset(r$table.twosided, boundary==TRUE & order==0)[, -c(1, 2)] knitr::kable(t1, row.names=FALSE, digits=2, caption="Boundary Nadaraya-Watson regression") t2 <- subset(r$table.twosided, boundary==FALSE & order==0)[, -c(1, 2)] knitr::kable(t2, row.names=FALSE, digits=2, caption="Interior Nadaraya-Watson regression") t3 <- subset(r$table.twosided, boundary==TRUE & order==1)[, -c(1, 2)] knitr::kable(t3, row.names=FALSE, digits=2, caption="Boundary local linear regression") t4 <- subset(r$table.twosided, boundary==FALSE & order==1)[, -c(1, 2)] knitr::kable(t4, row.names=FALSE, digits=2, caption="Interior local linear regression") t5 <- subset(r$table.twosided, boundary==TRUE & order==2)[, -c(1, 2)] knitr::kable(t5, row.names=FALSE, digits=2, caption="Boundary local quadratic regression") t6 <- subset(r$table.twosided, boundary==FALSE & order==2)[, -c(1, 2)] knitr::kable(t6, row.names=FALSE, digits=2, caption="Interior local quadratic regression")
o1 <- subset(r$table.onesided, boundary==TRUE & order==0)[, -c(1, 2)] knitr::kable(o1, row.names=FALSE, digits=2, caption="Boundary Nadaraya-Watson regression") o2 <- subset(r$table.onesided, boundary==FALSE & order==0)[, -c(1, 2)] knitr::kable(o2, row.names=FALSE, digits=2, caption="Interior Nadaraya-Watson regression") o3 <- subset(r$table.onesided, boundary==TRUE & order==1)[, -c(1, 2)] knitr::kable(o3, row.names=FALSE, digits=2, caption="Boundary local linear regression") o4 <- subset(r$table.onesided, boundary==FALSE & order==1)[, -c(1, 2)] knitr::kable(o4, row.names=FALSE, digits=2, caption="Interior local linear regression") o5 <- subset(r$table.onesided, boundary==TRUE & order==2)[, -c(1, 2)] knitr::kable(o5, row.names=FALSE, digits=2, caption="Boundary local quadratic regression") o6 <- subset(r$table.onesided, boundary==FALSE & order==2)[, -c(1, 2)] knitr::kable(o6, row.names=FALSE, digits=2, caption="Interior local quadratic regression")
Critical values for local constant and local linear regression in the interior:
r$cv.interior
Critical values for local constant and local linear regression at the boundary:
r$cv.boundary
Coverage of unadjusted CIs for local constant and local linear regression in the interior:
r$cov.interior
Coverage of unadjusted CIs for local constant and local linear regression at the boundary:
r$cov.boundary
The package also provides a function EVSnoopingCV
that calculates critical
values based on a further extreme value approximation to the Gaussian process.
Its use is not recommended, however. The following figure compares these
critical values with those based directly on the Gaussian process.
Order "0" corresponds to Nadaraya-Watson or local linear regression in the interior, and order "1" to local linear regression at a boundary.
p <- PlotEVSnoopingCV() p+ggplot2::ylab("Critical value")+ggplot2::xlab("$\\overline{h}/\\underline{h}$")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.