find_hcv: Find optimal bandwidth using cross-validation


View source: R/cv.R

Description

The function finds a data-driven optimal bandwidth using leave-one-out cross-validation. The output is an approximation to \hat{h}_{CV}, the minimizer of the function

CV(h) = \frac{1}{n} \sum_{i=1}^{n} \left(Y_i - \hat{m}^{(-i)}_h(x_i)\right)^2

where \hat{m}^{(-i)}_h denotes the leave-one-out estimator with bandwidth h, that is, the estimator fitted with the i-th observation (x_i, Y_i) removed.
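
As a minimal sketch of how CV(h) could be evaluated for a single bandwidth, the code below uses a hand-written Gaussian-kernel weighted-average smoother. The names smooth_at and cv_score are hypothetical and are not part of the package; the package's own estimators (nw, local_average) may differ in interface and in kernel.

```r
# Hypothetical kernel smoother: weighted average of y with Gaussian weights
# centred at x0. An illustrative stand-in, not the package's nw().
smooth_at <- function(x0, x, y, h) {
  w <- dnorm((x - x0) / h)
  sum(w * y) / sum(w)
}

# Leave-one-out CV score: predict each Y_i from the other n - 1 observations
# and average the squared prediction errors.
cv_score <- function(h, x, y) {
  n <- length(x)
  sq_err <- vapply(seq_len(n), function(i) {
    (y[i] - smooth_at(x[i], x[-i], y[-i], h))^2
  }, numeric(1))
  mean(sq_err)
}

# Simulated data (for illustration only)
set.seed(1)
x <- sort(runif(50))
y <- sin(2 * pi * x) + rnorm(50, sd = 0.1)

cv_score(0.05, x, y)  # CV score at a moderate bandwidth
cv_score(10, x, y)    # heavy oversmoothing gives a much larger score
```

The i-th observation is dropped by negative indexing (x[-i], y[-i]), so each prediction uses only the remaining data, as in the definition of CV(h).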

Usage

find_hcv(data, estimator, hrange = c(0, 1), num_bws = 100, plot = T, ...)

Arguments

data

the data used to fit the estimator, a data frame with columns x and y

estimator

the estimator (nw, local_average, or another function with the same input and return types)

hrange

a vector of length 2 giving the lower and upper limits of the range of h-values to try (default c(0, 1))

num_bws

number of different h-values to try in the range (default 100)

plot

if TRUE (the default), produces a plot of the CV function, which is useful for checking that hrange is appropriate

...

additional arguments to pass to the estimator (e.g. kernel, empty_nhood)

Details

CV(h) is minimized approximately via a grid search. The user specifies an interval, hrange, over which to search for h. The function then constructs a sequence of num_bws evenly spaced trial bandwidths in that interval and computes CV(h) for each. The trial bandwidth with the smallest value of CV(h) is reported.

By default, the function also produces a plot of CV(h). This enables the user to check that a suitable interval has been specified. The interval should be wide enough that the identified point is clearly an interior minimum rather than an endpoint of the range. However, if the interval is too wide, the trial bandwidths are spaced further apart and the discretization error from the grid search may be substantial.
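
The grid search described above can be sketched as follows. This is an illustration under stated assumptions, not the package source: grid_search_cv, cv_fun and toy_cv are hypothetical names, the grid is assumed to be built with seq(..., length.out = num_bws) including both endpoints, and the at_boundary flag is an added convenience rather than documented behaviour.

```r
# Hypothetical grid search over evenly spaced bandwidths.
# cv_fun is any function mapping a single bandwidth h to its CV score.
grid_search_cv <- function(cv_fun, hrange = c(0, 1), num_bws = 100) {
  h    <- seq(hrange[1], hrange[2], length.out = num_bws)  # assumed grid
  cvs  <- vapply(h, cv_fun, numeric(1))
  best <- which.min(cvs)  # which.min discards NA/NaN scores
  list(hcv = h[best], mincv = cvs[best], h = h, cvs = cvs,
       at_boundary = best %in% c(1, num_bws))  # extra flag, not in the package
}

# Toy CV curve with a known minimum at h = 0.3 (for illustration only)
toy_cv <- function(h) (h - 0.3)^2 + 0.01

res <- grid_search_cv(toy_cv, hrange = c(0, 1), num_bws = 101)
res$hcv          # grid point closest to 0.3
res$at_boundary  # FALSE: the minimum lies inside the interval

# A too-narrow interval puts the best grid point on the edge of the range
grid_search_cv(toy_cv, hrange = c(0.4, 1), num_bws = 101)$at_boundary  # TRUE
```

Under this assumption the grid spacing is (hrange[2] - hrange[1]) / (num_bws - 1), so narrowing hrange or increasing num_bws reduces the discretization error.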

Value

A list with 4 components:

hcv

the identified optimal bandwidth

mincv

the minimal value of CV(h)

h

the vector of bandwidths that have been tried

cvs

the values of CV(h) for the trial bandwidths
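
The relationship between the four components can be illustrated with a hand-built list of the same shape. The numbers below are made up for illustration and are not output from find_hcv; that hcv is the element of h at which cvs is smallest is inferred from the descriptions above.

```r
# Hypothetical result with the documented component names (values are made up)
h   <- c(0.05, 0.10, 0.15, 0.20)
cvs <- c(0.031, 0.012, 0.018, 0.040)
res <- list(hcv = h[which.min(cvs)], mincv = min(cvs), h = h, cvs = cvs)

res$hcv    # bandwidth with the smallest CV score
res$mincv  # CV score at that bandwidth

# Redraw the CV curve from the stored grid and mark the selected bandwidth
plot(res$h, res$cvs, type = "l", xlab = "h", ylab = "CV(h)")
abline(v = res$hcv, lty = 2)
```

Keeping h and cvs in the result makes it possible to redraw or inspect the CV curve later without recomputing it.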

Examples

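  library(nprtw)  # package providing find_hcv and nw

  # simulate and plot some data
  set.seed(1)  # make the simulated data reproducible
  m <- function(x) (x^2+1)*sin(2*pi*x*((1-x) + 4*x))
  x <- sort(runif(100))
  y <- m(x) + rnorm(length(x), sd=0.1)
  simdata <- data.frame(x=x,y=y)
  plot(simdata)

  # search for the CV-optimal bandwidth of the nw estimator over [0, 0.4]
  find_hcv(simdata, nw, c(0,0.4))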

timwaite/nprtw documentation built on Jan. 25, 2021, 1:50 a.m.