# Cross-validation methods for bandwidth selection

### Description

Selects the bandwidth of a local polynomial kernel (regression, density or variogram) estimator using (standart or modified) CV, GCV or MASE criteria.

### Usage

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | ```
h.cv(bin, ...)
## S3 method for class 'bin.data'
h.cv(bin, objective = c("CV", "GCV", "MASE"),
h.start = NULL, h.lower = NULL, h.upper = NULL, degree = 1,
ncv = ifelse(objective == "CV", 2, 0), cov.bin = NULL,
DEalgorithm = FALSE, warn = TRUE, tol.mask = npsp.tolerance(2), ...)
## S3 method for class 'bin.den'
h.cv(bin, h.start = NULL, h.lower = NULL,
h.upper = NULL, degree = 1, ncv = 2, DEalgorithm = FALSE, ...)
## S3 method for class 'svar.bin'
h.cv(bin, loss = c("ARSE", "ARAE", "ASE", "AAE"),
h.start = NULL, h.lower = NULL, h.upper = NULL, degree = 1, ncv = 1,
DEalgorithm = FALSE, warn = FALSE, ...)
hcv.data(bin, objective = c("CV", "GCV", "MASE"), h.start = NULL,
h.lower = NULL, h.upper = NULL, degree = 1, ncv = ifelse(objective ==
"CV", 1, 0), cov.dat = NULL, DEalgorithm = FALSE, warn = TRUE, ...)
``` |

### Arguments

`bin` |
object used to select a method (binned data, binned density or binned semivariogram). |

`...` |
further arguments passed to or from other methods (e.g. parameters of the optimization routine). |

`objective` |
character; optimal criterion to be used ("CV", "GCV" or "MASE"). |

`h.start` |
vector; initial values for the parameters (diagonal elements) to be optimized over.
If |

`h.lower` |
vector; lower bounds on each parameter (diagonal elements) to be optimized.
Defaults to |

`h.upper` |
vector; upper bounds on each parameter (diagonal elements) to be optimized.
Defaults to |

`degree` |
degree of the local polynomial used. Defaults to 1 (local linear estimation). |

`ncv` |
integer; determines the number of cells leaved out in each dimension.
(0 to GCV considering all the data, |

`cov.bin` |
(optional) covariance matrix of the binned data or semivariogram model
( |

`DEalgorithm` |
logical; if |

`warn` |
logical; sets the handling of warning messages
(normally due to the lack of data in some neighborhoods).
If |

`tol.mask` |
tolerance used in the aproximations. Defaults to |

`loss` |
character; CV error. See "Details" bellow. |

`cov.dat` |
covariance matrix of the data or semivariogram model
(of class extending |

### Details

Currently, only diagonal bandwidths are supported.

`h.cv`

methods use binning approximations to the objective function values
(in almost all cases, an averaged squared error).
If `ncv > 0`

, estimates are computed by leaving out binning cells with indexes within
the intervals *[x_i - ncv + 1, x_i + ncv - 1]*, at each dimension i, where *x*
denotes the index of the estimation location. *ncv = 1* corresponds with
traditional cross-validation and *ncv > 1* with modified CV
(it may be appropriate for dependent data; see e.g. Chu and Marron, 1991, for the one dimensional case).
Setting `ncv >= 2`

would be recommended for sparse data (as linear binning is used).
For standard GCV, set `ncv = 0`

(the whole data would be used).
For theoretical MASE, set `bin = binning(x, y = trend.teor)`

, `cov = cov.teor`

and `ncv = 0`

.

If `DEalgorithm == FALSE`

, the `"L-BFGS-B"`

method in `optim`

is used.

The different options for the argument `loss`

in `h.cv.svar.bin()`

define the CV error
considered in semivariogram estimation:

`"ASE"`

Averaged squared error

`"ARSE"`

Averaged relative squared error

`"AAE"`

Averaged absolute error

`"ARAE"`

Averaged relative absolute error

`hcv.data`

evaluates the objective function at the original data
(combining a binning approximation to the nonparametric estimates with a linear interpolation),
this can be very slow (and memory demanding; consider using `h.cv`

instead).
If `ncv > 1`

(modified CV), a similar algorithm to that in `h.cv`

is used,
estimates are computed by leaving out binning cells with indexes within
the intervals *[x_i - ncv + 1, x_i + ncv - 1]*.

### Value

Returns a list containing the following 3 components:

`h` |
the best (diagonal) bandwidth matrix found. |

`value` |
the value of the objective function corresponding to |

`objective` |
the criterion used. |

### References

Chu, C.K. and Marron, J.S. (1991) Comparison of Two Bandwidth Selectors
with Dependent Errors. *The Annals of Statistics*, **19**, 1906-1918.

Francisco-Fernandez M. and Opsomer J.D. (2005) Smoothing parameter selection
methods for nonparametric regression with spatially correlated errors.
*Canadian Journal of Statistics*, **33**, 539-558.

### See Also

`locpol`

, `locpolhcv`

, `binning`

,
`np.svar`

.

### Examples

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | ```
bin <- binning(earthquakes[, c("lon", "lat")], earthquakes$mag)
hcv <- h.cv(bin, ncv = 2)
lp <- locpol(bin, h = hcv$h)
## Alternatively:
## lp <- locpolhcv(earthquakes[, c("lon", "lat")], earthquakes$mag, ncv = 2)
simage(lp, main = 'Smoothed magnitude')
contour(lp, add = TRUE)
with(earthquakes, points(lon, lat, pch = 20))
## Density estimation
hden <- h.cv(as.bin.den(bin))
den <- np.den(bin, h = hden$h)
plot(den, main = 'Estimated log(density)')
``` |