best_h: Find "best" smoothing parameter using leave-one-out cross...


View source: R/h.r

Description

Minimises the leave-one-out estimate of root mean-squared error to find the "optimal" bandwidth for smoothing.

Usage

best_h(x, h_init = NULL, ..., tol = 0.01, control = list())

Arguments

x

condensed summary to smooth

h_init

initial values of bandwidths from which to start the search. If not specified, defaults to 5 times the binwidth of each variable.

...

other arguments (like var) passed on to rmse_cv

tol

numerical tolerance, defaults to 1%.

control

additional control parameters passed on to optim. The most useful argument is probably trace, which makes it possible to follow the progress of the optimisation.

Details

L-BFGS-B optimisation is used to constrain the bandwidths to be greater than the binwidths: if the bandwidth is smaller than the binwidth it's impossible to compute the rmse because no smoothing occurs. The tolerance is set relatively high for numerical optimisation since the precise choice of bandwidth makes little difference visually, and we're unlikely to have sufficient data to make a statistically significant choice anyway.
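
The bounded search described above can be illustrated with base R alone. This is a minimal sketch, not the package's implementation: the toy binned counts, the Gaussian kernel smoother and the helper loo_rmse are all assumptions made for illustration. Only optim, its "L-BFGS-B" method and the lower and factr settings are real base R features. The factr value is one way a tolerance of roughly 1% could be expressed, and is also an assumption.

```r
# Toy stand-in for a condensed summary: counts in equal-width bins.
# (Illustrative data only; not produced by condense() or bin().)
set.seed(1)
binwidth <- 0.1
centers <- seq(-3, 3, by = binwidth)
counts <- rpois(length(centers), lambda = 1000 * dnorm(centers) * binwidth)

# Hypothetical objective: leave-one-out RMSE of a Gaussian kernel smoother.
# Each bin is predicted from all other bins, never from itself.
loo_rmse <- function(h) {
  w <- dnorm(outer(centers, centers, "-"), sd = h)
  diag(w) <- 0                                  # leave the bin itself out
  pred <- as.vector(w %*% counts) / rowSums(w)  # weighted mean of the others
  sqrt(mean((counts - pred)^2))
}

# L-BFGS-B accepts box constraints, so the bandwidth is kept >= binwidth.
fit <- optim(
  par = 5 * binwidth,        # start at 5 times the binwidth
  fn = loo_rmse,
  method = "L-BFGS-B",
  lower = binwidth,
  # factr * machine epsilon is the tolerance on the objective reduction;
  # this choice gives roughly 1% (an assumption about how tol could map).
  control = list(factr = 0.01 / .Machine$double.eps)
)

fit$par                      # bandwidth found by the search
fit$counts[["function"]]     # number of objective evaluations
if (fit$par <= binwidth + 1e-8) {
  message("estimate is on the lower bound: smoothing may not be needed")
}
```

A loose tolerance such as this stops the search after only a few evaluations, which matches the reasoning above that the precise bandwidth makes little visual difference.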

Value

a single numeric value representing the bandwidth that minimises the leave-one-out estimate of rmse. The returned vector has an attribute, evaluations, giving the number of times the objective function was evaluated. If the optimisation does not converge, or smoothing is not needed (i.e. the estimate is on the lower bound), a warning is thrown.

See Also

Other bandwidth estimation functions: h_grid; rmse_cv, rmse_cvs

Examples

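x <- rchallenge(1e4)
xsum <- condense(bin(x, 1 / 10))
h <- best_h(xsum, control = list(trace = 3, REPORT = 1))

if (require("ggplot2")) {
  # print() is needed for the first plot: inside a braced block only the
  # value of the last expression is auto-printed
  print(autoplot(xsum))
  print(autoplot(smooth(xsum, h)))
}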

hadley/bigvis documentation built on May 17, 2019, 9:45 a.m.