best_h: Find "best" smoothing parameter using leave-one-out cross...
In hadley/bigvis: Tools for visualisation of big data sets


Minimises the leave-one-out estimate of root mean-squared error to find the "optimal" bandwidth for smoothing.

best_h(x, h_init = NULL, ..., tol = 0.01, control = list())

`x`	condensed summary to smooth
`h_init`	initial values of the bandwidths from which to start the search. If not specified, defaults to 5 times the binwidth of each variable.
`...`	other arguments (like `var`) passed on to `rmse_cv`
`tol`	numerical tolerance, defaults to 1%.
`control`	additional control parameters passed on to `optim`. The most useful argument is probably `trace`, which makes it possible to follow the progress of the optimisation.

L-BFGS-B optimisation is used to constrain the bandwidths to be greater than the binwidths: if the bandwidth is smaller than the binwidth it's impossible to compute the rmse because no smoothing occurs. The tolerance is set relatively high for numerical optimisation since the precise choice of bandwidth makes little difference visually, and we're unlikely to have sufficient data to make a statistically significant choice anyway.
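The bounded search described above can be sketched in base R. This is a minimal illustration and not the bigvis implementation: the Gaussian-kernel objective `loo_rmse()`, the simulated bin centres, and the mapping of a 1% tolerance onto `optim`'s `factr` control are all assumptions made for the example.

```r
set.seed(1)

# Simulated stand-in for a condensed summary: 100 bins of width 0.1
binwidth <- 0.1
centers <- seq(0.05, 9.95, by = binwidth)
y <- sin(centers) + rnorm(length(centers), sd = 0.3)

# Hypothetical objective: leave-one-out RMSE of a Gaussian kernel-weighted mean.
# Each bin is predicted from all other bins, then compared with its own value.
loo_rmse <- function(h, centers, y) {
  pred <- vapply(seq_along(centers), function(i) {
    w <- dnorm((centers[-i] - centers[i]) / h)
    sum(w * y[-i]) / sum(w)
  }, numeric(1))
  sqrt(mean((y - pred)^2))
}

tol <- 0.01
fit <- optim(
  par = 5 * binwidth,                  # start at 5 times the binwidth
  fn = function(h) loo_rmse(h, centers, y),
  method = "L-BFGS-B",
  lower = binwidth,                    # bandwidth may not drop below the binwidth
  # Assumption: optim's tolerance is roughly factr * machine epsilon
  control = list(factr = tol / .Machine$double.eps)
)

fit$par    # selected bandwidth, never smaller than binwidth
fit$value  # leave-one-out RMSE at that bandwidth
```

The loose `factr` stops the search early once further changes to the bandwidth barely move the objective, which matches the reasoning that fine differences in bandwidth are not visually meaningful.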

A single numeric value representing the bandwidth that minimises the leave-one-out estimate of rmse. The returned vector has an `evaluations` attribute giving the number of times the objective function was evaluated. If the optimisation does not converge, or smoothing is not needed (i.e. the estimate is on the lower bounds), a warning is thrown.
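As a rough sketch of how a result with these properties could be assembled from an `optim` fit, the hypothetical helper `finish_search()` below attaches an `evaluations` attribute and warns on non-convergence or when the estimate sits on the lower bound. The helper, its `tol` argument, and the toy objective are assumptions for illustration and are not taken from bigvis.

```r
# Hypothetical helper: turn an optim() result into a bandwidth with metadata
finish_search <- function(fit, lower, tol = 1e-6) {
  if (fit$convergence != 0) {
    warning("Optimisation did not converge: ", fit$message, call. = FALSE)
  } else if (all(abs(fit$par - lower) < tol)) {
    warning("Estimate is on the lower bound: smoothing not needed", call. = FALSE)
  }
  # fit$counts is a named vector with entries "function" and "gradient"
  structure(fit$par, evaluations = unname(fit$counts["function"]))
}

# Toy objective whose unconstrained minimum (0.01) lies below the bound (0.1),
# so the bounded search ends on the bound
lower <- 0.1
fit <- optim(0.5, function(h) (h - 0.01)^2, method = "L-BFGS-B", lower = lower)

h <- finish_search(fit, lower)  # emits the "lower bound" warning
attr(h, "evaluations")          # number of objective evaluations
```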

Other bandwidth estimation functions: h_grid; rmse_cv, rmse_cvs

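library(bigvis)

# Simulate data, bin it at a width of 1/10, and condense to per-bin summaries
x <- rchallenge(1e4)
xsum <- condense(bin(x, 1 / 10))
# trace and REPORT make optim report its progress at every iteration
h <- best_h(xsum, control = list(trace = 3, REPORT = 1))

if (require("ggplot2")) {
  # Only the last value in a braced block is auto-printed, so print each plot
  print(autoplot(xsum))
  print(autoplot(smooth(xsum, h)))
}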