pco_method: Penalized Comparison to Overfitting
In hericks/KDE: Kernel Density Estimation


The PCO method is used to estimate an optimal bandwidth for kernel density estimation from a given set of bandwidths.

pco_method(
  kernel,
  samples,
  bandwidths = logarithmic_bandwidth_set(1/length(samples), 1, 10),
  lambda = 1,
  subdivisions = 100L
)

`kernel`	S3 object of class `Kernel`; the kernel to use for the estimator.
`samples`	numeric vector; the observations.
`bandwidths`	strictly positive numeric vector; the bandwidth set from which the bandwidth with the least estimated risk will be selected.
`lambda`	positive numeric scalar; a tuning parameter.
`subdivisions`	positive numeric scalar; subdivisions parameter internally passed to `integrate_primitive`.

The PCO method aims to minimize an upper bound for the mean integrated squared error (MISE) of a kernel density estimator. The MISE is defined as the expectation of the squared L2 norm of the difference between the estimator and the (unknown) true density.
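
Written out, with $\hat f_h$ denoting the kernel density estimator with bandwidth $h$ and $f$ the true density (notation introduced here for illustration, not taken from the package), this definition reads:

```latex
\mathrm{MISE}(h)
  = \mathbb{E}\,\lVert \hat f_h - f \rVert_2^2
  = \mathbb{E} \int_{\mathbb{R}} \bigl( \hat f_h(x) - f(x) \bigr)^2 \, dx
```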

For each bandwidth in the set, `pco_method` evaluates the PCO criterion, which approximates the risk, and then selects the bandwidth with the smallest criterion value.

The criterion builds on the standard bias-variance decomposition of the risk. Because the bias term depends on the unknown density, it cannot be computed directly. Instead, it is estimated by comparing the estimator for a given bandwidth with the overfitting estimator, that is, the estimator with the smallest bandwidth in the set.
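
As a sketch of this step, write $\hat f_h$ for the estimator with bandwidth $h$, $K_h = K(\cdot/h)/h$ for the rescaled kernel, $f_h = K_h * f$ for the smoothed density, $n$ for the sample size, and $h_{\min}$ for the smallest bandwidth in the set. All of this notation is introduced here for illustration. For i.i.d. samples, the decomposition and the comparison that stands in for the bias term can be written as:

```latex
\mathbb{E}\,\lVert \hat f_h - f \rVert_2^2
  = \underbrace{\lVert f_h - f \rVert_2^2}_{\text{bias term}}
  + \underbrace{\frac{\lVert K_h \rVert_2^2 - \lVert f_h \rVert_2^2}{n}}_{\text{variance term}},
\qquad
\lVert f_h - f \rVert_2^2 \;\text{ is estimated via }\; \lVert \hat f_h - \hat f_{h_{\min}} \rVert_2^2
```

The variance term is bounded above by $\lVert K_h \rVert_2^2 / n$, which does not involve the unknown density.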

Furthermore, a penalty term is computed that combines two variance terms: the variance from the risk decomposition and the variance of the bias estimate. The tuning parameter `lambda` enters this penalty; the recommended value is 1.

The PCO criterion is the sum of the comparison to overfitting and the penalty term, so the procedure seeks a balance between the two. This is where the name penalized comparison to overfitting comes from.
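
In the formulation of the cited papers, which is assumed here and may differ in detail from what the package computes internally, the criterion for one-dimensional data takes the following form. Here $n$ is the sample size, $K_h = K(\cdot/h)/h$ is the rescaled kernel, $h_{\min}$ is the smallest bandwidth in the set $\mathcal{H}$, and $\hat f_h$ is the estimator with bandwidth $h$:

```latex
\mathrm{crit}_\lambda(h)
  = \lVert \hat f_h - \hat f_{h_{\min}} \rVert_2^2 + \mathrm{pen}_\lambda(h),
\qquad
\mathrm{pen}_\lambda(h)
  = \frac{\lambda \lVert K_h \rVert_2^2 - \lVert K_{h_{\min}} - K_h \rVert_2^2}{n},
\qquad
\hat h = \operatorname*{arg\,min}_{h \in \mathcal{H}} \mathrm{crit}_\lambda(h)
```

In this formulation the variance of the bias estimate enters with a negative sign, since the comparison term overestimates the bias by roughly that amount.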

For more information, see the papers listed below.
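
The selection procedure described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: it assumes a Gaussian kernel, for which all L2 inner products have closed forms, and it uses the penalty from the cited papers, (lambda * ||K_h||^2 - ||K_hmin - K_h||^2) / n. The function name `pco_gaussian_sketch` is hypothetical, and the hand-built logarithmic grid only imitates what the default `bandwidths` argument appears to describe.

```r
# Hypothetical helper, not part of the KDE package: PCO bandwidth selection
# for a Gaussian kernel, using closed-form L2 inner products instead of
# numerical integration.
pco_gaussian_sketch <- function(samples, bandwidths, lambda = 1) {
  n <- length(samples)
  h_min <- min(bandwidths)               # bandwidth of the overfitting estimator
  d <- outer(samples, samples, "-")      # pairwise differences X_i - X_j

  # <f_hat_a, f_hat_b>: convolving two Gaussian kernels with standard
  # deviations a and b gives a Gaussian with standard deviation sqrt(a^2 + b^2).
  inner_est <- function(a, b) mean(dnorm(d, sd = sqrt(a^2 + b^2)))
  # <K_a, K_b> for the rescaled kernels themselves.
  inner_ker <- function(a, b) dnorm(0, sd = sqrt(a^2 + b^2))

  crit <- vapply(bandwidths, function(h) {
    # Comparison to overfitting: ||f_hat_h - f_hat_hmin||^2
    comparison <- inner_est(h, h) - 2 * inner_est(h, h_min) +
      inner_est(h_min, h_min)
    # ||K_hmin - K_h||^2
    kernel_gap <- inner_ker(h, h) - 2 * inner_ker(h, h_min) +
      inner_ker(h_min, h_min)
    penalty <- (lambda * inner_ker(h, h) - kernel_gap) / n
    comparison + penalty
  }, numeric(1))

  # Return the bandwidth with the smallest criterion value.
  bandwidths[which.min(crit)]
}

# Usage on simulated data with a logarithmically spaced grid (an assumption
# about the shape of the default bandwidth set, built by hand here).
set.seed(1)
x <- rnorm(200)
hs <- exp(seq(log(1 / length(x)), log(1), length.out = 10))
h_hat <- pco_gaussian_sketch(x, hs)
```

For other kernels the inner products generally have no such closed form and would need numerical integration, which is presumably what the `subdivisions` argument controls in `pco_method`.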

The estimated optimal bandwidth, which is an element of `bandwidths`.

Estimator selection: a new method with applications to KDEs, Lacour [2017]

Numerical performance of PCO for multivariate KDEs, Varet [2019]

`kernel_density_estimator` for more information about kernel density estimators; `cross_validation` and `goldenshluger_lepski` for other automatic bandwidth-selection algorithms.

hericks/KDE documentation built on Aug. 22, 2020, 12:04 a.m.