pco_method: Penalized Comparison to Overfitting


View source: R/pco_method.R

Description

The PCO method is used to estimate an optimal bandwidth for kernel density estimation from a given set of bandwidths.

Usage

pco_method(
  kernel,
  samples,
  bandwidths = logarithmic_bandwidth_set(1/length(samples), 1, 10),
  lambda = 1,
  subdivisions = 100L
)

Arguments

kernel

S3 object of class Kernel; the kernel to use for the estimator.

samples

numeric vector; the observations.

bandwidths

strictly positive numeric vector; the bandwidth set from which the bandwidth with the least estimated risk will be selected.

lambda

positive numeric scalar; a tuning parameter used in the penalty term (see Details). The recommended value is 1.

subdivisions

positive numeric scalar; subdivisions parameter internally passed to integrate_primitive.
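The default for bandwidths in Usage is logarithmic_bandwidth_set(1/length(samples), 1, 10). This page does not document that function. Assuming it returns 10 values from 1/n to 1 that are equally spaced on the log scale, a base-R stand-in (the name log_bandwidth_grid is hypothetical, not part of the package) could look like this:

```r
# Hypothetical stand-in for the default bandwidth set, NOT the package function.
# Assumption: the grid runs from `from` to `to` with `n_bandwidths` values
# that are equally spaced on the log scale.
log_bandwidth_grid <- function(from, to, n_bandwidths) {
  exp(seq(log(from), log(to), length.out = n_bandwidths))
}

n <- 50  # sample size, so the smallest bandwidth is 1/n
grid <- log_bandwidth_grid(1 / n, 1, 10)
print(grid)
```

A log-spaced grid puts more candidates at small bandwidths, where the estimator changes fastest.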

Details

The PCO method aims to minimize an upper bound for the mean integrated squared error (MISE) of a kernel density estimator. The MISE is defined as the expectation of the squared L2 norm of the difference between the estimator and the (unknown) true density.
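For reference, the MISE can be written as follows, using the standard kernel density estimator for observations X_1, ..., X_n with the scaled kernel K_h(x) = K(x/h)/h (this notation is an assumption, not taken from the package):

```latex
\hat f_h(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i),
\qquad
K_h(x) = \frac{1}{h} K\!\left(\frac{x}{h}\right),

\mathrm{MISE}(h) = \mathbb{E}\,\bigl\| \hat f_h - f \bigr\|_2^2
  = \mathbb{E} \int \bigl( \hat f_h(x) - f(x) \bigr)^2 \, dx .
```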

For each bandwidth in the set, pco_method internally computes the PCO criterion value, which approximates the risk, and then selects the bandwidth with the minimal criterion value.

The risk is split using the usual bias-variance decomposition. Since the bias term depends on the unknown density, it is estimated by comparing the estimator with the given bandwidth to the overfitting estimator, namely the estimator with the smallest bandwidth in the set.
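In standard notation (f the true density, f_h the estimator with bandwidth h and scaled kernel K_h(x) = K(x/h)/h, n the sample size, h_min the smallest bandwidth in the set; this notation is assumed, not taken from the package), and using E f_h = K_h * f, the decomposition and the comparison term read:

```latex
\mathrm{MISE}(h) =
  \underbrace{\bigl\| K_h * f - f \bigr\|_2^2}_{\text{bias term}}
  + \underbrace{\mathbb{E}\,\bigl\| \hat f_h - K_h * f \bigr\|_2^2}_{\text{variance term}},

\mathbb{E}\,\bigl\| \hat f_h - K_h * f \bigr\|_2^2
  = \frac{\|K_h\|_2^2 - \|K_h * f\|_2^2}{n}
  \le \frac{\|K_h\|_2^2}{n},

\text{bias term estimated via} \quad \bigl\| \hat f_h - \hat f_{h_{\min}} \bigr\|_2^2 .
```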

In addition, a penalty term is computed from two variance terms: the variance term of the risk decomposition and the variance of the bias-term estimate. The tuning parameter lambda enters this calculation; its recommended value is 1.
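As we read Lacour [2017], the penalty and the resulting criterion take the following form, where the second variance term enters with a negative sign. Here H is the bandwidth set, n the sample size, f_h the kernel density estimator and K_h(x) = K(x/h)/h. pco_method may differ in implementation details; for instance, its subdivisions argument suggests that the norms are computed by numerical integration.

```latex
\mathrm{pen}_\lambda(h) = \lambda \, \frac{\|K_h\|_2^2}{n}
  - \frac{\|K_{h_{\min}} - K_h\|_2^2}{n},
\qquad
\mathrm{crit}(h) = \bigl\| \hat f_h - \hat f_{h_{\min}} \bigr\|_2^2 + \mathrm{pen}_\lambda(h),
\qquad
\hat h = \operatorname*{arg\,min}_{h \in \mathcal{H}} \mathrm{crit}(h).
```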

The PCO criterion is the sum of the comparison to overfitting and the penalty term, so the procedure seeks a balance between the two; hence the name penalized comparison to overfitting.
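A minimal base-R sketch of the whole procedure is shown below. It is not the pco_method implementation: it assumes a Gaussian kernel and the criterion form of Lacour [2017] (penalty lambda * ||K_h||^2 / n - ||K_hmin - K_h||^2 / n), and it uses closed-form Gaussian convolution identities instead of numerical integration. The function name pco_gaussian_sketch is hypothetical.

```r
# Minimal sketch of PCO bandwidth selection, NOT the pco_method implementation.
# Assumptions: Gaussian kernel K = dnorm, K_h(x) = K(x / h) / h, and the
# criterion of Lacour [2017]:
#   crit(h) = ||f_h - f_hmin||^2 + lambda * ||K_h||^2 / n - ||K_hmin - K_h||^2 / n
# For Gaussian kernels every L2 inner product has a closed form, because
#   integral of dnorm(x - u, sd = a) * dnorm(x - v, sd = b) dx
#     = dnorm(u - v, sd = sqrt(a^2 + b^2)),
# so no numerical integration is needed here.
pco_gaussian_sketch <- function(samples, bandwidths, lambda = 1) {
  n <- length(samples)
  h_min <- min(bandwidths)
  d <- outer(samples, samples, "-")  # pairwise differences X_i - X_j

  # L2 inner product of the estimators with bandwidths a and b
  est_inner <- function(a, b) mean(dnorm(d, sd = sqrt(a^2 + b^2)))
  # L2 inner product of the scaled kernels K_a and K_b
  ker_inner <- function(a, b) dnorm(0, sd = sqrt(a^2 + b^2))

  crit <- vapply(bandwidths, function(h) {
    # Comparison to overfitting: ||f_h - f_hmin||^2
    comparison <- est_inner(h, h) - 2 * est_inner(h, h_min) + est_inner(h_min, h_min)
    # Penalty: lambda * ||K_h||^2 / n - ||K_hmin - K_h||^2 / n
    ker_dist2 <- ker_inner(h_min, h_min) - 2 * ker_inner(h_min, h) + ker_inner(h, h)
    penalty <- (lambda * ker_inner(h, h) - ker_dist2) / n
    comparison + penalty
  }, numeric(1))

  list(bandwidth = bandwidths[which.min(crit)], criterion = crit)
}

# Example usage on simulated data with a log-spaced grid from 1/n to 1
set.seed(1)
x <- rnorm(200)
h_grid <- exp(seq(log(1 / length(x)), log(1), length.out = 10))
res <- pco_gaussian_sketch(x, h_grid)
print(res$bandwidth)
```

At h = h_min both the comparison and the kernel distance vanish, so the criterion there reduces to lambda * ||K_hmin||^2 / n, which is large for a tiny bandwidth; this is what keeps the procedure from simply picking the overfitting estimator. The matrix of pairwise differences is computed once and reused for every candidate bandwidth.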

For more information, see the papers listed under Source.

Value

The estimated optimal bandwidth, i.e. the element of bandwidths with the minimal PCO criterion value.

Source

Estimator selection: a new method with applications to KDEs, Lacour [2017]

Numerical performance of PCO for multivariate KDEs, Varet [2019]

See Also

kernel_density_estimator for more information about kernel density estimators; cross_validation and goldenshluger_lepski for other automatic bandwidth-selection methods.


hericks/KDE documentation built on Aug. 22, 2020, 12:04 a.m.