Disake-package: Discrete associated kernel estimators



Discrete smoothing of a probability mass function (p.m.f.) is performed using three discrete associated kernels: DiracDU, Binomial and Discrete Triangular. Two automatic bandwidth selection procedures are implemented: the cross-validation method for all three kernels and the local Bayesian approach for the Binomial kernel. Note that the DiracDU kernel is used for categorical data, the Binomial kernel is appropriate for count data with small or moderate sample sizes, and the Discrete Triangular kernel is recommended for count data with large sample sizes.


The estimated p.m.f.:

The kernel estimator \widehat{f}_n of f is defined as

\widehat{f}_n(x) = \frac{1}{n}\sum_{i=1}^{n}K_{x,h}(X_i),

where K_{x,h} is one of the kernels defined below. In practice, we first calculate the normalizing constant

C_n = \sum_{x\in N}\widehat{f}_n(x),

which is not generally equal to 1. The constant C_n equals 1 only for the Dirac and DiracDU kernels. The estimated p.m.f. is then \tilde{f}_n=\widehat{f}_n/C_n.
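As a rough illustration of these two steps (not the package's R code), the following Python sketch evaluates \widehat{f}_n on a grid, computes C_n and returns the normalized estimate. It uses the Binomial kernel as defined on this page. The names binomial_kernel and kernel_estimate are hypothetical, and truncating the sum over N to a finite grid is an assumption made here for the sake of the example.

```python
from math import comb

def binomial_kernel(x, h, y):
    """B_{x,h}(y): p.m.f. of Binomial(x + 1, (x + h) / (x + 1)) at y."""
    if y < 0 or y > x + 1:
        return 0.0  # y lies outside the support S_x = {0, ..., x + 1}
    p = (x + h) / (x + 1)
    return comb(x + 1, y) * p**y * (1 - p) ** (x + 1 - y)

def kernel_estimate(data, h, grid):
    """Return (f_hat, C_n, f_tilde) evaluated on a finite grid of targets x."""
    n = len(data)
    # f_hat(x) = (1/n) * sum_i K_{x,h}(X_i)
    f_hat = [sum(binomial_kernel(x, h, xi) for xi in data) / n for x in grid]
    # Normalizing constant; the sum over N is truncated to the grid (assumption).
    c_n = sum(f_hat)
    f_tilde = [v / c_n for v in f_hat]
    return f_hat, c_n, f_tilde

# Example: a small count sample, h = 0.1, grid 0..14
data = [0, 1, 1, 2, 3, 2, 1, 0, 4]
f_hat, c_n, f_tilde = kernel_estimate(data, 0.1, range(0, 15))
```

By construction, f_tilde sums to 1 over the grid, whereas f_hat need not.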

The integrated squared error (ISE) defined by

{ISE}_0 = \sum_{x\in N}\{\tilde{f}_n(x) - f_0(x)\}^2

is the criterion used to assess the smoothing quality of the kernel estimator \tilde{f}_n relative to the empirical p.m.f. f_0. See Kokonendji and Senga Kiessé (2011).
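The ISE computation can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and both p.m.f.s are assumed to be given as lists evaluated on the same finite grid.

```python
from collections import Counter

def empirical_pmf(data, grid):
    """Relative frequency f_0(x) of each grid point x in the sample."""
    n = len(data)
    counts = Counter(data)
    return [counts[x] / n for x in grid]

def ise(f_tilde, f0):
    """Sum over the grid of squared differences between the two p.m.f.s."""
    return sum((a - b) ** 2 for a, b in zip(f_tilde, f0))

# Example: empirical p.m.f. of a small sample on the grid 0..3
f0 = empirical_pmf([0, 1, 1, 3], range(4))
```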

Given a data sample, the Disake package computes the estimated p.m.f. using one of the three kernel functions: DiracDU, Binomial and Discrete Triangular. The bandwidth parameter is calculated using the cross-validation technique CVbw. When the kernel function is Binomial, the bandwidth parameter can also be computed using the local Bayesian procedure Baysbw. The kernel functions kf are defined below.

Binomial kernel:

Let x\in N := \{0, 1, \ldots\} and S_x = \{0, 1, \ldots, x + 1\}. The Binomial kernel is defined on the support S_x by

B_{x,h}(y) = \frac{(x+1)!}{y!(x+1-y)!}\left(\frac{x+h}{x+1}\right)^y\left(\frac{1-h}{x+1}\right)^{x+1-y}1_{S_{x}}(y),

where h\in(0, 1] and 1_{A} denotes the indicator function of A. Note that B_{x,h} is the p.m.f. of the Binomial distribution with x+1 trials and success probability (x+h)/(x+1). See Kokonendji and Senga Kiessé (2011).
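Because B_{x,h} is a Binomial p.m.f., its mean and variance follow from the standard Binomial formulas. The short derivation below is added for illustration; the symbol Z_{x,h}, a random variable with p.m.f. B_{x,h}, is introduced here and is not part of the package documentation.

```latex
E(Z_{x,h}) = (x+1)\,\frac{x+h}{x+1} = x + h,
\qquad
Var(Z_{x,h}) = (x+1)\,\frac{x+h}{x+1}\,\frac{1-h}{x+1} = \frac{(x+h)(1-h)}{x+1}.
```

As h tends to 0, the mean tends to the target x, while the variance tends to x/(x+1), which is smaller than 1.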

DiracDU kernel:

For a fixed number of categories c\in \{2, 3, \ldots\}, we define S_{c} = \{0, 1, \ldots, c-1\}. The DiracDU kernel is defined on S_{c} by

DU_{x,h;c}(y) = (1 - h)1_{\{x\}}(y)+\frac {h} {c-1}1_{S_{c}\setminus\{x\}}(y),

where x\in S_{c} and h\in(0, 1]. See Kokonendji and Senga Kiessé (2011), and also Aitchison and Aitken (1976) for the multivariate case.
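A minimal Python sketch of this kernel follows. The function name and argument order are hypothetical and do not reflect the interface of the package's kf function.

```python
def diracdu_kernel(x, h, c, y):
    """DU_{x,h;c}(y): mass 1 - h at the target x, h/(c - 1) on each other category."""
    if not (0 <= x < c and 0 <= y < c):
        raise ValueError("x and y must lie in S_c = {0, ..., c - 1}")
    return 1.0 - h if y == x else h / (c - 1)

# Example: c = 4 categories, target x = 2, bandwidth h = 0.3
c, h = 4, 0.3
masses = [diracdu_kernel(2, h, c, y) for y in range(c)]
```

The kernel depends on x and y only through whether they are equal, so it sums to 1 over the target x as well as over y. This is why the normalizing constant C_n equals 1 for DiracDU.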

Discrete Triangular kernel:

For a fixed arm a\in N, we define S_{x,a} = \{x-a, \ldots, x, \ldots, x + a\}. The Discrete Triangular kernel is defined on S_{x,a} by

DT_{x,h;a}(y) = \frac {(a+1)^h - |y-x|^h} {P(a,h)}1_{S_{x,a}}(y),

where x\in N, h>0 and P(a,h)=(2a+1)(a+1)^h - 2(1+2^h+ \cdots +a^h) is the normalizing constant. For a=0, the Discrete Triangular kernel DT_{x,h;0} corresponds to the Dirac kernel on x; see Kokonendji et al. (2007), and also Kokonendji and Zocchi (2010) for an asymmetric version of Discrete Triangular.
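The definition translates directly into the Python sketch below, given for illustration only. The function names are hypothetical and are not taken from the package.

```python
def dt_normalizer(a, h):
    """P(a, h) = (2a + 1)(a + 1)^h - 2 * (1^h + 2^h + ... + a^h)."""
    return (2 * a + 1) * (a + 1) ** h - 2 * sum(k ** h for k in range(1, a + 1))

def discrete_triangular_kernel(x, h, a, y):
    """DT_{x,h;a}(y) on the support {x - a, ..., x + a}; zero elsewhere."""
    if abs(y - x) > a:
        return 0.0
    return ((a + 1) ** h - abs(y - x) ** h) / dt_normalizer(a, h)

# Example: target x = 5, arm a = 3, bandwidth h = 0.5
x, a, h = 5, 3, 0.5
masses = [discrete_triangular_kernel(x, h, a, y) for y in range(x - a, x + a + 1)]
```

With a = 0 the normalizer is 1 and all the mass sits at y = x, which is the Dirac case mentioned above.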

The bandwidth selection:

Two functions are implemented to select the bandwidth: cross-validation and the local Bayesian procedure. The cross-validation technique CVbw is used for the DiracDU, Binomial and Discrete Triangular kernels; see Kokonendji and Senga Kiessé (2011). The local Bayesian procedure Baysbw is implemented to select the bandwidth for the Binomial kernel; see Zougab et al. (2012).
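This page does not state the cross-validation criterion itself. The Python sketch below assumes the usual least-squares form, CV(h) = \sum_{x}\{\widehat{f}_n(x)\}^2 - \frac{2}{n(n-1)}\sum_{i}\sum_{j\neq i}K_{X_i,h}(X_j), minimized over a grid of candidate bandwidths. It uses the DiracDU kernel because its support is finite, so the first sum is exact. The function names and the candidate grid are hypothetical, and the code is not meant to reproduce the interface or internals of CVbw.

```python
def diracdu_kernel(x, h, c, y):
    """DU_{x,h;c}(y) for x, y in {0, ..., c - 1}."""
    return 1.0 - h if y == x else h / (c - 1)

def cv_score(data, h, c):
    """Assumed least-squares cross-validation criterion for bandwidth h."""
    n = len(data)
    # First term: sum over all categories of the squared estimate f_hat(x)^2.
    term1 = 0.0
    for x in range(c):
        f_hat = sum(diracdu_kernel(x, h, c, xi) for xi in data) / n
        term1 += f_hat ** 2
    # Second term: leave-one-out sums of K_{X_i,h}(X_j) over j != i.
    loo = 0.0
    for i, xi in enumerate(data):
        loo += sum(diracdu_kernel(xi, h, c, xj)
                   for j, xj in enumerate(data) if j != i)
    return term1 - 2.0 * loo / (n * (n - 1))

def cv_bandwidth(data, c, candidates):
    """Return the candidate bandwidth with the smallest criterion value."""
    return min(candidates, key=lambda h: cv_score(data, h, c))

# Example: categorical sample with c = 3 categories, candidates 0.01, ..., 1.00
data = [0, 0, 1, 2, 2, 2, 1, 0, 2, 2]
candidates = [k / 100 for k in range(1, 101)]
h_cv = cv_bandwidth(data, 3, candidates)
```

A grid search is used only for simplicity; any one-dimensional minimizer over (0, 1] would serve the same purpose.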


W. E. Wansouwé, C. C. Kokonendji and D. T. Kolyang

Maintainer: W. E. Wansouwé <[email protected]>


Aitchison, J. and Aitken, C.G.G. (1976). Multivariate binary discrimination by the kernel method, Biometrika 63, 413 - 420.

Kokonendji, C.C. and Senga Kiessé, T. (2011). Discrete associated kernel method and extensions, Statistical Methodology 8, 497 - 516.

Kokonendji, C.C., Senga Kiessé, T. and Zocchi, S.S. (2007). Discrete triangular distributions and non-parametric estimation for probability mass function, Journal of Nonparametric Statistics 19, 241 - 254.

Kokonendji, C.C. and Zocchi, S.S. (2010). Extensions of discrete triangular distribution and boundary bias in kernel estimation for discrete functions, Statistics and Probability Letters 80, 1655 - 1662.

Zougab, N., Adjabi, S. and Kokonendji, C.C. (2012). Binomial kernel and Bayes local bandwidth in discrete functions estimation, Journal of Nonparametric Statistics 24, 783 - 795.

Disake documentation built on May 29, 2017, 8:37 p.m.