smData: Smooth discrete data

In essHist: The Essential Histogram

Description

Resolve equal values (a.k.a., tied values) of observations by randomizing locally according to a Gaussian distribution with a rather small variance.

Usage

```
smData(x, sw = NA)
```

Arguments

`x` a numeric vector containing the data.

`sw` a positive number specifying the spread width of the randomization procedure; by default it is derived from the minimal gap between two distinct observed values.
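To get a feel for the scale on which `sw` operates, the minimal gap between distinct values can be computed by hand. This is only an illustration in base R: how `smData` turns this gap into its default `sw` is not stated here, and the factor `1/10` in the final comment is an arbitrary assumption.

```r
# Toy data with ties (illustrative values, not from the package)
x <- c(0, 2, 2, 3, 7, 7)

# Gaps between consecutive distinct values
gaps <- diff(sort(unique(x)))

# Smallest gap between two different observed values
min_gap <- min(gaps)

# An explicit spread width could then be passed as, e.g.,
# smData(x, sw = min_gap / 10)   # requires essHist; 1/10 is an arbitrary choice
```

Choosing `sw` well below the minimal gap keeps the randomized copies of one value from overlapping noticeably with those of its neighbours.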

Details

The essential histogram (Li et al., 2016) is designed under the assumption that the underlying distribution function is continuous. This assumption is natural, as it guarantees the existence of a density with respect to the Lebesgue measure. In practice, however, one also faces discrete distributions, whose distribution functions are piecewise constant and thus discontinuous. The function `smData` implements a simple idea for adapting the essential histogram to discrete data: the Dirac delta density at each observed value is approximated by a thin Gaussian density, so that the resulting approximation has a continuous distribution function.
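The approximation described above can be written out as follows. This is a sketch in notation introduced here for illustration: $a_k$ are the distinct values, $p_k$ their probabilities, and $\sigma > 0$ is a small standard deviation; how $\sigma$ relates to the argument `sw` is an assumption and is not specified in this page.

```latex
\[
  \delta_{a}(x) \;\approx\; \varphi_{\sigma}(x - a)
  = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-a)^{2}}{2\sigma^{2}}\right),
  \qquad
  f_{\sigma}(x) = \sum_{k} p_{k}\, \varphi_{\sigma}(x - a_{k}).
\]
```

For every $\sigma > 0$ the mixture $f_{\sigma}$ is a proper density with a continuous distribution function, and it concentrates on the points $a_k$ as $\sigma \to 0$.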

The function `smData` is called automatically whenever `essHistogram` is called. Note that if there are no tied values, `smData` only sorts the observations `x`.
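The behaviour described in this section can be mimicked in a few lines of base R. The following is a minimal sketch, not the code of `smData`: the helper name `jitter_ties` is hypothetical, `sw` is used directly as the standard deviation of the Gaussian noise (an assumption), and all occurrences of a tied value are perturbed.

```r
# Hypothetical helper, NOT part of essHist: a sketch of local Gaussian jittering.
jitter_ties <- function(x, sw) {
  x <- sort(x)
  # No ties: sorting is all that is needed
  if (anyDuplicated(x) == 0) return(x)
  # Mark every occurrence of a value that appears more than once
  tied <- x %in% x[duplicated(x)]
  # Perturb tied observations with Gaussian noise (assumption: sd = sw)
  x[tied] <- x[tied] + rnorm(sum(tied), mean = 0, sd = sw)
  sort(x)
}

set.seed(1)
y <- jitter_ties(c(5, 2, 2, 3, 5, 5), sw = 0.1)
y_sorted_only <- jitter_ties(c(3, 1, 2), sw = 0.1)  # no ties, so only sorted
```

Because the noise is drawn from a continuous distribution, ties in the output occur with probability zero, and the result has the same length as the input.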

Value

A vector of length `length(x)` is returned: the modified observations, with no tied values and sorted in increasing order.

References

Li, H., Munk, A., Sieling, H., and Walther, G. (2016). The essential histogram. arXiv:1612.07216.

See Also

`essHistogram`
Examples

```
library(essHist)

# generate Poisson data (discrete)
set.seed(123)
n = 100 # number of observations
lambda = 5
x.dis = rpois(n, lambda)

# smooth discrete data
x.sm = smData(x.dis)

# compute the essential histogram
eh = essHistogram(x.sm, xname = "Poisson")
```