Resolve tied (i.e., equal) values among the observations by perturbing them locally with Gaussian noise of small variance.
a numeric vector containing the data.
a positive number specifying the spread (width) of the randomization procedure; by default it is derived from the minimal gap between two distinct values of the observations.
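As an illustration of the quantity the default is based on, the minimal gap between two distinct observed values can be computed in base R as sketched below. The helper name `min_gap` is hypothetical and not part of the package, and how the package turns this gap into the actual default spread is not shown here.

```r
# Hypothetical helper (not part of the package): smallest gap between
# two distinct values of x.
min_gap <- function(x) {
  ux <- sort(unique(x))                    # distinct values, increasing
  if (length(ux) < 2L) return(NA_real_)    # no gap if all values coincide
  min(diff(ux))                            # smallest difference between neighbours
}

x <- c(1, 1, 2, 2, 2, 3.5, 5)
gap <- min_gap(x)
```

A spread well below this gap keeps the perturbed copies of one value from mixing with those of a neighbouring value.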
The essential histogram (Li et al., 2016) is designed under the assumption that the underlying distribution function is continuous. Such an assumption is natural, since a distribution that has a density with respect to the Lebesgue measure necessarily has a continuous distribution function. However, in practice, one also faces discrete distributions, whose distribution functions are piecewise constant and thus discontinuous. The function smData implements a simple idea for adapting the essential histogram to discrete data: more precisely, each Dirac delta (point mass) is approximated by a thin Gaussian density, so that the resulting approximation has a continuous distribution function.
smData is called automatically whenever essHistogram is called. Note that, if there are no tied values, smData only sorts the observations x.
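The idea described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the function name `jitter_ties` and its argument `spread` are hypothetical, and `spread` is assumed here to act as the standard deviation of the Gaussian noise.

```r
# Hypothetical sketch (not the package's code): perturb tied observations
# with Gaussian noise, leave untied ones unchanged, then sort.
jitter_ties <- function(x, spread) {
  # TRUE for every observation whose value occurs more than once
  tied <- duplicated(x) | duplicated(x, fromLast = TRUE)
  # assumption: 'spread' is used as the standard deviation of the noise
  x[tied] <- x[tied] + rnorm(sum(tied), mean = 0, sd = spread)
  sort(x)
}

set.seed(1)
x <- c(1, 1, 2, 2, 2, 3.5, 5)
y <- jitter_ties(x, spread = 0.01)
```

If x contains no tied values, `tied` is all FALSE and the sketch reduces to `sort(x)`, matching the behaviour noted above.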
A vector of length length(x) is returned, containing the modified observations with no tied values, sorted in increasing order.
Li, H., Munk, A., Sieling, H., and Walther, G. (2016). The essential histogram. arXiv:1612.07216.