smData: Smooth discrete data



Description

Resolve tied (i.e., equal) values among the observations by perturbing them locally with Gaussian noise of small variance.

Usage

smData(x, sw = NA)

Arguments

x

a numeric vector containing the data.

sw

a positive number specifying the spread width of the randomization procedure; by default, it is derived from the minimal gap between two distinct observed values.

Details

The essential histogram (Li et al., 2016) is designed under the assumption that the underlying distribution function is continuous. Such an assumption is natural, as it guarantees the existence of a density with respect to the Lebesgue measure. In practice, however, one also faces discrete distributions, whose distribution functions are piecewise constant and thus discontinuous. The function smData implements a simple idea for adapting the essential histogram to discrete data: each Dirac delta density is approximated by a thin Gaussian density, so that the resulting approximation has a continuous distribution function.
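
The approximation can be written out in symbols as follows. The notation is introduced here for illustration only and does not come from the package: a_k are the support points of the discrete distribution, p_k their probabilities, and \sigma a small standard deviation, whose exact relation to the argument sw is not stated in this page.

```latex
% Notation (a_k, p_k, \sigma) is assumed for illustration only.
P = \sum_{k} p_k \, \delta_{a_k}
\quad\longrightarrow\quad
f_\sigma(t) = \sum_{k} p_k \, \frac{1}{\sigma\sqrt{2\pi}}
  \exp\!\left( -\frac{(t - a_k)^2}{2\sigma^2} \right)
```

For every \sigma > 0 the mixture f_\sigma is a genuine Lebesgue density, and as \sigma tends to 0 it concentrates around the points a_k, recovering the discrete distribution in the limit.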

The function smData is called automatically whenever essHistogram is called. Note that if there are no tied values, smData merely sorts the observations x.
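
The behaviour described above can be sketched in a few lines of R. This is only a rough illustration and not the essHist implementation: the function name jitter_ties, the choice of sw / 10 as the standard deviation, the fallback used when all values are equal, and the decision to perturb every observation are all assumptions made for the example.

```r
# Hypothetical sketch of tie-breaking by Gaussian jitter.
# NOT the essHist code; the name and the noise scale are assumptions.
jitter_ties <- function(x, sw = NA) {
  x <- sort(x)
  # No tied values: sorting is all that needs to be done.
  if (anyDuplicated(x) == 0) return(x)
  if (is.na(sw)) {
    # Default spread width: minimal gap between distinct values.
    gaps <- diff(unique(x))
    # Assumed fallback of 1 when every observation is the same value.
    sw <- if (length(gaps) > 0) min(gaps) else 1
  }
  # Assumed scale: standard deviation is a small fraction of sw.
  sort(x + rnorm(length(x), mean = 0, sd = sw / 10))
}

set.seed(1)
y <- jitter_ties(c(3, 1, 3, 2, 2, 5))
anyDuplicated(y) == 0   # ties are resolved
!is.unsorted(y)         # result is in increasing order
```

Keeping the noise much smaller than the minimal gap means the perturbed values stay close to their original support points, so the overall shape of the data is preserved while the ties disappear.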

Value

A numeric vector of length length(x), containing the modified observations with no tied values, sorted in increasing order.

References

Li, H., Munk, A., Sieling, H., and Walther, G. (2016). The essential histogram. arXiv:1612.07216.

See Also

essHistogram

Examples

# load the package that provides smData and essHistogram
library(essHist)

# generate Poisson data (discrete)
set.seed(123)
n      = 100 # number of observations
lambda = 5
x.dis  = rpois(n, lambda)

# smooth discrete data
x.sm   = smData(x.dis)

# compute the essential histogram
eh = essHistogram(x.sm, xname = "Poisson")

essHist documentation built on April 9, 2018, 5:04 p.m.