View source: R/discretization.R
discretize_density | R Documentation |
Transforms the density function of a continuous random variable into a discrete probability distribution with minimal distortion using the Lloyd-Max algorithm.
discretize_density(density_fn, n_levels, eps = 1e-06)
density_fn |
probability density function. |
n_levels |
cardinality of the set of all possible outcomes. |
eps |
convergence threshold for the algorithm. |
The function addresses the problem of transforming a continuous random
variable X
into a discrete random variable Y
with minimal
distortion. Distortion is measured as mean-squared error (MSE):
\text{E}\left[ (X - Y)^2 \right] =
\sum_{k=1}^{K} \int_{x_{k-1}}^{x_{k}} f_{X}(x)
\left( x - r_{k} \right)^2 \, dx
where:
f_{X}
is the probability density function of X
,
K
is the number of possible outcomes of Y
,
x_{k}
are endpoints of intervals that partition the domain
of X
,
r_{k}
are representation points of the intervals.
This problem is solved using the following iterative procedure:
1.
Start with an arbitrary initial set of representation
points: r_{1} < r_{2} < \dots < r_{K}
.
2.
Repeat the following steps until the improvement in MSE
falls below given \varepsilon
.
3.
Calculate endpoints as x_{k} = (r_{k+1} + r_{k})/2
for each k = 1, \dots, K-1
and set x_{0}
and x_{K}
to
-\infty
and \infty
, respectively.
4.
Update representation points by setting r_{k}
equal to the conditional mean of X
given X \in (x_{k-1}, x_{k})
for each k = 1, \dots, K
.
With each execution of step (3)
and step (4)
, the MSE decreases
or remains the same. As MSE is nonnegative, it approaches a limit.
The algorithm terminates when the improvement in MSE is less than a given
\varepsilon > 0
, ensuring convergence after a finite number
of iterations.
This procedure is known as Lloyd-Max's algorithm, initially used for scalar quantization and closely related to the k-means algorithm. Local convergence has been proven for log-concave density functions by Kieffer. Many common probability distributions are log-concave including the normal and skew normal distribution, as shown by Azzalini.
A list containing:
discrete probability distribution.
endpoints of intervals that partition the continuous domain.
representation points of the intervals.
distortion measured as the mean-squared error (MSE).
Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12(2), 171–178.
Kieffer, J. (1983). Uniqueness of locally optimal quantizer for log-concave density and convex error function. IEEE Transactions on Information Theory 29, 42–47.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory 28 (2), 129–137.
discretize_density(density_fn = stats::dnorm, n_levels = 5)
discretize_density(density_fn = function(x) {
2 * stats::dnorm(x) * stats::pnorm(0.5 * x)
}, n_levels = 4)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.