```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(algebraic.dist)
```
The algebraic.dist package provides an algebra over probability
distributions. You can combine distribution objects with standard
arithmetic operators (+, -, *, /, ^) and mathematical functions
(exp, log, sqrt, ...), and the package will either simplify the
result to a known closed-form distribution or fall back to Monte Carlo
sampling when no closed form exists.
This means you can write expressions like normal(0, 1) + normal(2, 3) and
get back a normal object, without manually deriving the result. When the
package cannot simplify, the result is an edist (expression distribution)
that supports all the usual methods -- mean, vcov, sampler, cdf,
and more -- via MC approximation.
Arithmetic operators on dist objects create edist objects, which are
automatically simplified when a closed-form rule applies. When no rule
matches, the edist is kept as-is.
```r
x <- normal(0, 1)
y <- exponential(1)
z <- x + y
```
Since the sum of a normal and an exponential has no simple closed form,
z remains an edist:
```r
is_edist(z)
class(z)
```
The edist still supports all distribution methods. Moments are computed
via Monte Carlo:
```r
set.seed(1)
mean(z)
vcov(z)
```
The theoretical mean is 0 + 1 = 1 and variance is 1 + 1 = 2, so the
MC estimates should be close.
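As an independent sanity check (not part of the package API; it uses only base R's rnorm and rexp, and the sample size is an arbitrary choice), the same moments can be estimated by direct simulation:

```r
set.seed(1)
n <- 1e5
# Draw from N(0, 1) + Exp(1) directly with base R samplers
s <- rnorm(n) + rexp(n, rate = 1)
mean(s)  # should be close to the theoretical mean 1
var(s)   # should be close to the theoretical variance 2
```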
The real power of the package is automatic simplification. When you compose distributions using operations that have known closed-form results, the package returns the simplified distribution directly -- no sampling needed.
The sum and difference of independent normals is normal. Scalar multiplication and location shifts are also handled.
Sum of normals:
```r
z <- normal(1, 4) + normal(2, 5)
is_normal(z)
z
mean(z)
vcov(z)
```
The result is Normal(mu = 3, var = 9), since means add and variances add for independent normals.
Difference of normals:
```r
z <- normal(1, 4) - normal(2, 5)
z
mean(z)
vcov(z)
```
Normal(mu = -1, var = 9) -- means subtract, but variances still add.
Scalar multiplication:
```r
z <- 3 * normal(2, 1)
z
mean(z)
vcov(z)
```
If X ~ Normal(mu, v), then c * X ~ Normal(c * mu, c^2 * v).
Location shift:
```r
z <- normal(5, 2) + 10
z
mean(z)
vcov(z)
```
Adding a constant shifts the mean but preserves the variance.
Sums of independent exponentials (with the same rate) produce gamma distributions. Gamma distributions with the same rate also combine.
Sum of exponentials:
```r
z <- exponential(2) + exponential(2)
is_gamma_dist(z)
z
params(z)
```
Two independent Exp(2) variables sum to Gamma(shape = 2, rate = 2).
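This identity can also be verified outside the package with a quick base-R simulation, comparing the empirical distribution of the sum against the claimed Gamma(2, 2) (ks.test and pgamma are standard stats functions):

```r
set.seed(1)
s <- rexp(1e5, rate = 2) + rexp(1e5, rate = 2)
# A small KS statistic indicates close agreement with Gamma(shape = 2, rate = 2)
ks.test(s, pgamma, shape = 2, rate = 2)$statistic
```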
Gamma plus exponential:
```r
z <- gamma_dist(3, 2) + exponential(2)
z
params(z)
```
Gamma(3, 2) + Exp(2) = Gamma(4, 2).
Sum of gammas (same rate):
```r
z <- gamma_dist(3, 2) + gamma_dist(4, 2)
z
params(z)
```
Gamma(3, 2) + Gamma(4, 2) = Gamma(7, 2) -- shapes add when rates match.
Certain mathematical transformations of distributions have known closed-form results.
Exponential of a normal is log-normal:
```r
z <- exp(normal(0, 1))
is_lognormal(z)
z
```
Logarithm of a log-normal is normal:
```r
z <- log(lognormal(0, 1))
is_normal(z)
z
```
Standard normal squared is chi-squared:
```r
z <- normal(0, 1)^2
is_chi_squared(z)
z
```
This rule specifically applies when the normal has mean 0 and variance 1.
Chi-squared addition:
```r
z <- chi_squared(3) + chi_squared(5)
is_chi_squared(z)
z
params(z)
```
Degrees of freedom add: Chi-squared(3) + Chi-squared(5) = Chi-squared(8).
Poisson addition:
```r
z <- poisson_dist(2) + poisson_dist(3)
is_poisson_dist(z)
z
params(z)
```
Rates add: Poisson(2) + Poisson(3) = Poisson(5).
Product of log-normals:
```r
z <- lognormal(0, 1) * lognormal(1, 2)
is_lognormal(z)
z
params(z)
```
The product of independent log-normals is log-normal because
log(X * Y) = log(X) + log(Y), and the sum of independent normals is
normal. So the meanlog values add and the sdlog values combine as
sqrt(sdlog1^2 + sdlog2^2).
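A base-R simulation (rlnorm uses the same meanlog/sdlog parameterization) confirms how the parameters combine:

```r
set.seed(1)
x <- rlnorm(1e5, meanlog = 0, sdlog = 1)
y <- rlnorm(1e5, meanlog = 1, sdlog = 2)
mean(log(x * y))  # close to 0 + 1 = 1
sd(log(x * y))    # close to sqrt(1^2 + 2^2) = sqrt(5), about 2.24
```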
Uniform scaling and shifting:
```r
z <- 2 * uniform_dist(0, 1)
is_uniform_dist(z)
z
params(z)
```
Scaling a Uniform(0, 1) by 2 gives Uniform(0, 2).
```r
z <- uniform_dist(0, 1) + 5
is_uniform_dist(z)
z
params(z)
```
Shifting Uniform(0, 1) by 5 gives Uniform(5, 6).
When no algebraic rule matches, the result remains an unsimplified edist.
All distribution methods still work -- they just use Monte Carlo under the
hood.
```r
z <- normal(0, 1) * exponential(1)
is_edist(z)
```
```r
set.seed(2)
mean(z)
```
The true mean of X * Y where X ~ Normal(0, 1) and Y ~ Exp(1) is
E[X] * E[Y] = 0 * 1 = 0 (by independence), so the MC estimate should
be near zero.
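By independence the variance also has a closed form, Var(XY) = E[X^2] E[Y^2] - (E[X] E[Y])^2 = 1 * 2 - 0 = 2, which a base-R simulation (independent of the package) reproduces:

```r
set.seed(2)
s <- rnorm(1e6) * rexp(1e6)
mean(s)  # near 0
var(s)   # near 2 = Var(X) * E[Y^2], since E[X] = 0
```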
You can explicitly convert any distribution to an empirical distribution
via realize(). This draws n samples and wraps them in an
empirical_dist that supports exact (discrete) methods for CDF, density,
conditional, and more.
```r
set.seed(3)
z <- normal(0, 1) * exponential(1)
rd <- realize(z, n = 10000)
rd
mean(rd)
vcov(rd)
```
For edist objects, methods like cdf(), density(), and inv_cdf()
automatically materialize samples behind the scenes (with caching, so
repeated calls share the same sample). You do not need to call realize()
manually for these:
```r
set.seed(4)
z <- normal(0, 1) + exponential(1)
Fz <- cdf(z)
Fz(1)
Fz(2)
```
The Summary group generic is implemented for distributions. The sum()
and prod() generics reduce via + and *, so they benefit from all the
simplification rules. The min() and max() generics create elementwise
edist expressions.
Min of exponentials has a special rule -- the minimum of independent exponentials is exponential with the sum of rates:
```r
z <- min(exponential(1), exponential(2))
is_exponential(z)
z
params(z)
```
This is the well-known result: if X ~ Exp(r1) and Y ~ Exp(r2), then
min(X, Y) ~ Exp(r1 + r2).
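The same result can be checked with base R's pmin over paired draws:

```r
set.seed(1)
m <- pmin(rexp(1e5, rate = 1), rexp(1e5, rate = 2))
mean(m)  # close to 1 / (1 + 2) = 1/3
```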
Sum reduces via the + operator, so simplification rules apply:
```r
z <- sum(normal(0, 1), normal(2, 3), normal(-1, 2))
is_normal(z)
z
mean(z)
vcov(z)
```
Three normals sum to a single normal: mean = 0 + 2 - 1 = 1, var = 1 + 3 + 2 = 6.
Max and other combinations that lack closed forms remain as edist:
```r
set.seed(5)
z <- max(exponential(1), exponential(2))
is_edist(z)
mean(z)
```
The package provides functions for common asymptotic results from probability theory.
clt(base_dist) returns the limiting distribution of the standardized
sample mean, sqrt(n) * (Xbar_n - mu). For a univariate distribution
with variance sigma^2, this is Normal(0, sigma^2).
```r
z <- clt(exponential(1))
z
mean(z)
vcov(z)
```
For Exp(1), the mean is 1 and the variance is 1, so the CLT limit is Normal(0, 1).
lln(base_dist) returns the degenerate distribution at the population
mean -- the limit of Xbar_n as n -> infinity. It is represented as a
normal with zero variance.
```r
d <- lln(exponential(1))
d
mean(d)
vcov(d)
```
delta_clt(base_dist, g, dg) returns the limiting distribution of
sqrt(n) * (g(Xbar_n) - g(mu)) under the delta method. You supply the
function g and its derivative dg.
```r
z <- delta_clt(exponential(1), g = exp, dg = exp)
z
mean(z)
vcov(z)
```
For g = exp applied to Exp(1) (mean = 1, var = 1), the delta method
gives Normal(0, (exp(1))^2 * 1) = Normal(0, e^2).
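The delta-method limit can also be checked by simulation with base R only (the sample size and replication count here are arbitrary choices), drawing repeated standardized values of g(Xbar_n):

```r
set.seed(1)
n <- 1e4
# Each replicate: sqrt(n) * (g(sample mean) - g(population mean)) for g = exp
stats <- replicate(2000, sqrt(n) * (exp(mean(rexp(n, rate = 1))) - exp(1)))
var(stats)  # close to e^2, about 7.39
```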
normal_approx(x) constructs a normal distribution that matches the mean
and variance of any input distribution:
```r
g <- gamma_dist(5, 2)
n <- normal_approx(g)
n
mean(n)
vcov(n)
```
The Gamma(5, 2) has mean 2.5 and variance 1.25, so the normal approximation is Normal(2.5, 1.25).
To see the CLT in action, we can draw replicated standardized sample means at increasing sample sizes and watch them converge to the predicted distribution:
```r
set.seed(42)
base <- exponential(rate = 1)
ns <- c(10, 100, 1000)
for (n in ns) {
  samp_means <- replicate(5000, {
    x <- sampler(base)(n)
    sqrt(n) * (mean(x) - mean(base))
  })
  cat(sprintf("n=%4d: mean=%.3f, var=%.3f\n", n, mean(samp_means), var(samp_means)))
}
```
Compare with the CLT prediction:
```r
z <- clt(base)
cat(sprintf("CLT: mean=%.3f, var=%.3f\n", mean(z), vcov(z)))
```
As n grows, the empirical mean converges to 0 and the empirical variance
converges to 1, matching the CLT limit of Normal(0, 1).