Multivariate and Mixture Distributions

knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(algebraic.dist)

This vignette covers the multivariate normal (MVN), affine transformations, and mixture distribution facilities in algebraic.dist. These features were introduced in version 0.3.0 and provide closed-form operations wherever possible, falling back to Monte Carlo only when necessary.

MVN Basics

Construct an MVN with mvn(). The mu argument is the mean vector and sigma is the variance-covariance matrix (defaulting to the identity).

M <- mvn(mu = c(1, 2, 3),
         sigma = matrix(c(1.0, 0.5, 0.2,
                          0.5, 2.0, 0.3,
                          0.2, 0.3, 1.5), nrow = 3, byrow = TRUE))
M

Standard accessors return the moments and dimensionality.

mean(M)
vcov(M)
dim(M)

The params() generic flattens all parameters into a single named vector (useful for optimisation interfaces).

params(M)

Sampling

sampler() returns a reusable sampling function. The output is an n-by-d matrix where rows are observations and columns are dimensions.

set.seed(42)
samp <- sampler(M)
X <- samp(100)
dim(X)
head(X, 5)

Marginals

Extracting a marginal over a subset of indices returns a new MVN (or a univariate normal when a single index is selected).

M13 <- marginal(M, c(1, 3))
M13
mean(M13)
vcov(M13)

Selecting a single component yields a univariate normal.

M1 <- marginal(M, 1)
M1
mean(M1)
vcov(M1)

Closed-Form MVN Conditioning

Conditioning a multivariate normal on observed values uses the Schur complement formula. Given an MVN partitioned into free variables $X_1$ and observed variables $X_2 = a$, the conditional distribution is:

$$X_1 \mid X_2 = a \;\sim\; \mathcal{N}\!\bigl(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(a - \mu_2),\;\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\bigr)$$

This is exact -- no Monte Carlo is involved.

M2 <- mvn(mu = c(0, 0), sigma = matrix(c(1, 0.8, 0.8, 1), 2, 2))

# Condition on X2 = 1
M_cond <- conditional(M2, given_indices = 2, given_values = 1)
mean(M_cond)
vcov(M_cond)

With $\rho = 0.8$, the conditional mean is $\mu_1 + \rho(1 - \mu_2) = 0 + 0.8(1 - 0) = 0.8$, and the conditional variance is $1 - \rho^2 = 1 - 0.64 = 0.36$.
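As a sanity check, the same numbers can be reproduced in base R alone (no package code), applying the Schur complement formula directly to this two-dimensional example:

```r
mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)
a     <- 1  # observed value of X2

# Partition: index 1 is free, index 2 is observed
S11 <- Sigma[1, 1]; S12 <- Sigma[1, 2]
S21 <- Sigma[2, 1]; S22 <- Sigma[2, 2]

cond_mean <- drop(mu[1] + S12 %*% solve(S22) %*% (a - mu[2]))
cond_var  <- drop(S11 - S12 %*% solve(S22) %*% S21)

cond_mean  # 0.8
cond_var   # 0.36
```

In higher dimensions the same code works with index vectors in place of the scalar subscripts; `solve(S22)` then inverts the observed block.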

Predicate-based MC fallback

When the conditioning event is not "observe variable $j$ at value $a$" but an arbitrary predicate, the package falls back to Monte Carlo. This is accessed through the P argument.

set.seed(42)
M_cond_mc <- conditional(M2, P = function(x) abs(x[2] - 1) < 0.1)
mean(M_cond_mc)

The MC result is approximate and depends on the tolerance in the predicate. Prefer the closed-form interface whenever possible.
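To see what the fallback is doing under the hood, here is a minimal base-R sketch of the same predicate-based conditioning: draw from the MVN via a Cholesky factor and keep only the draws satisfying the predicate (rejection sampling, with the same 0.1 tolerance as above). This is an illustration of the idea, not the package's internal implementation.

```r
set.seed(42)
mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)

# Draw n observations from the MVN; chol() gives R with t(R) %*% R = Sigma,
# so Z %*% R has covariance Sigma
n <- 50000
Z <- matrix(rnorm(n * 2), n, 2)
X <- sweep(Z %*% chol(Sigma), 2, mu, "+")

# Keep only draws satisfying the predicate |X2 - 1| < 0.1
keep <- abs(X[, 2] - 1) < 0.1
mc_mean <- mean(X[keep, 1])
mc_mean  # approximately 0.8, the closed-form answer
```

Only a small fraction of draws survive the rejection step, which is why the predicate interface is noisier and slower than exact conditioning.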

Affine Transformations

affine_transform(x, A, b) computes the distribution of $Y = AX + b$ where $X \sim \text{MVN}(\mu, \Sigma)$. The result is:

$$Y \sim \text{MVN}(A\mu + b,\;\; A \Sigma A^\top)$$

This is a closed-form operation that returns a new MVN (or normal if the result is one-dimensional).

Portfolio example

Consider two assets with expected returns and a covariance structure. A portfolio is a linear combination of the returns.

returns <- mvn(
  mu = c(0.05, 0.08),
  sigma = matrix(c(0.04, 0.01,
                   0.01, 0.09), 2, 2)
)

# 60/40 portfolio weights
w <- matrix(c(0.6, 0.4), nrow = 1)
portfolio <- affine_transform(returns, A = w)
mean(portfolio)
vcov(portfolio)

The expected portfolio return is $0.6 \times 0.05 + 0.4 \times 0.08 = 0.062$. The portfolio variance is $w \Sigma w^\top$, a scalar that captures the diversification benefit.
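Both moments can be verified by hand in base R with the same matrix algebra:

```r
mu    <- c(0.05, 0.08)
Sigma <- matrix(c(0.04, 0.01,
                  0.01, 0.09), 2, 2)
w     <- matrix(c(0.6, 0.4), nrow = 1)

port_mean <- drop(w %*% mu)            # 0.062
port_var  <- drop(w %*% Sigma %*% t(w))  # 0.0336
c(port_mean, port_var)
```

The portfolio standard deviation, `sqrt(0.0336)`, is about 0.183, well below the weighted average of the component standard deviations (`0.6 * 0.2 + 0.4 * 0.3 = 0.24`), which is the diversification benefit in numbers.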

Dimensionality reduction

Affine transforms can also project to lower dimensions. For instance, computing the sum and difference of two variables.

X <- mvn(mu = c(3, 7), sigma = matrix(c(4, 1, 1, 9), 2, 2))

# A maps (X1, X2) -> (X1 + X2, X1 - X2)
A <- matrix(c(1, 1,
              1, -1), nrow = 2, byrow = TRUE)
Y <- affine_transform(X, A = A)
mean(Y)
vcov(Y)
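The expected output is easy to check in base R: the mean is $A\mu$ and the covariance is $A \Sigma A^\top$.

```r
mu    <- c(3, 7)
Sigma <- matrix(c(4, 1, 1, 9), 2, 2)
A     <- matrix(c(1,  1,
                  1, -1), nrow = 2, byrow = TRUE)

Y_mean <- drop(A %*% mu)          # (10, -4): means of X1 + X2 and X1 - X2
Y_cov  <- A %*% Sigma %*% t(A)    # 2x2 covariance of (X1 + X2, X1 - X2)
Y_mean
Y_cov
```

Note the off-diagonal entry of `Y_cov` is $\text{Var}(X_1) - \text{Var}(X_2) = 4 - 9 = -5$: the sum and difference of correlated variables are themselves correlated unless the two variances are equal.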

Mixture Distribution Basics

A mixture distribution is constructed from a list of component distributions and a vector of non-negative weights that sum to one.

m <- mixture(
  components = list(normal(-2, 1), normal(3, 0.5)),
  weights = c(0.4, 0.6)
)
m

Mean and variance

The mixture mean is the weighted average of component means: $E[X] = \sum_k w_k \mu_k$.

mean(m)

The mixture variance uses the law of total variance: $\text{Var}(X) = \underbrace{\sum_k w_k \sigma_k^2}_{\text{within-component}} + \underbrace{\sum_k w_k(\mu_k - \bar\mu)^2}_{\text{between-component}}$, where $\bar\mu = \sum_k w_k \mu_k$.

vcov(m)

The between-component term captures the spread of the component means around the overall mean. This is why mixtures can have much larger variance than any individual component.
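The decomposition is easy to check by hand in base R for the mixture above (assuming, as the example suggests, that the second argument of normal() is a standard deviation, so the component variances are 1 and 0.25):

```r
w  <- c(0.4, 0.6)
mu <- c(-2, 3)
s2 <- c(1, 0.5^2)   # component variances (sd parametrization assumed)

m_bar   <- sum(w * mu)               # mixture mean: 1.0
within  <- sum(w * s2)               # 0.55
between <- sum(w * (mu - m_bar)^2)   # 6.0
within + between                     # total variance: 6.55
```

Here the between-component term dominates: the component means are 5 units apart, so most of the mixture's spread comes from which component a draw lands in, not from within-component noise.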

Density and CDF

density() and cdf() return callable functions, following the same pattern as other distributions in the package.

f <- density(m)

x_vals <- seq(-6, 8, length.out = 200)
y_vals <- sapply(x_vals, f)
plot(x_vals, y_vals, type = "l", lwd = 2,
     main = "Bimodal mixture density",
     xlab = "x", ylab = "f(x)")

The bimodal shape is clearly visible, with peaks near $-2$ and $3$.

F <- cdf(m)
F_vals <- sapply(x_vals, F)
plot(x_vals, F_vals, type = "l", lwd = 2,
     main = "Mixture CDF",
     xlab = "x", ylab = "F(x)")

Sampling

set.seed(123)
s <- sampler(m)
samples <- s(2000)
hist(samples, breaks = 50, probability = TRUE, col = "lightblue",
     main = "Mixture samples with true density",
     xlab = "x")
lines(x_vals, y_vals, col = "red", lwd = 2)

Mixture Operations

Marginals of mixtures

The marginal of a mixture is a mixture of marginals with the same weights. This follows directly from the definition: $p(x_I) = \sum_k w_k\, p_k(x_I)$.

m2 <- mixture(
  list(mvn(c(0, 0), diag(2)), mvn(c(3, 3), diag(2))),
  c(0.5, 0.5)
)

m2_x1 <- marginal(m2, 1)
m2_x1
mean(m2_x1)

The marginal mean is $0.5 \times 0 + 0.5 \times 3 = 1.5$, the weighted average of the component marginal means.

Conditional of mixtures (Bayesian weight update)

Conditioning a mixture on observed values updates the component weights via Bayes' rule. Each component's weight is scaled by its marginal density at the observed value:

$$w_k' \propto w_k \, f_k(x_{\text{obs}})$$

Components whose marginal density is higher at the observed value receive more weight.
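For the two-component mixture defined below (component means for $X_2$ at 0 and 3, unit variances), the weight update is a two-line computation in base R:

```r
w <- c(0.5, 0.5)
# Marginal density of X2 = 0 under each component: N(0, 1) and N(3, 1)
lik   <- dnorm(0, mean = c(0, 3), sd = 1)
w_new <- w * lik / sum(w * lik)
w_new                  # nearly all weight on the first component (~0.989)
sum(w_new * c(0, 3))   # conditional mean of X1, pulled close to 0
```

Because the components have identity covariance, the conditional mean of $X_1$ given $X_2$ within each component equals that component's $X_1$ mean, so the mixture's conditional mean is just the reweighted average of 0 and 3.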

mc <- conditional(m2, given_indices = 2, given_values = 0)
mc
mean(mc)

Conditioning on $X_2 = 0$ shifts weight toward the first component (whose mean for $X_2$ is 0) and away from the second component (whose mean for $X_2$ is 3). The conditional mean of $X_1$ is therefore pulled closer to 0 than the unconditional value of 1.5.

We can also condition at a value near the second component to see the opposite effect.

mc_high <- conditional(m2, given_indices = 2, given_values = 3)
mc_high
mean(mc_high)

Now the weight shifts toward the second component, and the conditional mean of $X_1$ is pulled toward 3.

Practical Example: Gaussian Mixture Classification

Gaussian mixture models (GMMs) are widely used for soft classification. The idea: each component represents a class, and conditioning on observed features updates the class probabilities.

Setup: a two-class GMM

Consider two clusters in 2D, representing two classes with equal prior probability.

class_a <- mvn(c(-2, -1), matrix(c(1.0, 0.3, 0.3, 0.8), 2, 2))
class_b <- mvn(c(2, 1.5), matrix(c(0.6, -0.2, -0.2, 1.2), 2, 2))

gmm <- mixture(
  components = list(class_a, class_b),
  weights = c(0.5, 0.5)
)
gmm

Visualise the clusters

set.seed(314)
samp_a <- sampler(class_a)(300)
samp_b <- sampler(class_b)(300)

plot(samp_a[, 1], samp_a[, 2], col = "steelblue", pch = 16, cex = 0.6,
     xlim = c(-6, 6), ylim = c(-5, 5),
     xlab = expression(X[1]), ylab = expression(X[2]),
     main = "Two-class GMM samples")
points(samp_b[, 1], samp_b[, 2], col = "tomato", pch = 16, cex = 0.6)
legend("topright", legend = c("Class A", "Class B"),
       col = c("steelblue", "tomato"), pch = 16)

Soft classification by conditioning

Suppose we observe $X_1 = -1$. Conditioning updates the class weights based on how likely each component is to produce that value.

post_neg <- conditional(gmm, given_indices = 1, given_values = -1)
post_neg

The updated weights give the posterior class probabilities. Since $X_1 = -1$ is much more likely under Class A (centred at $-2$) than Class B (centred at $2$), Class A receives most of the weight.
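The posterior class probabilities can be reproduced directly with Bayes' rule in base R, using the marginal of $X_1$ under each class ($\mathcal{N}(-2, 1.0)$ for Class A, $\mathcal{N}(2, 0.6)$ for Class B):

```r
prior <- c(0.5, 0.5)
lik   <- dnorm(-1, mean = c(-2, 2), sd = sqrt(c(1.0, 0.6)))
post  <- prior * lik / sum(prior * lik)
post   # Class A gets essentially all of the weight (> 0.998)
```

The observation sits one standard deviation from the Class A mean but nearly four from the Class B mean, so the likelihood ratio is overwhelming.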

The conditional distribution of $X_2$ given $X_1 = -1$ is itself a mixture of the two conditional normals.

f_post <- density(post_neg)
x2_grid <- seq(-5, 5, length.out = 200)
y_post <- sapply(x2_grid, f_post)
plot(x2_grid, y_post, type = "l", lwd = 2,
     xlab = expression(X[2]),
     ylab = expression(f(X[2] ~ "|" ~ X[1] == -1)),
     main = "Posterior density of X2 given X1 = -1")

Now observe $X_1 = 2$, which sits near the centre of Class B.

post_pos <- conditional(gmm, given_indices = 1, given_values = 2)
post_pos

As expected, the weight shifts heavily toward Class B. The conditional density of $X_2$ now concentrates around the Class B conditional mean.

f_post_pos <- density(post_pos)
y_post_pos <- sapply(x2_grid, f_post_pos)
plot(x2_grid, y_post_pos, type = "l", lwd = 2,
     xlab = expression(X[2]),
     ylab = expression(f(X[2] ~ "|" ~ X[1] == 2)),
     main = "Posterior density of X2 given X1 = 2")

Ambiguous observations

When the observation falls between the two cluster means, both classes retain meaningful probability and the posterior becomes bimodal.

post_mid <- conditional(gmm, given_indices = 1, given_values = 0)
post_mid
f_post_mid <- density(post_mid)
y_post_mid <- sapply(x2_grid, f_post_mid)
plot(x2_grid, y_post_mid, type = "l", lwd = 2,
     xlab = expression(X[2]),
     ylab = expression(f(X[2] ~ "|" ~ X[1] == 0)),
     main = "Posterior density of X2 given X1 = 0 (ambiguous)")

At $X_1 = 0$ the posterior for $X_2$ is bimodal -- one mode near the Class A conditional mean, one near Class B. Although $X_1 = 0$ is equidistant from the two class means ($-2$ and $2$), Class A receives roughly 75% of the posterior weight because its broader $X_1$ variance ($\sigma^2 = 1$ vs $0.6$) gives it higher density at the midpoint. Class B still retains about 25% of the weight, so neither class is overwhelmingly dominant. This is the hallmark of soft classification: rather than forcing a hard decision, the mixture posterior represents genuine uncertainty about the class label.
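The "roughly 75%" figure follows from the same Bayes update, using the marginal $X_1$ variances of the two classes:

```r
prior <- c(0.5, 0.5)
lik   <- dnorm(0, mean = c(-2, 2), sd = sqrt(c(1.0, 0.6)))
post  <- prior * lik / sum(prior * lik)
round(post, 3)  # roughly c(0.75, 0.25)
```

Equal distance from both means, but unequal variances: the flatter Class A density is higher at the midpoint, so it wins three quarters of the posterior weight.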




algebraic.dist documentation built on Feb. 27, 2026, 5:06 p.m.