This vignette follows the treatment in Chapter 3 of Le Roux and Rouanet's Geometric Data Analysis, with drastically simplified exposition but employing the same example data.

library(devtools)
devel <- TRUE
# Load the package from a local development copy if available; otherwise install it from GitHub
if (devel) {
    load_all("~/Documents/edu/R/cloud")
} else {
    install_github("corybrunson/cloud")
    library(cloud)
}
library(knitr)
opts_chunk$set(echo = FALSE, results = FALSE, message = FALSE, warning = FALSE)

The central object of GDA is a point cloud in a Euclidean space, which encodes a set of statistical observations arranged in a table, or two-dimensional array (for instance, a contingency table in Correspondence Analysis or an Individuals--Variables table in Principal Components Analysis). By convention, the columns of the table correspond to the coordinate axes of the Euclidean space while the rows correspond to the points.

3.1 Basic Statistics

Rigorously, a point cloud in a Euclidean space $\mathcal{U}$ consists of a set $J$ of labels and a mapping $M^J:J\to\mathcal{U}$ that takes each label $j\in J$ to its corresponding point $M^j\in\mathcal{U}$. The points are assigned positive weights $\omega_j$, which may be assumed unitary when not specified. The most common weights are absolute frequencies $n_j$, with total $n=\sum_{j\in J}n_j$, and their associated relative frequencies $f_j=\frac{n_j}{n}$.

As a running example, we load the Target example dataset, used throughout the chapter, whose coordinates are provided in Exercise 3.4:

data(Target)
print(Target)
print(class(Target))
T_freq <- rep(1 / nrow(Target), nrow(Target))
plot(xlim = c(-19, 19), ylim = c(-19, 19), axes = FALSE, asp = 1,
     Target, pch = 16,
     xlab = "", ylab = "")
text(Target, labels = rownames(Target), pos = 1)
rect(xleft = -19, xright = 19, ybottom = -19, ytop = 19, lty = 3)
legend(x = -19, y = 19, box.lty = 0, legend = "Target example")

The labels consist of $j_1$ through $j_{10}$. Because these data occupy two dimensions, i.e. the labels are mapped into $\mathcal{U}=\mathbb{R}^2$, they constitute a plane cloud.

3.1.1 Mean Point

The mean point of the cloud $M^J$ in $\mathcal{U}$ is meant to be a proxy point location for the entire cloud. Given an arbitrary point $P\in\mathcal{U}$, the mean point is defined as the point arrived at from $P$ via the sum of the weighted vectors $f_j\overrightarrow{PM^j}$ from $P$ to each point $M^j$---that is, as the (unique!) point $G\in\mathcal{U}$ for which $$\overrightarrow{PG}=\sum_{j\in J}f_j\overrightarrow{PM^j}$$ doesn't depend on the choice of point $P\in\mathcal{U}$. The mean point satisfies the barycentric property that $$\sum_{j\in J}f_j\overrightarrow{GM^j}=\overrightarrow{0},$$ which can be checked by substituting $P=G$ in the definition.[^1]

G <- barycenter(Target)
print(G)
# Vectors from barycenter to each point
T_vectors <- Target - do.call(rbind,
                                   replicate(nrow(Target),
                                             G,
                                             simplify = FALSE))
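# Direct base-R checks (a sketch): the mean point is the weighted average of the
# coordinates, and the weighted deviation vectors sum to the zero vector
colSums(T_freq * as.matrix(Target))     # should reproduce the coordinates of G
colSums(T_freq * as.matrix(T_vectors))  # approximately c(0, 0), the barycentric property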
# Plot barycenter and vectors therefrom to points in cloud
plot(xlim = c(-19, 19), ylim = c(-19, 19), axes = FALSE, asp = 1,
     x = c(), y = c(),
     xlab = "", ylab = "")
text(G, labels = "G", adj = c(1.5, 1.25))
for (i in 1:nrow(Target)) {
    text(Target[i, , drop = FALSE], labels = rownames(Target)[i],
         adj = .5 - (T_vectors[i, ] / sqrt(sum(T_vectors[i, ] ^ 2))))
}
arrows(x0 = G[, 1], y0 = G[, 2],
       x1 = Target[, 1], y1 = Target[, 2],
       length = .15, angle = 15)
rect(xleft = -19, xright = 19, ybottom = -19, ytop = 19, lty = 3)
legend(x = -19, y = 19, box.lty = 0,
       legend = "Target: barycentric characterization")

[^1]: These and other formulas are given in more general terms in GDA, in which the points $M^j$ are weighted by masses $\omega_j$. I may be forced to adopt this generality, but I haven't yet.

3.1.2 Inertia, Sum of Squares, Variance and Contributions

The inertia of a cloud $M^J$ with respect to a point $P$ is defined, as in other settings, as the weighted sum of the squared distances of the points $M^j$ from $P$: $$\mathrm{In}^P M^J=\sum_{j\in J}\mathrm{In}_j^P=\sum_{j\in J}\omega_j(PM^j)^2.$$ The inertia for $\omega_j=n_j$ is the sum of squares, and that for $\omega_j=f_j$ is the mean of squares. The variance of the cloud is its mean of squares about its barycenter, and can be interpreted as the sum of the absolute contributions of the individual points of the cloud: $$\mathrm{Var}\,M^J=\sum_{j\in J}\mathrm{Cta}_j=\sum_{j\in J}f_j(GM^j)^2.$$ The relative contributions of the points are, then, the fractions $\mathrm{Ctr}_j=\frac{\mathrm{Cta}_j}{\mathrm{Var}\,M^J}$.

P <- c(0, 0)
P_in <- inertia(point = P, cloud = Target)
G_in <- inertia(point = G, cloud = Target)
T_var <- inertia(point = G, cloud = Target,
                 weights = T_freq)
print(c(center_inertia = P_in,
        barycenter_inertia = G_in,
        variance = T_var))
Ctas <- T_freq * apply(Target, 1, function(m) {
    sum((m - G) ^ 2)
})
Ctrs <- Ctas / T_var
print(cbind(Ctas, Ctrs))

The first Huygens theorem states that the inertia of a cloud $M^J$ with respect to a point $P$ can be decomposed into the sum of the variance of the cloud (its inertia with respect to its barycenter $G$) and the squared distance between $G$ and $P$: $$\sum_{j\in J}f_j(PM^j)^2=(PG)^2+\sum_{j\in J}f_j(GM^j)^2.$$ (See the illustration below.) A consequence is the metric characterization of the barycenter, which states that the barycenter minimizes the mean of squares of the cloud. That is, across $P\in\mathcal{U}$, the quantity $\sum_{j\in J}f_j(PM^j)^2$ is minimized when $P=G$.

huyghen_test(point = P, cloud = Target, weights = T_freq)
huyghen_test(point = G, cloud = Target, weights = T_freq)
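# Base-R check of the Huygens theorem and the metric characterization (a sketch;
# assumes Target, G, P, and T_freq from above)
ms_P <- sum(T_freq * rowSums(sweep(as.matrix(Target), 2, P) ^ 2))               # mean of squares about P
var_G <- sum(T_freq * rowSums(sweep(as.matrix(Target), 2, as.numeric(G)) ^ 2))  # variance (about G)
all.equal(ms_P, sum((as.numeric(G) - P) ^ 2) + var_G)                           # Huygens decomposition
ms_P >= var_G                                                                   # G minimizes the mean of squares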
# Three plots!
par(mfrow = c(1, 3))
# Plot points relative to center
plot(xlim = c(-19, 19), ylim = c(-19, 19), axes = FALSE, asp = 1,
     Target, pch = 16,
     xlab = "", ylab = "")
segments(x0 = 0, y0 = 0, x1 = Target[, 1], y1 = Target[, 2])
text(x = P[1], y = P[2], labels = "P",
     adj = c(sqrt(2), sqrt(2)))
rect(xleft = -19, xright = 19, ybottom = -19, ytop = 19, lty = 3)
# Plot center relative to barycenter
plot(xlim = c(-19, 19), ylim = c(-19, 19), axes = FALSE, asp = 1,
     x = c(), y = c(),
     xlab = "", ylab = "")
points(rbind(P, G),
       pch = c(13, 21), bg = c("white", "grey"), cex = 2)
segments(x0 = P[1], y0 = P[2],
         x1 = G[1], y1 = G[2])
text(x = rbind(P, G),
     labels = c("P", "G"), adj = c(1, -1))
rect(xleft = -19, xright = 19, ybottom = -19, ytop = 19, lty = 3)
# Plot points relative to barycenter
plot(xlim = c(-19, 19), ylim = c(-19, 19), axes = FALSE, asp = 1,
     Target, pch = 16,
     xlab = "", ylab = "")
segments(x0 = G[1], y0 = G[2],
         x1 = Target[, 1], y1 = Target[, 2])
text(x = G[1], y = G[2], labels = "G",
     adj = c(sqrt(3), 1))
rect(xleft = -19, xright = 19, ybottom = -19, ytop = 19, lty = 3)
# Reset
par(mfrow = c(1, 1))

3.2 Projected Clouds

The (orthogonal) projection of a point cloud $M^J$ in a space $\mathcal{U}$ onto a subspace $\mathcal{H}$ is the cloud $H^J=\{H^j\}_{j\in J}$ consisting of its points' projections, and is called the projected cloud. An important property of the projected cloud is that its barycenter $G'$ is the projection of the barycenter $G$ of $M^J$.

axis <- rbind(c(0, 0), c(2, 1))
print(affine_projection(cloud = Target, subspace = axis))
projection_test(cloud = Target, subspace = axis)
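
For comparison, the same projection can be computed directly in base R; this is a sketch that uses the fact that the axis defined above passes through the origin with direction vector c(2, 1):

u <- c(2, 1)                                                # direction vector of the axis
H_proj <- (as.matrix(Target) %*% u / sum(u ^ 2)) %*% t(u)   # orthogonal projections H^j
print(H_proj)
# The barycenter of the projected cloud is the projection of the barycenter G
colSums(T_freq * H_proj)
as.numeric(as.numeric(G) %*% u / sum(u ^ 2)) * u            # projection of G onto the axis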

3.2.1 Variance in a Direction

The variances of the projected clouds of $M^J$ onto two equidimensional parallel subspaces $\mathcal{H}$ and $\mathcal{H}'$ are equal, and are called the variance in the direction of $\mathcal{H}$ (or of $\mathcal{H}'$), here denoted $\mathrm{Var}_{\mathcal{H}}M^J$. A special case is the variance in the direction of a line, equal to $$\mathrm{Var}_{\overrightarrow{\alpha}}M^J=\sum_{j\in J}f_j\frac{\langle\overrightarrow{GM}^j\mid\overrightarrow{\alpha}\rangle^2}{\|\overrightarrow{\alpha}\|^2}$$ and called the variance of the axis.

axis_var <- direction_variance(cloud = Target, weights = T_freq,
                               subspace = axis)
print(axis_var)
print(direction_variance(cloud = Target, weights = T_freq,
                         subspace = affine_decomposition(axis)$linear_subspace,
                         type = "linear"))
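# Base-R check of the directional-variance formula (a sketch; assumes the
# deviation vectors T_vectors computed earlier, with direction vector c(2, 1));
# should agree with direction_variance() above
alpha_dir <- c(2, 1)
sum(T_freq * (as.matrix(T_vectors) %*% alpha_dir) ^ 2) / sum(alpha_dir ^ 2)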
# Two plots!
par(mfrow = c(1, 2))
# Plot cloud, axis, and residual segments
plot(xlim = c(-19, 19), ylim = c(-19, 19), axes = FALSE, asp = 1,
     Target, pch = 16,
     xlab = "", ylab = "")
points(x = P[1], y = P[2], pch = 13, cex = 2)
m <- diff(axis[, 2]) / diff(axis[, 1])
abline(a = axis[1, 2] - m * axis[1, 1], b = m, lty = 2)
# Residual segments from each point to its orthogonal projection onto the axis
u <- c(diff(axis[, 1]), diff(axis[, 2]))
H_ax <- matrix(axis[1, ], nrow(Target), 2, byrow = TRUE) +
    (sweep(as.matrix(Target), 2, axis[1, ]) %*% u / sum(u ^ 2)) %*% t(u)
segments(x0 = Target[, 1], y0 = Target[, 2], x1 = H_ax[, 1], y1 = H_ax[, 2])
rect(xleft = -19, xright = 19, ybottom = -19, ytop = 19, lty = 3)
# Plot center relative to barycenter
plot(xlim = c(-19, 19), ylim = c(-19, 19), axes = FALSE, asp = 1,
     x = c(), y = c(),
     xlab = "", ylab = "")
points(rbind(P, G),
       pch = c(13, 21), bg = c("white", "grey"), cex = 2)
segments(x0 = P[1], y0 = P[2],
         x1 = G[1], y1 = G[2])
text(x = rbind(P, G),
     labels = c("P", "G"), adj = c(1, -1))
rect(xleft = -19, xright = 19, ybottom = -19, ytop = 19, lty = 3)
# Plot cloud, new axis, and new residual segments

# Reset
par(mfrow = c(1, 1))

3.2.2 Residual Square Mean

The first Huygens theorem can be extended from "point" variance to directional variance by way of projection: Given a cloud $M^J$ and a subspace $\mathcal{H}$, the residual square mean $$\sum_{j\in J}f_j(M^jH^j)^2$$ of $M^J$ with respect to $\mathcal{H}$ is the weighted mean of the squares of the "residuals" $M^jH^j$---the distances from the points of $M^J$ to their projections in $H^J$. The general Huygens theorem then states that the residual square mean with respect to $\mathcal{H}$ can be decomposed into the sum of the residual square mean with respect to the parallel subspace $\mathcal{H}'$ of the same dimension through the barycenter $G$ of $M^J$ and the squared distance from $G$ to $\mathcal{H}$: $$\sum_{j\in J}f_j(M^jH^j)^2=\sum_{j\in J}f_j(M^jA^j)^2+(GG')^2,$$ where $A^j$ is the projection of $M^j$ onto $\mathcal{H}'$ and $G'$ is the projection of $G$ onto $\mathcal{H}$.

huyghen_general_test(cloud = Target, weights = T_freq, subspace = axis)
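
The same identity can be checked directly in base R; the sketch below assumes Target, G, and T_freq from above, with the axis through the origin in the direction c(2, 1):

u <- c(2, 1)                                               # direction of the axis above
Gv <- as.numeric(G)
Mc <- sweep(as.matrix(Target), 2, Gv)                      # deviation vectors GM^j
H_proj <- (as.matrix(Target) %*% u / sum(u ^ 2)) %*% t(u)  # projections onto the axis
A_proj <- matrix(Gv, nrow(Target), 2, byrow = TRUE) +
    (Mc %*% u / sum(u ^ 2)) %*% t(u)                       # projections onto the parallel line through G
G_proj <- as.numeric(Gv %*% u / sum(u ^ 2)) * u            # projection G' of G onto the axis
rsm_H <- sum(T_freq * rowSums((as.matrix(Target) - H_proj) ^ 2))
rsm_A <- sum(T_freq * rowSums((as.matrix(Target) - A_proj) ^ 2))
all.equal(rsm_H, rsm_A + sum((Gv - G_proj) ^ 2))           # general Huygens decomposition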

3.2.3 Fitted and Residual Clouds

When a projected cloud $A^J$ (in a subspace $\mathcal{A}$) is used to model its preimage $M^J$, it is called a fitted cloud, and the vectors $\overrightarrow{A^jM^j}$, whose lengths are the residuals mentioned above, are called residual deviations. This leads to the geometric data--model--error formulation $$\forall j\in J\ \ M^j=A^j+\overrightarrow{A^jM^j},$$ where each $\overrightarrow{A^jM^j}$ is orthogonal to $\mathcal{A}$. Consequently, the cloud $R^J$ consisting of the points $R^j=G'+\overrightarrow{H^jM^j}$ (where $G'$, as above, is the barycenter of $H^J$) depends only on the direction of $\mathcal{H}$. $R^J$ is called the residual cloud, and it is constructed so that $G$ is its barycenter. This yields the orthogonal decomposition $$\overrightarrow{GM^j}=\overrightarrow{GA^j}+\overrightarrow{GR^j}.$$

It is straightforward to check that the variance of $R^J$ equals the residual square mean of $M^J$ with respect to $\mathcal{A}$. The distance formula, applied pointwise to the orthogonal decomposition, then yields the decomposition $$\mathrm{Var}\,M^J=\mathrm{Var}\,A^J+\mathrm{Var}\,R^J$$ of the cloud's total variance into its fitted variance and its residual variance.

cloud_decomposition(cloud = Target, subspace = axis)
cloud_variance_test(cloud = Target, weights = T_freq, subspace = axis)
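
The decomposition can also be verified directly in base R; this sketch assumes T_vectors and T_freq from above and takes the direction vector of the axis to be c(2, 1):

alpha_dir <- c(2, 1)
Mc <- as.matrix(T_vectors)                                        # vectors GM^j
fit <- (Mc %*% alpha_dir / sum(alpha_dir ^ 2)) %*% t(alpha_dir)   # vectors GA^j
res <- Mc - fit                                                   # vectors GR^j
c(total    = sum(T_freq * rowSums(Mc ^ 2)),
  fitted   = sum(T_freq * rowSums(fit ^ 2)),
  residual = sum(T_freq * rowSums(res ^ 2)))                      # total = fitted + residual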

3.2.4 Variables Attached to an Axis

The linear formalization in the next section relies on several coordinatization schemes. Begin with an axis $(G,\overrightarrow{\alpha})$, through the barycenter of $M^J$ in the direction of $\overrightarrow{\alpha}$, and the projected cloud $A^J$ onto this axis. We then define four centered (having mean point zero) variables: the covariant, calibrated, standard, and axial coordinates of the points along this axis.

alpha <- affine_decomposition(axis)$linear_subspace
coords <- sapply(c("covariant", "calibrated", "standard", "axial"),
                 axis_coordinates, cloud = Target, dir_vec = alpha)
print(coords)

3.2.5 Linear Formalization

The Euclidean space $\mathcal{U}$ has an underlying vector space $\mathcal{V}$. The coordinates described above can be understood in terms of linear mappings (homomorphisms) between $\mathcal{V}$ and two vector spaces derived from $J$: the space $\mathbb{R}^J$ of variables over $J$ and the space $\mathbb{R}_J$ of measures over $J$. The variable covariant ($\mathrm{Vac}$) homomorphism sends a vector $\overrightarrow{\alpha}\in\mathcal{V}$ to its covariant coordinates in $\mathbb{R}^J$; the effect ($\mathrm{Eff}$) homomorphism sends a measure (given as a vector of weights) in $\mathbb{R}_J$ to its evaluation on $M^J$ (the sum of the weighted points of the cloud, viewed as vectors). Both homomorphisms are taken with respect to a given point $P\in\mathcal{U}$, which by default is the barycenter $G$.

print(vac(cloud = Target, point = P, dir_vec = alpha))
print(eff(cloud = Target, point = P, coord_vec = coords[, "covariant"]))
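
These two maps admit a minimal base-R sketch; the helpers below (vac_base and eff_base) are hypothetical illustrations of the definitions just given, not the package's implementation, and take the reference point as an explicit argument:

vac_base <- function(cloud, point, dir_vec) {
    # covariant coordinates <PM^j | alpha>, one per point of the cloud
    as.numeric(sweep(as.matrix(cloud), 2, point) %*% dir_vec)
}
eff_base <- function(cloud, point, measure) {
    # evaluation of a measure on the cloud: the sum of the weighted vectors PM^j
    as.numeric(t(sweep(as.matrix(cloud), 2, point)) %*% measure)
}
vac_base(Target, as.numeric(G), c(2, 1))  # covariant coordinates along the axis direction
eff_base(Target, as.numeric(G), T_freq)   # the barycentric property: approximately c(0, 0)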

The coordinate-preserving isomorphism $f_J:\mathbb{R}^J\to\mathbb{R}_J$, which associates variables with measures, reveals a duality between $\mathrm{Vac}$ and $\mathrm{Eff}$. Specifically, the composite homomorphism $\mathrm{Eff}^P\circ f_J$ is the adjoint $\mathrm{Vac}_P^\ast$ of $\mathrm{Vac}_P$. As these functions are implemented here, eff is the adjoint of vac, and the relation follows directly from the associativity of matrix multiplication.

These maps compose to produce symmetric endomorphisms $$\mathrm{Som}=\mathrm{Eff}^P\circ f_J\circ\mathrm{Vac}_P:\mathcal{V}\rightarrow\mathcal{V}$$ of the vector space containing $M^J$ and $$\mathrm{Tom}=\mathrm{Vac}_P\circ\mathrm{Eff}^P\circ f_J:\mathbb{R}^J\rightarrow\mathbb{R}^J$$ of the coordinate space of $J$.

print(rbind(alpha,
            som(cloud = Target, point = P, dir_vec = alpha)))
print(cbind(coords[, "covariant"],
            tom(cloud = Target, point = P, coord_vec = coords[, "covariant"])))

These endomorphisms are important for performing the central operation of geometric data analysis: identifying the principal directions of a point cloud. In particular, this relies on the relationship between the endomorphism $\mathrm{Som}$ and the directional variance of $M^J$: $$\mathrm{Var}_{\overrightarrow{\alpha}}M^J=\frac{\|\mathrm{Vac}(\overrightarrow{\alpha})\|^2}{\|\overrightarrow{\alpha}\|^2}=\frac{\langle\mathrm{Som}(\overrightarrow{\alpha})\mid\overrightarrow{\alpha}\rangle}{\|\overrightarrow{\alpha}\|^2}.$$ This property follows from the variable definitions and from the definition of $\mathrm{Som}$, which implies that $\langle\mathrm{Som}(\overrightarrow{\alpha})\mid\overrightarrow{\alpha}\rangle=\|\mathrm{Vac}(\overrightarrow{\alpha})\|^2$.
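
This identity can be checked numerically with an explicit matrix realization of $\mathrm{Som}$; the sketch below assumes the reference point is the barycenter $G$ and that the weights are the relative frequencies, so its output need not match that of som() above:

Mc <- as.matrix(T_vectors)                # rows are the vectors GM^j
Som_mat <- t(Mc) %*% diag(T_freq) %*% Mc  # satisfies <Som(a) | a> = ||Vac(a)||^2
alpha_dir <- c(2, 1)
# Directional variance recovered from Som_mat, per the formula above
as.numeric(alpha_dir %*% Som_mat %*% alpha_dir) / sum(alpha_dir ^ 2)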

3.3 Principal Directions of a Cloud

The problem of principal directions is to approximate a point cloud (in a vector space of dimension, say, $L$) in a vector space of lower dimension (say, $L'$). This entails identifying successive directions in $\mathcal{V}$ so that the projected cloud onto each subspace orthogonal to the first $\ell-1$ directions has maximum fitted variance in the $\ell^\text{th}$ direction---or, equivalently (by the variance decomposition), so that each projection has minimum residual variance.

3.3.1 Principal Axes

As just observed, questions of directional variance can be recast in terms of $\mathrm{Som}$. Specifically, $\|\mathrm{Vac}(\overrightarrow{\alpha})\|/\|\overrightarrow{\alpha}\|$ is maximized in the direction of the first eigenvector of $\mathrm{Som}$---or, if the first eigenvalue is multiple, in any direction of the first eigenspace. This leads to the principal direction equation, which identifies the direction vectors of the principal lines of $M^J$ with the eigenvectors of $\mathrm{Som}$: $$\mathrm{Som}(\overrightarrow{\alpha}_\ell)=\lambda_\ell\overrightarrow{\alpha}_\ell$$

The proof proceeds inductively on $L$ and by construction along sequential dimension-one projections, with particular attention given to the case of eigenvalues of multiple order. These cases aside, the proof relies on one key observation: Let $A^J_1$ be the projected cloud along the first eigenvector of $\mathrm{Som}$, with residual cloud $R^J_1$. The orthogonal decomposition in Section 3.2.3 and the linearity of $\mathrm{Vac}$ yield the decomposition $$\mathrm{Vac}=\mathrm{Vac}'+\mathrm{Vac}_1,$$ where $\mathrm{Vac}'(\overrightarrow{\alpha})=(\langle\overrightarrow{GR^j_1}\mid\overrightarrow{\alpha}\rangle)_{j\in J}$ and $\mathrm{Vac}_1(\overrightarrow{\alpha})=(\langle\overrightarrow{GA^j_1}\mid\overrightarrow{\alpha}\rangle)_{j\in J}$, as well as a similar decomposition of $\mathrm{Vac}^\ast$. These combine to give $$\mathrm{Som}=\mathrm{Som}'+\mathrm{Som}_1,$$ which, provided the nonzero eigenvectors of $\mathrm{Som}$ are orthonormal, partitions these eigenvectors into those of $\mathrm{Som}'$ ($L-1$ of them) and those of $\mathrm{Som}_1$ (one). The single eigenvector of $\mathrm{Som}_1$ has the greatest eigenvalue, $\lambda_1$, of the original family, so the greatest eigenvalue of $\mathrm{Som}'$ is $\lambda_2$. From there the proof continues until all dimensions are exhausted.

Because each step identifies a principal vector orthogonal to the space spanned by the remaining principal vectors, all of the principal vectors (hence the principal directions) are pairwise orthogonal. Additionally, since the relationship between the residual variance and the first eigenvalue is repeated at each step, the proof provides the principal breakdown of variance: $$\mathrm{Var}\,M^J=\sum_{\ell=1}^L\lambda_\ell$$
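
Under the same assumptions, the principal directions and variances can be read off the eigendecomposition of the Som_mat matrix sketched at the end of the previous section; the eigenvalues sum to the total variance, illustrating the breakdown just stated:

eig <- eigen(Som_mat)
eig$values       # principal variances, in decreasing order
eig$vectors      # principal direction vectors, one per column
sum(eig$values)  # the total variance of the cloud (T_var above)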

Finally, though the proof involved only variances of one-dimensional directions, the result can be extended to the hereditary property of higher-dimensional principal directions: If $\lambda_{L'}>\lambda_{L'+1}$ (strict inequality) for some $1\leq L'<L$, then the first principal $L'$-dimensional direction contains all first principal lower-dimensional directions. This means that each first principal $L'$-dimensional direction is the span of the first $L'$ principal one-dimensional directions (or, in the case of multiple eigenvalues, of the first principal lowest-dimensional directions whose dimensions sum to $L'$).

3.3.2 Principal Variables


