This vignette follows the treatment in Chapter 3 of Le Roux and Rouanet's *Geometric Data Analysis*, with drastically simplified exposition but employing the same example data.

library(devtools) devel = TRUE if (devel) { load_all("~/Documents/edu/R/cloud") } else { install_github("corybrunson/cloud") library(cloud) } library(knitr) opts_chunk$set(echo=FALSE, results=FALSE, message=FALSE, warning=FALSE)

The central object of GDA is a point cloud in a Euclidean space, which encodes a set of statistical observations arranged in a table, or two-dimensional array (for instance, a contingency table in Correspondence Analysis or an Individuals--Variables table in Principal Components Analysis). By convention, the columns of the table correspond to the coordinate axes of the Euclidean space while the rows correspond to the points.

Rigorously, a *point cloud* in a Euclidean space $\mathcal{U}$ consists of a set $J$ of labels and a mapping $M^J:J\to\mathcal{U}$ that takes each label $j\in J$ to its corresponding point $M^j\in\mathcal{U}$. The points are assigned positive weights $\omega_j$, which may be assumed unitary when not specified. The most common weights will be absolute frequencies $n_j$, $n=\sum_{j\in J}n_j$, and their associated relative frequencies $f_j=\frac{n_j}{n}$.

As a running example, we load the *Target example* dataset, used throughout the chapter, whose coordinates are provided in Exercise 3.4:

data(Target) print(Target) print(class(Target)) T_freq <- rep(1 / nrow(Target), nrow(Target))

plot(xlim = c(-19, 19), ylim = c(-19, 19), axes = FALSE, asp = 1, Target, pch = 16, xlab = "", ylab = "") text(Target, labels = rownames(Target), pos = 1) rect(xleft = -19, xright = 19, ybottom = -19, ytop = 19, lty = 3) legend(x = -19, y = 19, box.lty = 0, legend = "Target example")

The labels consist of $j_1$ through $j_{10}$. Because these data occupy two dimensions, i.e. the labels are mapped into $\mathcal{U}=\mathbb{R}^2$, they constitute a *plane cloud*.

The *mean point* of the cloud $M^J$ in $\mathcal{U}$ is meant to be a proxy point location for the entire cloud. Given an arbitrary point $P\in\mathcal{U}$, the mean point is defined as the point arrived to from $P$ via the sum of the weighted vectors $f_j\overrightarrow{PM^j}$ from $P$ to each point $M^j$---that is, as the (unique!) point $G\in\mathcal{U}$ for which $$\overrightarrow{PG}=\sum_{j\in J}f_j\overrightarrow{PM^j}$$ doesn't depend on the choice of point $P\in\mathcal{U}$. The mean point satisfies the *Barycentric property* that $$\sum_{j\in J}f_j\overrightarrow{GM^j}=\overrightarrow{0},$$ which can be checked by substituting $P=G$ in the definition.[^1]

G <- barycenter(Target) print(G)

# Vectors from barycenter to each point T_vectors <- Target - do.call(rbind, replicate(nrow(Target), G, simplify = FALSE)) # Plot barycenter and vectors therefrom to points in cloud plot(xlim = c(-19, 19), ylim = c(-19, 19), axes = FALSE, asp = 1, x = c(), y = c(), xlab = "", ylab = "") text(G, labels = "G", adj = c(1.5, 1.25)) for (i in 1:nrow(Target)) { text(Target[i, , drop = FALSE], labels = rownames(Target)[i], adj = .5 - (T_vectors[i, ] / sqrt(sum(T_vectors[i, ] ^ 2)))) } arrows(x0 = G[, 1], y0 = G[, 2], x1 = Target[, 1], y1 = Target[, 2], length = .15, angle = 15) rect(xleft = -19, xright = 19, ybottom = -19, ytop = 19, lty = 3) legend(x = -19, y = 19, box.lty = 0, legend = "Target: barycentric characterization")

[^1]: These and other formulas are given in more general terms in *GDA*, in which the points $M^j$ are weighted by masses $\omega_j$. I may be forced to adopt this generality, but i haven't yet.

The *inertia* of a cloud $M^J$ with respect to a point $P$ is defined, as in other settings, as the sum-squared distance of each point $M^j$ from $P$: $${In}^PM^J=\sum_{j\in J}{In}*j^P=\sum*{j\in J}\omega_j(PM^j)^2.$$ The inertia for $\omega_j=n_j$ is the *sum of squares* and that for $\omega_j=f_j$ is the *mean of squares*. The *variance* of the cloud is its mean of squares from its barycenter, and can be interpreted as the sum of the *absolute contributions* of the individual points in the cloud: $${Var}M^J=\sum_{j\in J}{Cta}*j=\sum*{j\in J}f_j(GM^j)^2.$$ The *relative contributions of the points are, then, the fractions ${Ctr}_j=\frac{{Cta}_j}{{Var}M^J}$.

P <- c(0, 0) P_in <- inertia(point = P, cloud = Target) G_in <- inertia(point = G, cloud = Target) T_var <- inertia(G, cloud = Target, weights = T_freq) print(c(center_inertia = P_in, barycenter_inertia = G_in, variance = T_var))

Ctas <- T_freq * apply(Target, 1, function(m) { sum((m - G) ^ 2) }) Ctrs <- Ctas / T_var print(cbind(Ctas, Ctrs))

The **first Huyghen's theorem** states that the inertia of a cloud $M^J$ with respect to a point $P$ can be decomposed into the sum of the variance of the cloud (its inertia with respect to its barycenter $G$) and the squared distance between $G$ and $P$: $$\sum_{j\in J}f_j(PM^j)^2=(PG)^2+\sum_{j\in J}f_j(GM^j)^2.$$ (See the illustration below.) A consequence is the *metric characterization of the barycenter*, which has that the barycenter minimizes the mean of squares of the cloud. That is, across $P\in\mathcal{U}$, the quantity $\sum_{j\in J}f_j(PM^j)^2$ is minimized when $P=G$.

huyghen_test(point = P, cloud = Target, weights = T_freq) huyghen_test(point = G, cloud = Target, weights = T_freq)

# Three plots! par(mfrow = c(1, 3)) # Plot points relative to center plot(xlim = c(-19, 19), ylim = c(-19, 19), axes = FALSE, asp = 1, Target, pch = 16, xlab = "", ylab = "") segments(x0 = 0, y0 = 0, x1 = Target[, 1], y1 = Target[, 2]) text(x = P[1], y = P[2], labels = "P", adj = c(sqrt(2), sqrt(2))) rect(xleft = -19, xright = 19, ybottom = -19, ytop = 19, lty = 3) # Plot center relative to barycenter plot(xlim = c(-19, 19), ylim = c(-19, 19), axes = FALSE, asp = 1, x = c(), y = c(), xlab = "", ylab = "") points(rbind(P, G), pch = c(13, 21), bg = c("white", "grey"), cex = 2) segments(x0 = P[1], y0 = P[2], x1 = G[1], y1 = G[2]) text(x = rbind(P, G), labels = c("P", "G"), adj = c(1, -1)) rect(xleft = -19, xright = 19, ybottom = -19, ytop = 19, lty = 3) # Plot points relative to barycenter plot(xlim = c(-19, 19), ylim = c(-19, 19), axes = FALSE, asp = 1, Target, pch = 16, xlab = "", ylab = "") segments(x0 = G[1], y0 = G[2], x1 = Target[, 1], y1 = Target[, 2]) text(x = G[1], y = G[2], labels = "G", adj = c(sqrt(3), 1)) rect(xleft = -19, xright = 19, ybottom = -19, ytop = 19, lty = 3) # Reset par(mfrow = c(1, 1))

The *(orthogonal) projection* of a point cloud $M^J$ in a space $\mathcal{U}$ onto a subspace $\mathcal{H}$ is the cloud $H^J={H^j}_{j\in J}$ consisting of its points' projections, and called the *projected cloud*. An important property of the projected cloud is that its barycenter $G'$ is the projection of the barycenter $G$ of $M^J$.

axis <- rbind(c(0, 0), c(2, 1)) print(affine_projection(cloud = Target, subspace = axis)) projection_test(cloud = Target, subspace = axis)

The variances of the projected clouds of $M^J$ onto two equidimensional parallel subspaces $\mathcal{H}$ and $\mathcal{H}'$ are equal, and are called the *variance in the direction* of $\mathcal{H}$ (or of $\mathcal{H}'$), here denoted $\mathrm{Var}*\mathcal{H}M^J$. A special case is the variance in the direction of a line, equal to $$\mathrm{Var}*{\overrightarrow{\alpha}}M^J=\sum_{j\in J}f_j\frac{\langle\overrightarrow{GM}^j\mid\overrightarrow{\alpha}\rangle^2}{\|\overrightarrow{\alpha}\|^2}$$ and called the *variance of axis*.

axis_var <- direction_variance(cloud = Target, weights = T_freq, subspace = axis) print(axis_var) print(direction_variance(cloud = Target, weights = T_freq, subspace = affine_decomposition(axis)$linear_subspace, type = "linear"))

# Three plots! par(mfrow = c(1, 2)) # Plot cloud, axis, and residual segments plot(xlim = c(-19, 19), ylim = c(-19, 19), axes = FALSE, asp = 1, Target, pch = 16, xlab = "", ylab = "") points(x = P[1], y = P[2], pch = 13, cex = 2) m <- diff(axis[, 2]) / diff(axis[, 1]) abline(a = axis[1, 2] - m * axis[1, 1], b = m, lty = 2) rect(xleft = -19, xright = 19, ybottom = -19, ytop = 19, lty = 3) # Plot center relative to barycenter plot(xlim = c(-19, 19), ylim = c(-19, 19), axes = FALSE, asp = 1, x = c(), y = c(), xlab = "", ylab = "") points(rbind(P, G), pch = c(13, 21), bg = c("white", "grey"), cex = 2) segments(x0 = P[1], y0 = P[2], x1 = G[1], y1 = G[2]) text(x = rbind(P, G), labels = c("P", "G"), adj = c(1, -1)) rect(xleft = -19, xright = 19, ybottom = -19, ytop = 19, lty = 3) # Plot cloud, new axis, and new residual segments # Reset par(mfrow = c(1, 1))

The first Huyghen's theorem can be extended from "point" variance to directional variance by way of projection: Given a cloud $M^J$ and a subspace $\mathcal{H}$, the *residual square mean* $$\sum_{j\in J}f_j(M^jH^j)^2$$ of $M^J$ with respect to $\mathcal{H}$ is the weighted mean of the squares of the "residuals" $M^jH^j$---the distances from the points of $M^J$ to their projections in $H^J$. The **general Huyghen's theorem** then states that the residual square mean with respect to $\mathcal{H}$ can be decomposed into the sum of the residual square mean with respect to a same-dimensional parallel space $\mathcal{H}'$ through the barycenter $G$ of $M^J$ and the squared distance from $G$ to $\mathcal{H}$: $$\sum_{j\in J}f_j(M^jH^j)^2=\sum_{j\in J}f_j(M^jA^j)^2+(GG')^2,$$ where $G'$ is the projection of $G$ onto $\mathcal{H}$.

huyghen_general_test(cloud = Target, weights = T_freq, subspace = axis)

When a projected cloud $A^J$ (in a subspace $\mathcal{A}$) is used to model its preimage $M^J$, it is called a *fitted cloud*, and the vectors $\overrightarrow{A^jM^j}$, whose lengths are the residuals mentioned above, are called *residual deviations*. This leads to the geometric data--model--error formulation $$\forall j\in J\ \ M^j=A^j+\overrightarrow{A^jM^j},$$ where each $\overrightarrow{A^jM^j}$ is orthogonal to $\mathcal{A}$. Consequently, the cloud $R^J$ consisting of points $R^j=G'+\overrightarrow{H^jM^j}$ (where $G'$, as above, is the barycenter of $H^J$) depends only on the choice of $\mathcal{H}$. $R^J$ is called the *residual cloud*, and it is constructed so that $G$ is its barycenter. This yields the orthogonal decomposition $$\overrightarrow{GM^j}=\overrightarrow{GA^j}+\overrightarrow{GR^j}.$$

It is straightforward to check that the variance of $R^J$ equals the residual square mean of $M^J$ with respect to $\mathcal{A}$. The distance formula, applied pointwise to the orthogonal decomposition, then yields the *variance decomposition* $${Var}M^J={Var}A^J+{Var}R^J$$ into the cloud's *total variance*, its *fitted variance*, and its *residual variance*.

cloud_decomposition(cloud = Target, subspace = axis) cloud_variance_test(cloud = Target, weights = T_freq, subspace = axis)

The linear formulation in the next section relies on several coordinatization schemes. Begin with an axis $(G,\overrightarrow{\alpha})$, through the barycenter of $M^J$ in the direction of $\overrightarrow{\alpha}$, and the projected cloud $A^J$ onto this axis. We then define four centered (having mean point zero) variables:

- The
*covariant variable*$\alpha^J=(\alpha^j)_{j\in J}$ has*covariant coordinates*$\alpha^j=\langle\overrightarrow{GM^j}\mid\overrightarrow{\alpha}\rangle$ and variance ${Var}A^J\|\overrightarrow{\alpha}\|^2$. The term refers to the fact that these coordinates scale with $\overrightarrow{\alpha}$. - The
*calibrated variable*$y^J=(y^j)_{j\in J}$ has*calibrated coordinates*$y^j=\alpha^j/\|\overrightarrow{\alpha}\|$ and variance ${Var}A^J$. - The
*standard variable*$z^J=(z^j)_{j\in J}$ has*standard coordinates*$z^j=y^j/SDy^J$ and variance 1. - The
*axial variable*$t^J=(t^j)_{j\in J}$ has*axial coordinates*$t^j=\langle\overrightarrow{GM^j}\mid\overrightarrow{\alpha}\rangle/\|\overrightarrow{\alpha}\|^2$ and variance ${Var}t^J=A^J/\|\overrightarrow{\alpha}\|^2$. The $t^j$ are the projections---*GDA*calls them the coordinates---of the $A^j$ onto $(G,\overrightarrow{\alpha})$.

alpha <- affine_decomposition(axis)$linear_subspace coords <- sapply(c("covariant", "calibrated", "standard", "axial"), axis_coordinates, cloud = Target, dir_vec = alpha) print(coords)

The Euclidean space $\mathcal{U}$ has an underlying vector space $\mathcal{V}$. The aforedescribed coordinates can be understood in terms of linear mappings (homomorphisms) between $\mathcal{V}$ and two vector spaces derived from $J$: the space $\mathbb{R}^J$ of variables over $J$, and the space $\mathbb{R}_J$ of measures over $J$. The *variable covariant* ($\mathrm{Vac}$) homomorphism sends a vector $\overrightarrow{\alpha}\in\mathcal{V}$ to its covariant coordinates in $\mathbb{R}^J$; the *effect* ($\mathrm{Eff}$) homomorphism sends a measure (given as a vector of weights) in $\mathbb{R}_J$ to its evaluation on $M^J$ (the sum of the weighted points in the clouds, viewed as vectors). Both homomorphisms are taken with respect to a given point $P\in\mathcal{U}$, which by default takes the barycentric value $P=G$.

print(vac(cloud = Target, point = P, dir_vec = alpha)) print(eff(cloud = Target, point = P, coord_vec = coords[, "covariant"]))

The coordinate-preserving isomorphism $f_J:\mathbb{R}^J\to\mathbb{R}_J$, which associates variables with measures, reveals a duality between $\mathrm{Vac}$ and $\mathrm{Eff}$. Specifically, the composition homomorphism $\mathrm{Eff}^P\circ f_J$ is the adjoint $\mathrm{Vac}_P^\ast$ of $\mathrm{Vac}_P$. As these functions are implemented here, `eff`

*is* the adjoint of `vac`

. The result then follows directly from the assortativity of matrix multiplication.

These maps compose to produce symmetric endomorphisms $$\mathrm{Som}=\mathrm{Eff}^P\circ f_J\circ\mathrm{Vac}_P:\mathcal{V}\rightarrow\mathcal{V}$$ of the vector space containing $M^J$ and $$\mathrm{Tom}=\mathrm{Vac}_P\circ\mathrm{Eff}^P\circ f_J:\mathbb{R}^J\rightarrow\mathbb{R}^J$$ of the coordinate space of $J$.

print(rbind(alpha, som(cloud = Target, point = P, dir_vec = alpha))) print(cbind(coords[, "covariant"], tom(cloud = Target, point = P, coord_vec = coords[, "covariant"])))

These endomorphisms will be important to performing the central operation of geometric data analysis: identifying the principal directions of a point cloud. In particular, this relies on the relationship between the endomorphism $\mathrm{Som}$ and the directional variance of $M^J$: $$\mathrm{Var}_{\overrightarrow{\alpha}}M^J=\frac{\|\mathrm{Vac}(\overrightarrow{\alpha})\|^2}{\|\overrightarrow{\alpha}\|^2}=\frac{\langle\mathrm{Som}(\overrightarrow{\alpha})\mid\overrightarrow{\alpha}\rangle}{\|\overrightarrow{\alpha}\|^2}$$ This property follows from the variable definitions and from the definition of $\mathrm{Som}$, which implies that $\langle\mathrm{Som}(\overrightarrow{\alpha})\mid\overrightarrow{\alpha}\rangle=\|\mathrm{Vac}(\overrightarrow{\alpha})\|^2$.

The **problem of principal directions** is to approximate a point cloud (in a vector space of dimension, say, $L$) in a vector space of lower dimension (say, $L'$. This entails identifying successive directions in $\mathcal{V}$ so that the projected cloud onto each subspace orthogonal to the first $\ell-1$ directions has maximum fitted variance in the $\ell^\text{th}$ direction---or, equivalently (by the variance decomposition), so that each projection has minimum residual variance.

As just observed, questions of directional variance can be recast in terms of $\mathrm{Som}$. Specifically, $\|\mathrm{Vac}(\overrightarrow{\alpha})\|/\|\overrightarrow{\alpha}\|$ is maximized in the direction of the first eigenvector of $\mathrm{Som}$---or, if the first eigenvalue is multiple, in any direction of the first eigenspace. This leads to the **principal direction equation** of the direction vectors of the principal lines of $M^J$ with the eigenvectors of $\mathrm{Som}$: $$\mathrm{Som}(\overrightarrow{\alpha}*\ell)=\lambda*\ell\overrightarrow{\alpha}_\ell$$

The proof proceeds inductively on $L$ and by construction along sequential dimension-one projections, with special attention given to the special case of eigenvalues of multiple order. These cases aside, the proof relies on one key observation: Let $A^J_1$ be the projected cloud along the first eigenvector of $\mathrm{Som}$, with residual cloud $R^J_1$. The orthogonal decomposition in Section 3.2.3 and the linearity of $\mathrm{Vac}$ yield the endomorphism decomposition $$\mathrm{Vac}=\mathrm{Vac}'+\mathrm{Vac}*1,$$ where $\mathrm{Vac}'(\overrightarrow{\alpha})=(\langle\overrightarrow{GR^j_1}\mid\overrightarrow{\alpha}\rangle)*{j\in J}$ and $\mathrm{Vac}*1(\overrightarrow{\alpha})=(\langle\overrightarrow{GA^j_1}\mid\overrightarrow{\alpha}\rangle)*{j\in J}$, as well as a similar decomposition of $\mathrm{Vac}^\ast$. These combine to give $$\mathrm{Som}=\mathrm{Som}'+\mathrm{Som}_1,$$ which, provided the nonzero eignvectors of $\mathrm{Som}$ are orthonormal, partitions these eigenvectors into those of $\mathrm{Som}'$ ($\ell-1$ of them) and those of $\mathrm{Som}_1$ (one). The single eigenvector of $\mathrm{Som}_1$ has the greatest eigenvalue, $\lambda_1$, of the original family, so the greatest eigenvalue of $\mathrm{Som}'$ is $\lambda_2$. From there the proof continues until all dimensions are exhausted.

Whereas each step identifies a principal vector orthogonal to the space spanned by what will be the remaining principal vectors, *all of the principal vectors (hence the principal directions) are pairwise orthogonal*. Additionally, since the relationship between the residual variance and the first eigenvalue is repeated at each step, the proof provides the **principal breakdown of variance**: $$\mathrm{Var}M^J=\sum_{\ell=1}^L\lambda_\ell$$

Finally, though the proof involved only variances of one-dimensional directions, the result can be extended to the **hereditary property** of higher-dimensional principal directions: If $\lambda_{L'}>\lambda_{L'-1}$ (strict inequality) for some $1\leq L'\leq L$, then the **$first principal $L'$-dimensional direction** contains all first principal lower-dimensional directions. This means that each first principal $L'$-dimensional direction is the span of the first $L'$ principal one-dimensional directions (or, in case of multiple eigenvalues, the first prinicple lowest-dimensional directions whose dimensions sum to $L'$).

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.