moda_full: Multimodal Oriented Discriminant Analysis (MODA) - Complete & Faithful Implementation

View source: R/moda.R

moda_full    R Documentation

Multimodal Oriented Discriminant Analysis (MODA) - Complete & Faithful Implementation

Description

Implements the full Multimodal Oriented Discriminant Analysis (MODA) framework as derived in De la Torre & Kanade (2005). This code:

  1. Splits each class into one or more clusters to capture multimodal structure.

  2. Approximates each cluster's covariance as \mathbf{U}_i \boldsymbol{\Lambda}_i \mathbf{U}_i^T + \sigma_i^2 \mathbf{I} to handle high-dimensional data (Section 6 of the paper).

  3. Constructs the majorization function L(\mathbf{B}) that upper-bounds the Kullback–Leibler divergence-based objective G(\mathbf{B}) (Equations (7)–(8)).

  4. Iterates using the gradient-based solution to minimize E_5(\mathbf{B}) (Equation (10)), with updates from Equation (11) (normalized gradient descent with a line-searched step size).

It does not merely provide a starter approach; instead, it faithfully implements the steps described in the paper, including references to Equations (7)–(11).

Usage

moda_full(
  X,
  y,
  k,
  numClusters = 1,
  pcaFirst = TRUE,
  pcaVar = 0.95,
  maxIter = 50,
  tol = 1e-05,
  clusterMethod = "kmeans",
  B_init = "random",
  verbose = FALSE,
  lineSearchIter = 20,
  B_init_sd = 0.01
)

Arguments

X

A numeric matrix of size d \times n, where each column is a data sample.

y

A vector (length n) of integer or factor class labels (must have \geq 2 distinct labels).

k

Integer. Dimensionality of the target subspace (number of features to extract).

numClusters

Integer, or a vector/list giving the number of clusters per class. With a single cluster per class, MODA reduces to Oriented Discriminant Analysis (ODA).

pcaFirst

Logical. If TRUE, run PCA first to reduce the dimension when d \gg n. Defaults to TRUE.

pcaVar

Fraction of variance to keep if pcaFirst=TRUE. Defaults to 0.95.

maxIter

Maximum number of majorization iterations. Defaults to 50.

tol

Convergence tolerance on relative change in the objective G(\mathbf{B}). Defaults to 1e-5.

clusterMethod

Either "kmeans" or a custom clustering function with signature (dataMatrix, kC) returning a vector of cluster IDs.

B_init

Either "random" or "pca" to initialize the projection matrix \mathbf{B}.

verbose

If TRUE, prints iteration progress.

lineSearchIter

Number of line search iterations for step size selection (default = 20).

B_init_sd

Standard deviation for the random initialization of \mathbf{B} if B_init="random". Defaults to 1e-2.

Details

Key Steps:

  1. Clustering (Section 4): For each class, optionally split the samples into multiple clusters to model multimodality.

  2. Approximate Covariances (Section 6): For each cluster, approximate \Sigma_i by \mathbf{U}_i \boldsymbol{\Lambda}_i \mathbf{U}_i^T + \sigma_i^2 \mathbf{I} (see the sketch after this list).

  3. Majorization (Sections 5.1–5.2): Build L(\mathbf{B}) from G(\mathbf{B}) using the inequality in Equation (7), summing over clusters to obtain Equation (8).

  4. Iterative Minimization: Minimize L(\mathbf{B}) \geq G(\mathbf{B}) iteratively. The partial derivatives (Equation (9)) yield a system of linear equations, solved here by gradient-based updates (Equations (10)–(11)).
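
A minimal sketch of the Section 6 factorization, using one standard construction (the helper name and the fixed rank r are illustrative, not the package's internals): eigen-decompose the cluster's sample covariance, keep the leading r eigenpairs, and set \sigma_i^2 to the mean of the discarded eigenvalues.

approx_cluster_cov <- function(Xc, r) {
  # Xc: d x n_i matrix whose columns are the samples in one cluster
  S  <- cov(t(Xc))                        # d x d sample covariance
  eg <- eigen(S, symmetric = TRUE)
  U  <- eg$vectors[, 1:r, drop = FALSE]   # U_i: leading eigenvectors
  L  <- eg$values[1:r]                    # Lambda_i: leading eigenvalues
  s2 <- mean(eg$values[-(1:r)])           # sigma_i^2 from the residual spectrum
  list(U = U, Lambda = L, sigma2 = s2)    # Sigma_i ~ U diag(L) U^T + s2 * I
}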

High-Dimensional Data: When d \gg n, it is recommended to set pcaFirst=TRUE so that the dimension is reduced to at most n, avoiding rank deficiency and improving generalization.
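
A minimal sketch of what the pcaFirst step amounts to (the helper and its return fields are illustrative; the package stores the corresponding rotation and mean in pcaInfo): keep the smallest number of principal components whose cumulative variance reaches pcaVar.

pca_reduce <- function(X, pcaVar = 0.95) {
  mu <- rowMeans(X)
  Xc <- X - mu                            # center each variable (row)
  pc <- prcomp(t(Xc), center = FALSE)     # samples are the columns of X
  cumVar <- cumsum(pc$sdev^2) / sum(pc$sdev^2)
  r  <- which(cumVar >= pcaVar)[1]        # smallest r reaching pcaVar
  U  <- pc$rotation[, 1:r, drop = FALSE]
  list(scores = t(U) %*% Xc, rotation = U, mean = mu)
}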

Classification after MODA: Once \mathbf{B} is learned, map a new sample \mathbf{x} to \mathbf{B}^T \mathbf{x} (applying the PCA projection first, if one was used) and classify in that lower-dimensional space.
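
A sketch of that classification step with a nearest-centroid rule (moda_predict is a hypothetical helper, not part of the package; it assumes pcaInfo is NULL when PCA was skipped and otherwise holds the rotation U and mean, as described under Value):

moda_predict <- function(res, X_train, y_train, X_new) {
  project <- function(X) {
    if (!is.null(res$pcaInfo))            # apply the PCA map first, if used
      X <- t(res$pcaInfo$U) %*% (X - res$pcaInfo$mean)
    t(res$B) %*% X                        # then project with the learned B
  }
  Z_train <- project(X_train)
  Z_new   <- project(X_new)
  centroids <- sapply(split(seq_along(y_train), y_train),
                      function(idx) rowMeans(Z_train[, idx, drop = FALSE]))
  labels <- colnames(centroids)
  # label each new sample by its closest class centroid
  labels[apply(Z_new, 2, function(z) which.min(colSums((centroids - z)^2)))]
}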

For further details, see:

  1. De la Torre & Kanade (2005). "Multimodal Oriented Discriminant Analysis."

  2. Equations (7)–(11) for the majorization steps.

  3. Section 6 for the covariance factorization in high dimensions.

Value

A list with elements:

  • B: A d' \times k matrix (or d \times k if no PCA) with the learned projection.

  • objVals: The values of the objective G(\mathbf{B}) at each iteration.

  • clusters: The cluster assignments (per class).

  • pcaInfo: If PCA was applied, contains the PCA rotation U and mean.

References to the Paper

  • Equation (7): Inequality used to construct the majorization function.

  • Equation (8): Definition of L(\mathbf{B}) that majorizes G(\mathbf{B}).

  • Equation (9): Necessary condition for the minimum of L(\mathbf{B}).

  • Equation (10): Definition of E_5(\mathbf{B}) to be minimized via gradient methods.

  • Equation (11): Normalized gradient-descent update with step size \eta chosen to minimize E_5.
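
A schematic of the Equation (11) update (E5 and gradE5 stand in for the Equation (10) objective and its Equation (9) gradient; both are hypothetical placeholders, and the halving loop is a simple backtracking stand-in for the step-size search):

update_B <- function(B, E5, gradE5, lineSearchIter = 20) {
  G <- gradE5(B)
  G <- G / sqrt(sum(G^2))                 # normalize the gradient direction
  eta <- 1
  f0  <- E5(B)
  for (i in seq_len(lineSearchIter)) {
    if (E5(B - eta * G) < f0) break       # accept the first improving step
    eta <- eta / 2                        # otherwise shrink the step size
  }
  B - eta * G
}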

Examples

# Synthetic example (small scale):
set.seed(123)
d <- 20; n <- 40
X <- matrix(rnorm(d*n), nrow = d, ncol = n)
y <- rep(1:2, each = n/2)
res <- moda_full(X, y, k = 2, numClusters = 1, pcaFirst = FALSE, maxIter = 15, verbose = TRUE)

# Inspect the learned projection B
str(res)
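
# A short follow-on sketch (res$B is d x k here since pcaFirst = FALSE;
# see Value): project the training samples and plot the 2-D embedding.
Z <- t(res$B) %*% X                  # k x n matrix of projected samples
plot(t(Z), col = y, pch = 19,
     xlab = "MODA dim 1", ylab = "MODA dim 2")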

