moda_full: Multimodal Oriented Discriminant Analysis (MODA) - Complete & Faithful Implementation

View source: R/moda.R

moda_full    R Documentation

Multimodal Oriented Discriminant Analysis (MODA) - Complete & Faithful Implementation

Description

Implements the full Multimodal Oriented Discriminant Analysis (MODA) framework as derived in De la Torre & Kanade (2005). This code:

  1. Splits each class into one or more clusters to capture multimodal structure.

  2. Approximates each cluster's covariance as \mathbf{U}_i \boldsymbol{\Lambda}_i \mathbf{U}_i^T + \sigma_i^2 \mathbf{I} to handle high-dimensional data (Section 6 of the paper).

  3. Constructs the majorization function L(\mathbf{B}) that upper-bounds the Kullback–Leibler divergence-based objective G(\mathbf{B}) (Equations (7)–(8)).

  4. Iterates using the gradient-based solution to minimize E_5(\mathbf{B}) (Equation (10)), with updates from Equation (11) (normalized gradient descent with a line-searched step size).

It does not merely provide a starter approach; instead, it faithfully implements the steps described in the paper, including references to Equations (7)–(11).

Usage

moda_full(
  X,
  y,
  k,
  numClusters = 1,
  pcaFirst = TRUE,
  pcaVar = 0.95,
  maxIter = 50,
  tol = 1e-05,
  clusterMethod = "kmeans",
  B_init = "random",
  verbose = FALSE,
  lineSearchIter = 20,
  B_init_sd = 0.01
)

Arguments

X

A numeric matrix of size d \times n, where each column is a data sample.

y

A vector (length n) of integer or factor class labels (must have \geq 2 distinct labels).

k

Integer. Dimensionality of the target subspace (number of features to extract).

numClusters

Integer, or a vector/list giving the number of clusters per class. With a single cluster per class, MODA reduces to Oriented Discriminant Analysis (ODA).

pcaFirst

Logical. If TRUE, run PCA first to reduce the dimension when d \gg n. Defaults to TRUE.

pcaVar

Fraction of variance to keep if pcaFirst=TRUE. Defaults to 0.95.

maxIter

Maximum number of majorization iterations. Defaults to 50.

tol

Convergence tolerance on relative change in the objective G(\mathbf{B}). Defaults to 1e-5.

clusterMethod

Either "kmeans" or a custom clustering function with signature (dataMatrix, kC) returning a vector of cluster IDs.

B_init

Either "random" or "pca" to initialize the projection matrix \mathbf{B}.

verbose

If TRUE, prints iteration progress.

lineSearchIter

Number of line search iterations for step size selection (default = 20).

B_init_sd

Standard deviation for the random initialization of \mathbf{B} if B_init="random". Defaults to 1e-2.

Details

Key Steps:

  1. Clustering (Section 4): For each class, optionally split the samples into multiple clusters to model multimodality.

  2. Approximate Covariances (Section 6): For each cluster, approximate \Sigma_i by \mathbf{U}_i \boldsymbol{\Lambda}_i \mathbf{U}_i^T + \sigma_i^2 \mathbf{I} (see the sketch after this list).

  3. Majorization (Sections 5.1–5.2): Build L(\mathbf{B}) from G(\mathbf{B}) using the inequality in Equation (7), summing over clusters to obtain Equation (8).

  4. Iterative Minimization: Minimize L(\mathbf{B}) \geq G(\mathbf{B}) iteratively. The partial derivatives (Equation (9)) yield a system of linear equations, solved here by gradient-based updates (Equations (10)–(11)).
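
A minimal sketch of the Section 6 factorization, using one standard construction (the helper name and the fixed rank r are illustrative, not the package's internals): eigen-decompose the cluster's sample covariance, keep the leading r eigenpairs, and set \sigma_i^2 to the mean of the discarded eigenvalues.

approx_cluster_cov <- function(Xc, r) {
  # Xc: d x n_i matrix whose columns are the samples in one cluster
  S  <- cov(t(Xc))                        # d x d sample covariance
  eg <- eigen(S, symmetric = TRUE)
  U  <- eg$vectors[, 1:r, drop = FALSE]   # U_i: leading eigenvectors
  L  <- eg$values[1:r]                    # Lambda_i: leading eigenvalues
  s2 <- mean(eg$values[-(1:r)])           # sigma_i^2 from the residual spectrum
  list(U = U, Lambda = L, sigma2 = s2)    # Sigma_i ~ U diag(L) U^T + s2 * I
}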

High-Dimensional Data: When d \gg n, it is recommended to set pcaFirst=TRUE so that the dimension is reduced to at most n, avoiding rank deficiency and improving generalization.
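
A minimal sketch of what the pcaFirst step amounts to (the helper and its return fields are illustrative; the package stores the corresponding rotation and mean in pcaInfo): keep the smallest number of principal components whose cumulative variance reaches pcaVar.

pca_reduce <- function(X, pcaVar = 0.95) {
  mu <- rowMeans(X)
  Xc <- X - mu                            # center each variable (row)
  pc <- prcomp(t(Xc), center = FALSE)     # samples are the columns of X
  cumVar <- cumsum(pc$sdev^2) / sum(pc$sdev^2)
  r  <- which(cumVar >= pcaVar)[1]        # smallest r reaching pcaVar
  U  <- pc$rotation[, 1:r, drop = FALSE]
  list(scores = t(U) %*% Xc, rotation = U, mean = mu)
}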

Classification after MODA: Once \mathbf{B} is learned, map a new sample \mathbf{x} to \mathbf{B}^T \mathbf{x} (applying the PCA projection first, if one was used) and classify in that lower-dimensional space.
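
A sketch of that classification step with a nearest-centroid rule (moda_predict is a hypothetical helper, not part of the package; it assumes pcaInfo is NULL when PCA was skipped and otherwise holds the rotation U and mean, as described under Value):

moda_predict <- function(res, X_train, y_train, X_new) {
  project <- function(X) {
    if (!is.null(res$pcaInfo))            # apply the PCA map first, if used
      X <- t(res$pcaInfo$U) %*% (X - res$pcaInfo$mean)
    t(res$B) %*% X                        # then project with the learned B
  }
  Z_train <- project(X_train)
  Z_new   <- project(X_new)
  centroids <- sapply(split(seq_along(y_train), y_train),
                      function(idx) rowMeans(Z_train[, idx, drop = FALSE]))
  labels <- colnames(centroids)
  # label each new sample by its closest class centroid
  labels[apply(Z_new, 2, function(z) which.min(colSums((centroids - z)^2)))]
}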

For further details, see:

  1. De la Torre & Kanade (2005). "Multimodal Oriented Discriminant Analysis."

  2. Equations (7)–(11) for the majorization steps.

  3. Section 6 for the covariance factorization in high dimensions.

Value

A list with elements:

  • B: A d' \times k matrix (or d \times k if no PCA) with the learned projection.

  • objVals: The values of the objective G(\mathbf{B}) at each iteration.

  • clusters: The cluster assignments (per class).

  • pcaInfo: If PCA was applied, contains the PCA rotation U and mean.

References to the Paper

  • Equation (7): Inequality used to construct the majorization function.

  • Equation (8): Definition of L(\mathbf{B}) that majorizes G(\mathbf{B}).

  • Equation (9): Necessary condition for the minimum of L(\mathbf{B}).

  • Equation (10): Definition of E_5(\mathbf{B}) to be minimized via gradient methods.

  • Equation (11): Normalized gradient-descent update with step size \eta chosen to minimize E_5.
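
A schematic of the Equation (11) update (E5 and gradE5 stand in for the Equation (10) objective and its Equation (9) gradient; both are hypothetical placeholders, and the halving loop is a simple backtracking stand-in for the step-size search):

update_B <- function(B, E5, gradE5, lineSearchIter = 20) {
  G <- gradE5(B)
  G <- G / sqrt(sum(G^2))                 # normalize the gradient direction
  eta <- 1
  f0  <- E5(B)
  for (i in seq_len(lineSearchIter)) {
    if (E5(B - eta * G) < f0) break       # accept the first improving step
    eta <- eta / 2                        # otherwise shrink the step size
  }
  B - eta * G
}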

Examples

# Synthetic example (small scale):
set.seed(123)
d <- 20; n <- 40
X <- matrix(rnorm(d*n), nrow = d, ncol = n)
y <- rep(1:2, each = n/2)
res <- moda_full(X, y, k = 2, numClusters = 1, pcaFirst = FALSE, maxIter = 15, verbose = TRUE)

# Inspect the learned projection B
str(res)
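
# A short follow-on sketch (res$B is d x k here since pcaFirst = FALSE;
# see Value): project the training samples and plot the 2-D embedding.
Z <- t(res$B) %*% X                  # k x n matrix of projected samples
plot(t(Z), col = y, pch = 19,
     xlab = "MODA dim 1", ylab = "MODA dim 2")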

