What is a distance metric learning algorithm?

A distance metric learning (DML) algorithm learns a similarity measure or distance from the data. The learned distance can be used for many purposes, such as improving distance-based algorithms, whether in supervised, semi-supervised or unsupervised learning. DML algorithms also have interesting applications in dimensionality reduction.

How to learn a distance

The (pseudo-)distances learned by distance metric learning algorithms are also known as Mahalanobis distances. These distances are determined by positive semidefinite (PSD) matrices $M \in \mathcal{M}_d(\mathbb{R})$, and can be calculated as
\[ d(x,y) = \sqrt{(x-y)^TM(x-y)}, \]
for $x, y \in \mathbb{R}^d$. Every PSD matrix $M$ can be decomposed as $M = L^TL$ for some matrix $L \in \mathcal{M}_d(\mathbb{R})$ (conversely, $L^TL$ is PSD for any such $L$). In this case, we have
\[ d(x,y)^2 = (x-y)^TL^TL(x-y) = (L(x-y))^T(L(x-y)) = \|L(x-y)\|_2^2. \]
So every Mahalanobis distance is equivalent to the Euclidean distance after applying the linear mapping $L$.
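The identity above can be checked numerically. The following is a minimal sketch in plain Python, not code from rDML or pyDML: the matrix `L`, the points `x` and `y`, and all helper names are made-up illustration values. It builds $M = L^TL$ and computes the distance both from $M$ and as a Euclidean distance after mapping with $L$.

```python
import math

def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def mahalanobis(x, y, M):
    """sqrt((x - y)^T M (x - y)) for a PSD matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    return math.sqrt(sum(d * m for d, m in zip(diff, matvec(M, diff))))

def euclidean(u, v):
    """Standard Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Illustrative values only (assumptions, not taken from rDML).
L = [[2.0, 0.0],
     [1.0, 1.0]]
M = matmul(transpose(L), L)  # M = L^T L is PSD by construction

x = [1.0, 2.0]
y = [3.0, 0.0]

d_metric = mahalanobis(x, y, M)                   # distance defined by M
d_mapped = euclidean(matvec(L, x), matvec(L, y))  # Euclidean distance after mapping by L

print(d_metric, d_mapped)  # prints 4.0 4.0
```

For these values $M = \begin{pmatrix} 5 & 1 \\ 1 & 1 \end{pmatrix}$, and both computations give the same distance, as the derivation predicts.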

The matrices $M$ and $L$ correspond to the two approaches to learning a distance. We can either learn the metric matrix $M$, which defines the distance directly, or learn the linear map $L$ and compute the Euclidean distance in the mapped space. Each DML algorithm learns the distance following one of these approaches.
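To illustrate how the two representations relate, the sketch below starts from a metric matrix $M$ and recovers one possible linear map $L$ with $L^TL = M$. It is written in plain Python with hypothetical helper names and made-up values, and it is not how rDML or pyDML necessarily proceed. It also assumes $M$ is strictly positive definite so that a Cholesky factorization exists; a matrix that is only PSD would need another factorization, such as an eigendecomposition. Note that $L$ is not unique: for any orthogonal $Q$, the map $QL$ yields the same $M$.

```python
import math

def cholesky_lower(M):
    """Return lower-triangular C with C C^T = M (M must be positive definite)."""
    n = len(M)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                C[i][j] = math.sqrt(M[i][i] - s)
            else:
                C[i][j] = (M[i][j] - s) / C[j][j]
    return C

def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Illustrative metric matrix (an assumption; symmetric positive definite).
M = [[5.0, 1.0],
     [1.0, 1.0]]

# With M = C C^T, choosing L = C^T gives L^T L = M.
L = transpose(cholesky_lower(M))

x = [1.0, 2.0]
y = [3.0, 0.0]
diff = [a - b for a, b in zip(x, y)]

# Approach 1: use the metric matrix M directly.
d_from_M = math.sqrt(sum(d * m for d, m in zip(diff, matvec(M, diff))))

# Approach 2: apply the linear map L, then take the Euclidean norm.
mapped = matvec(L, diff)
d_from_L = math.sqrt(sum(c * c for c in mapped))

print(abs(d_from_M - d_from_L) < 1e-9)  # prints True
```

Either representation gives the same distance, so the choice between learning $M$ and learning $L$ is about how the optimization is posed rather than about the resulting metric.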

Current algorithms

The currently available algorithms are:

Additional functionalities

Examples

Get started with the following examples

See also

The pyDML software, which is the DML software used by rDML, and its documentation.

jlsuarezdiaz/rDML documentation built on May 24, 2019, 12:35 a.m.