```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
The Mahalanobis-Taguchi System (MTS) helps you build a diagnostic system to detect abnormality. In MTS, we characterize a group of multivariate reference observations to establish the bounds of what "good" means, then use that characterization to diagnose new observations. MTS can also support classification and prediction, but it is not a classifier per se: it doesn't attempt to split observations into multiple groups, it just tells you whether a new observation matches the data in your training set (or not).
Euclidean distance measures the straight-line distance between two points. Mahalanobis distance, in contrast, is measured between a point and a distribution of values: it is a multivariate distance measure that describes how many standard deviations the point lies from the center of the "cloud" that forms the distribution. If the variables are related to each other (for example, the way drunkenness increases with the number of drinks consumed), a naive distance measure can count the same effect multiple times. Mahalanobis distance accounts for those correlations, making it a better distance measure when multiple variables contribute to the distance.
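To make the correlation point concrete, here is a small sketch in base R (the variable names and the simulated numbers are purely illustrative). Two points at the same Euclidean offset from the center get very different Mahalanobis distances depending on whether they lie along or against the correlation trend. Note that base R's `stats::mahalanobis()` returns the *squared* distance, so we take the square root:

```r
set.seed(1)
drinks <- rnorm(200, mean = 5, sd = 2)
impair <- 0.8 * drinks + rnorm(200, sd = 0.5)  # strongly correlated response
X  <- cbind(drinks, impair)
mu <- colMeans(X)
S  <- cov(X)

a <- mu + c(2,  1.6)   # offset along the correlation trend
b <- mu + c(2, -1.6)   # same Euclidean offset, against the trend

d <- sqrt(mahalanobis(rbind(a, b), mu, S))
d                      # the against-trend point is far more "unusual"
```

Both points are equally far from the center in Euclidean terms, but only the second one is surprising given how the two variables move together.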
If the point can be described by its coordinates in n dimensions:
$$\vec{x} = (x_{1}, x_{2}, x_{3}, ... x_{n})$$
And the distribution has one mean for each independent variable:
$$\vec{\mu} = (\mu_{1}, \mu_{2}, \mu_{3}, ... \mu_{n})$$
The Mahalanobis Distance (MD) is calculated like this:
$$D(\vec{x}) = \sqrt{ (\vec{x}-\vec{\mu})^T\:S^{-1}\: (\vec{x}-\vec{\mu}) }$$
The first term under the square root is the transposed vector of differences between the x's and the column (independent variable) means. $S^{-1}$ indicates the inverse of the correlation matrix (MTS works with standardized variables, whose covariance matrix is the correlation matrix). Fortunately, all of these pieces are easy to calculate in R, which also provides a built-in function, `stats::mahalanobis()`, that generates squared MDs from a multivariate data frame.
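As a sketch of the calculation in base R: `mahalanobis()` returns the *squared* distance, so taking the square root recovers $D(\vec{x})$ as defined above, and the manual computation with `solve()` should agree with it.

```r
x  <- iris[1:50, 1:4]              # four measurements for setosa
mu <- colMeans(x)                  # column (independent variable) means
S  <- cov(x)                       # covariance matrix
d  <- sqrt(mahalanobis(x, mu, S))  # mahalanobis() returns squared MDs

# Verify against the formula for the first observation:
diff <- as.numeric(x[1, ]) - mu
sqrt(t(diff) %*% solve(S) %*% diff)   # matches d[1]
```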
This package uses the scaled Mahalanobis Distance recommended by Yang & Trewn (2004):
$$D(\vec{x})^2 = \frac{1}{p} { (\vec{x}-\vec{\mu})^T\:S^{-1}\: (\vec{x}-\vec{\mu}) }$$

Here $p$ is the number of predictor variables; dividing by $p$ scales the squared distance so that it averages approximately 1 over the reference group, giving a natural baseline for judging new observations.
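A sketch of the scaled distance in base R, assuming the variables are standardized first so that $S$ is the correlation matrix (this is a hand-rolled illustration, not the package's internal code):

```r
x <- iris[1:50, 1:4]
p <- ncol(x)                          # number of predictor variables
z <- scale(x)                         # standardize each column
R <- cor(x)                           # correlation matrix of the predictors

# Scaled squared Mahalanobis distance (Yang & Trewn 2004)
d2_scaled <- mahalanobis(z, center = rep(0, p), cov = R) / p
mean(d2_scaled)                       # close to 1 for the reference group
```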
The steps to apply MTS are, broadly:

1. Characterize a reference group of known "healthy" observations by computing the Mahalanobis distance of each one from the group's own center.
2. Compute the distances of known "unhealthy" (abnormal) observations from that same center, and confirm that they fall well outside the healthy range.
3. Choose a threshold that separates the two groups, and diagnose new observations by whether their distance exceeds it.
Here is a quick example using the iris data. This only prepares and plots distances from a collection of good observations and a collection of bad observations. Each collection must have the same number of columns (predictors) but they can have a different number of rows (observations):
```r
library(easyMTS)
library(MASS)
library(dplyr)
library(magrittr)
library(ggplot2)

good <- iris[1:50, 1:4]    # setosa is the "healthy" group
bad  <- iris[51:150, 1:4]  # versicolor and virginica are "unhealthy"

mds <- computeMDs(good, bad)
plotMDs(mds)
```
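For intuition, a rough base-R equivalent of the distance computation and the diagnosis step looks like this. This is a sketch, not the package's actual implementation, and the threshold value is purely illustrative; in practice you would choose it from the separation between the healthy and unhealthy distances:

```r
healthy <- iris[1:50, 1:4]
mu <- colMeans(healthy)
S  <- cov(healthy)
p  <- ncol(healthy)

# Scaled squared Mahalanobis distance from the healthy group's center
md2 <- function(newdata) mahalanobis(newdata, mu, S) / p

threshold <- 4                     # illustrative cutoff; tune per application
unknown <- iris[51:150, 1:4]
table(abnormal = md2(unknown) > threshold)
```

Because the non-setosa observations lie many standard deviations from the setosa center, all of them are flagged as abnormal at this cutoff.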