README.md

easyMTS

The Mahalanobis-Taguchi System (MTS) helps you create a diagnostic system to detect abnormality. In MTS, we characterize a group of multivariate reference observations to establish the bounds for what “good” means, and use that characterization to diagnose new observations. In addition to diagnostics, MTS can also be used for classification and prediction, but it’s not a classifier per se. MTS doesn’t attempt to split observations into multiple groups; it just tells you whether or not a new observation matches the data in your training set.

Mahalanobis Distance

Euclidean distance measures the straight-line distance between two points. In contrast, Mahalanobis distance is measured between a point and a distribution of values. It is a multivariate distance measure that describes how many standard deviations the point is away from the center of the “cloud” that forms the distribution. Euclidean distance treats every variable as independent, so if the variables are related to each other (for example, drunkenness increases as the number of drinks increases) you may count the same effect multiple times. Mahalanobis distance accounts for these relationships, which makes it a better distance measure when several correlated variables are involved.
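To make the contrast concrete, here is a minimal sketch in base R. The center `mu`, the matrix `S`, and the points `a` and `b` are made-up illustration values, not data from this package. Both points are the same Euclidean distance from the center, but `b` goes against the correlation between the two variables, so its Mahalanobis distance is much larger.

```r
# Hypothetical two-variable distribution with strongly correlated variables
mu <- c(0, 0)
S  <- matrix(c(1.0, 0.9,
               0.9, 1.0), nrow = 2)   # assumed covariance matrix

a <- c(1,  1)   # lies along the direction of the correlation
b <- c(1, -1)   # lies against the direction of the correlation

# Euclidean distance ignores the relationship between the variables
euclid_a <- sqrt(sum((a - mu)^2))
euclid_b <- sqrt(sum((b - mu)^2))

# stats::mahalanobis() returns the SQUARED distance, so take the square root
md_a <- sqrt(mahalanobis(a, center = mu, cov = S))
md_b <- sqrt(mahalanobis(b, center = mu, cov = S))

print(c(euclid_a = euclid_a, euclid_b = euclid_b, md_a = md_a, md_b = md_b))
```

The two Euclidean distances are identical, while `md_b` is several times larger than `md_a`: the point that breaks the usual pattern between the variables is flagged as far more unusual.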

If the point can be described by its coordinates in n dimensions:

\vec{x} = (x_{1}, x_{2}, x_{3}, \ldots, x_{n})

And the distribution has one mean for each independent variable:

\vec{\mu} = (\mu_{1}, \mu_{2}, \mu_{3}, \ldots, \mu_{n})

The Mahalanobis Distance (MD) is calculated like this:

D(\vec{x}) = \sqrt{ (\vec{x}-\vec{\mu})^T\:S^{-1}\:(\vec{x}-\vec{\mu}) }

The first term under the square root is the transposed vector of differences between the x’s and the column (independent variable) means. S^{-1} is the inverse of the covariance matrix; if the variables are standardized first, this is the same as using the inverse of the correlation matrix. Fortunately, all of these things are easy to calculate in R, and there’s also a function to generate MDs from a multivariate data frame.
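As a rough illustration of the formula, the sketch below computes the MD of one iris observation by hand and compares it with base R's `stats::mahalanobis()`, which returns the squared distance. It also checks that standardizing the data with `scale()` and using the correlation matrix gives the same answer. The variable names are chosen for this example only, and this is not necessarily how easyMTS computes distances internally.

```r
# Reference group: the 50 setosa observations, 4 numeric columns
good <- as.matrix(iris[1:50, 1:4])

mu <- colMeans(good)   # one mean per variable
S  <- cov(good)        # sample covariance matrix
x  <- good[1, ]        # the observation to measure

# Direct translation of the formula: sqrt((x - mu)^T S^{-1} (x - mu))
d_manual <- as.numeric(sqrt(t(x - mu) %*% solve(S) %*% (x - mu)))

# Built-in: mahalanobis() gives the squared distance
d_builtin <- as.numeric(sqrt(mahalanobis(x, center = mu, cov = S)))

# Same distance from standardized data and the correlation matrix
z <- scale(good)       # center each column and divide by its standard deviation
d_standardized <- as.numeric(
  sqrt(mahalanobis(z[1, ], center = rep(0, ncol(z)), cov = cor(good)))
)

print(c(manual = d_manual, builtin = d_builtin, standardized = d_standardized))
```

All three values agree up to floating-point error, so the choice between raw data with the covariance matrix and standardized data with the correlation matrix is a matter of convenience.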

This package uses the scaled Mahalanobis Distance recommended by Yang & Trewn (2004), which divides the squared distance by p, the number of variables (n in the notation above):

D(\vec{x})^2 = \frac{1}{p} (\vec{x}-\vec{\mu})^T\:S^{-1}\:(\vec{x}-\vec{\mu})
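The scaled distance can be sketched in base R as follows. The helper `scaled_md()` is a hypothetical name used only for this illustration and is not part of easyMTS. It assumes the mean vector and covariance matrix are estimated from the reference ("good") group and then applied to every observation.

```r
good <- iris[1:50, 1:4]     # reference group (setosa)
bad  <- iris[51:150, 1:4]   # observations to diagnose (versicolor, virginica)

mu <- colMeans(good)        # means of the reference group
S  <- cov(good)             # covariance of the reference group
p  <- ncol(good)            # number of variables

# Hypothetical helper: squared MD divided by the number of variables
scaled_md <- function(obs) {
  mahalanobis(as.matrix(obs), center = mu, cov = S) / p
}

md_good <- scaled_md(good)
md_bad  <- scaled_md(bad)

# With the sample covariance, the reference group averages (n - 1) / n
mean(md_good)    # 0.98 for n = 50

summary(md_bad)  # far larger than anything in the reference group
```

Because the scaled distances of the reference group average close to 1, values well above 1 suggest that an observation does not resemble the reference group.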

Steps in MTS Development and Validation

The steps to apply MTS are:

Functions in this Package

Example

Here is a quick example using the iris data. This only prepares and plots distances from a collection of good observations and a collection of bad observations. Each collection must have the same number of columns (predictors) but they can have a different number of rows (observations):

library(easyMTS)
library(MASS)
library(dplyr)
library(magrittr)
library(ggplot2)

good <- iris[1:50,1:4]    # Setosa are "healthy" group
bad  <- iris[51:150,1:4]  # Virginica and versicolor are "unhealthy"

mds <- computeMDs(good, bad)
plotMDs(mds)



NicoleRadziwill/easyMTS documentation built on Oct. 30, 2019, 10:14 p.m.