Distance measure using Mahalanobis distance for outlier detection

Description

Implements the Mahalanobis distance measure for outlier detection. In addition to the distances themselves, boxplots with the potential outlier(s) marked can be drawn to support the early stages of data cleansing.

Usage

dm.mahalanobis(data, from="median", p=10, plot=FALSE, v.index=NULL, layout=NULL)

Arguments

data

Data frame

from

Reference point from which the distance is measured
"mean" Mean of each column
"median" Median of each column (default)

p

Percentage used to flag potential outlier(s) (default: 10)

plot

Switch for drawing boxplot(s) (default: FALSE)

v.index

Numeric vector indicating the column(s) to be shown in the boxplots. The default, NULL, shows all columns.

layout

Numeric vector indicating the dimensions of the boxplot layout. The default, NULL, chooses a layout automatically.
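
To make the role of from concrete, here is a minimal base R sketch of measuring Mahalanobis distance from either the column means or the column medians. It is not the package's own code: center_of and md are hypothetical names, and it assumes the sample covariance of the data is used. Note that stats::mahalanobis() returns squared distances; whether dm.mahalanobis reports squared or unsquared values is not stated here.

```r
# Hypothetical helper: pick the reference point that "from" describes.
center_of <- function(data, from = c("median", "mean")) {
  from <- match.arg(from)
  if (from == "median") sapply(data, median) else colMeans(data)
}

set.seed(1)  # only so the sketch is reproducible
df <- data.frame(replicate(6, sample(0:100, 50)))

ctr <- center_of(df, "median")

# stats::mahalanobis() returns squared distances, so take the square root.
# Assumption: the sample covariance matrix of the data is used.
md <- sqrt(mahalanobis(as.matrix(df), center = ctr, cov = cov(df)))
```

Using the median as the reference point makes the center itself less sensitive to the extreme rows the measure is meant to find.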

Value

$dist

Mahalanobis distance of each row from the point specified by from

$excluded

Excluded row(s), given as row numbers

$order

Rows in decreasing order of distance, given as row numbers

$suspect

Potential outlier(s), given as row numbers
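
As a rough illustration of how an ordering and a suspect list can be derived from a distance vector, the base R sketch below ranks rows by distance and flags the top p percent. This reading of p (the share of rows with the largest distances) is an assumption, not something the package documentation confirms, and d, ord, n_flag, and suspect are hypothetical names.

```r
set.seed(1)  # only so the sketch is reproducible
df <- data.frame(replicate(6, sample(0:100, 50)))

# Distance of each row from the column medians (square root of the
# squared distance returned by stats::mahalanobis()).
d <- sqrt(mahalanobis(as.matrix(df), sapply(df, median), cov(df)))

p <- 10
ord <- order(d, decreasing = TRUE)     # row numbers, largest distance first
n_flag <- ceiling(nrow(df) * p / 100)  # assumed meaning of p: top p% of rows
suspect <- ord[seq_len(n_flag)]        # row numbers of the flagged rows
```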

Author(s)

Dong-Joon Lim, PhD

References

Hair, Joseph F., et al. Multivariate data analysis. Vol. 7. Upper Saddle River, NJ: Pearson Prentice Hall, 2006.

Examples

# Generate a sample dataframe
df <- data.frame(replicate(6, sample(0 : 100, 50)))

# Compute the distances and draw boxplots with potential outliers
dm.mahalanobis(df, plot = TRUE)