maha: Outlier detection using Mahalanobis Distance


Description

Takes a dataset and finds its outliers using a model-based method.

Usage

maha(x, cutoff = 0.95, rnames = FALSE)

Arguments

x

dataset for which outliers are to be found

cutoff

Percentile of the chi-square distribution used as the distance threshold; default value is 0.95

rnames

Logical value indicating whether the dataset has row names; default value is FALSE
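Squared Mahalanobis distances of (approximately) multivariate normal data follow a chi-square distribution with as many degrees of freedom as there are variables, so a probability cutoff can be read as a squared-distance threshold. The sketch below assumes that maha uses a chi-square distribution with ncol(x) degrees of freedom; this page does not state the degrees of freedom, so treat the mapping as illustrative.

```r
# Illustrative only: map a probability cutoff to a squared-distance threshold,
# ASSUMING a chi-square distribution with df = number of variables.
n_vars  <- 4                          # e.g. the four columns of iris[, 1:4]
cutoffs <- c(0.90, 0.95, 0.99)
thresholds <- qchisq(cutoffs, df = n_vars)

# An observation whose squared distance exceeds the threshold has
# pchisq(d2, df) > cutoff, i.e. a p-value below 1 - cutoff.
print(data.frame(cutoff = cutoffs,
                 squared_distance_threshold = round(thresholds, 3)))
```

Raising cutoff raises the threshold, so fewer observations are flagged.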

Details

maha computes the Mahalanobis distance of each observation and, based on a chi-square cutoff, labels an observation as an outlier. The degree of outlyingness of each observation labelled 'Outlier' is also reported, based on its p-value. For bivariate data, it also shows a scatterplot of the data with the outliers labelled.
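The procedure described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's source: maha_sketch is a hypothetical name, it uses the sample mean and covariance, and it treats pchisq(d2, df = ncol(x)) as the reported (1 - p-value). Its results may therefore differ from those of maha, and it draws no plots.

```r
# Hypothetical base-R sketch of chi-square-based Mahalanobis outlier flagging.
# ASSUMPTION: mirrors the idea described for maha(), not its actual code.
maha_sketch <- function(x, cutoff = 0.95) {
  x <- as.matrix(x)
  center <- colMeans(x)                 # sample mean vector
  S <- cov(x)                           # sample covariance matrix
  d2 <- mahalanobis(x, center, S)       # squared Mahalanobis distances
  prob <- pchisq(d2, df = ncol(x))      # (1 - p-value) under chi-square(ncol(x))
  idx <- unname(which(prob > cutoff))   # row numbers of flagged observations
  list(
    "Outlier Observations" = x[idx, , drop = FALSE],
    "Location of Outlier"  = idx,
    "Outlier Probability"  = unname(prob[idx])
  )
}

res <- maha_sketch(iris[, 1:4], cutoff = 0.9)
print(res[["Location of Outlier"]])
```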

Value

Outlier Observations: A matrix of outlier observations

Location of Outlier: vector of serial numbers (row numbers) of the outliers

Outlier Probability: vector of (1 - p-value) for the outlier observations
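The component names of the returned list contain spaces, so in R they are accessed with [[ ]] or backticks. The sketch below does not call maha. It uses a hypothetical stand-in list res holding the first three values from the example output on this page, and recovers each p-value from the reported (1 - p-value).

```r
# Hypothetical stand-in for the list returned by maha(); the numbers are the
# first three entries reported in the example output on this page.
res <- list(
  "Location of Outlier" = c(15L, 16L, 33L),
  "Outlier Probability" = c(0.9319942, 0.9544462, 0.9207007)
)

# Names contain spaces, so use [[ ]] or backticks to access components.
loc  <- res[["Location of Outlier"]]
prob <- res$`Outlier Probability`
p_value <- 1 - prob   # back out the p-value of each flagged observation

print(data.frame(row = loc, outlier_probability = prob, p_value = p_value))
```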

Author(s)

Vinay Tiwari, Akanksha Kashikar

References

Barnett, V. 1978. The study of outliers: purpose and model. Applied Statistics, 27(3), 242–250.

Examples

# Load the package
library(OutlierDetection)
# Create dataset
X <- iris[, 1:4]
# Outlier detection
maha(X, cutoff = 0.9)

Example output

$`Outlier Observations`
    Sepal.Length Sepal.Width Petal.Length Petal.Width
15           5.8         4.0          1.2         0.2
16           5.7         4.4          1.5         0.4
33           5.2         4.1          1.5         0.1
42           4.5         2.3          1.3         0.3
101          6.3         3.3          6.0         2.5
107          4.9         2.5          4.5         1.7
115          5.8         2.8          5.1         2.4
118          7.7         3.8          6.7         2.2
123          7.7         2.8          6.7         2.0
132          7.9         3.8          6.4         2.0
135          6.1         2.6          5.6         1.4
136          7.7         3.0          6.1         2.3
137          6.3         3.4          5.6         2.4
142          6.9         3.1          5.1         2.3
146          6.7         3.0          5.2         2.3

$`Location of Outlier`
 [1]  15  16  33  42 101 107 115 118 123 132 135 136 137 142 146

$`Outlier Probability`
 [1] 0.9319942 0.9544462 0.9207007 0.9778100 0.9373729 0.9618307 0.9776826
 [8] 0.9877738 0.9336297 0.9892077 0.9881244 0.9533794 0.9155744 0.9856462
[15] 0.9404792

$`3Dplot`

OutlierDetection documentation built on June 16, 2019, 1:03 a.m.