disp: Outlier detection using generalised dispersion


Description

Takes a dataset and finds its outliers using a dispersion-based method.

Usage

disp(x, cutoff = 0.95, rnames = FALSE, boottimes = 100)

Arguments

x

Dataset for which outliers are to be found

cutoff

Percentile threshold used for distance, default value is 0.95

rnames

Logical value indicating whether the dataset has row names, default value is FALSE

boottimes

Number of bootstrap samples used to find the cutoff, default is 100 samples

Details

disp computes the leave-one-out (LOO) dispersion matrix for each observation, that is, the dispersion matrix computed without the current observation. The score of an observation is the difference between the determinant of its LOO dispersion matrix and the determinant of the dispersion matrix of the full data. An observation is labelled as an outlier when its score exceeds a bootstrapped cutoff. The outlierliness of each labelled outlier is also reported; it is the bootstrap estimate of the probability that the observation is an outlier. For bivariate data, disp also shows a scatterplot of the data with the outliers labelled.
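The procedure described above can be sketched in base R. This is an illustrative sketch under stated assumptions, not the package's implementation: the helper names `loo_det_scores` and `bootstrap_cutoffs` are hypothetical, the score is taken as the absolute difference of determinants, and the bootstrap is assumed to resample rows with replacement and take the `cutoff` quantile of the scores in each resample.

```r
# Illustrative sketch only; NOT the OutlierDetection implementation.
# Helper names and the exact bootstrap scheme are assumptions.

# Score for row i: |det(cov without row i) - det(cov of all rows)|
loo_det_scores <- function(x) {
  x <- as.matrix(x)
  full_det <- det(cov(x))
  vapply(seq_len(nrow(x)), function(i) {
    abs(det(cov(x[-i, , drop = FALSE])) - full_det)
  }, numeric(1))
}

# Assumption: each bootstrap sample resamples rows with replacement and
# contributes the `cutoff` quantile of its own scores.
bootstrap_cutoffs <- function(x, cutoff = 0.95, boottimes = 100) {
  x <- as.matrix(x)
  n <- nrow(x)
  replicate(boottimes, {
    xb <- x[sample.int(n, n, replace = TRUE), , drop = FALSE]
    quantile(loo_det_scores(xb), probs = cutoff, names = FALSE)
  })
}

set.seed(1)
X <- as.matrix(iris[, 1:4])
scores <- loo_det_scores(X)
cuts <- bootstrap_cutoffs(X, cutoff = 0.95, boottimes = 20)

# Flag rows whose score exceeds the average bootstrap cutoff (assumption).
flagged <- which(scores > mean(cuts))

# Proportion of bootstrap cutoffs exceeded by each flagged row's score.
outlier_prob <- vapply(flagged, function(i) mean(scores[i] > cuts), numeric(1))
```

Here a small `boottimes` keeps the sketch fast; the flagged rows depend on the random seed and on the assumptions above, so they need not match what disp reports.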

Value

Outlier Observations: A matrix of outlier observations

Location of Outlier: Vector of serial numbers (row positions) of the outliers

Outlier Probability: Vector giving, for each outlier, the proportion of times it exceeds the local bootstrap cutoff
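Because the component names contain spaces, they have to be accessed with `[[ ]]` or backticks. The following is a minimal sketch, assuming the OutlierDetection package is installed and that the returned list uses the names shown in the example output; the variable names `wanted`, `res`, `rows` and `probs` are illustrative only.

```r
# Component names as they appear in the printed example output.
wanted <- c("Outlier Observations", "Location of Outlier", "Outlier Probability")

# Assumes OutlierDetection is installed; the block is skipped otherwise.
if (requireNamespace("OutlierDetection", quietly = TRUE)) {
  res <- OutlierDetection::disp(iris[, 1:4], cutoff = 0.95)
  # Names with spaces need [[ ]] (or backticks with $).
  rows  <- res[["Location of Outlier"]]
  probs <- res[["Outlier Probability"]]
  print(rows)
  print(probs)
}
```

The cutoff is bootstrapped, so the flagged rows can differ between runs unless a seed is set beforehand.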

Author(s)

Vinay Tiwari, Akanksha Kashikar

References

Jin, W., Tung, A., and Han, J. 2001. Mining top-n local outliers in large databases. In Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD), San Francisco, CA.

Examples

library(OutlierDetection)
# Create dataset: the four numeric columns of iris
X <- iris[, 1:4]
# Outlier detection with a 99th-percentile cutoff
disp(X, cutoff = 0.99)

Example output

$`Outlier Observations`
[1] Sepal.Length Sepal.Width  Petal.Length Petal.Width 
<0 rows> (or 0-length row.names)

$`Location of Outlier`
integer(0)

$`Outlier Probability`
NULL

$`3Dplot`


OutlierDetection documentation built on June 16, 2019, 1:03 a.m.