depthout: Outlier detection using a depth-based method


Description

Takes a dataset and finds its outliers using a depth-based method.

Usage

depthout(x, rnames = FALSE, cutoff = 0.05, boottimes = 100)

Arguments

x

dataset for which outliers are to be found

rnames

Logical value indicating whether the dataset has row names; default is FALSE

cutoff

Percentile threshold used for depth, default value is 0.05

boottimes

Number of bootstrap samples used to find the cutoff; default is 100

Details

depthout computes the depth of each observation using the depthTools package and, based on the bootstrapped cutoff, labels an observation as an outlier. The outlierliness of each observation labelled 'Outlier' is also reported; it is the bootstrap estimate of the probability that the observation is an outlier. For bivariate data, it also shows a scatterplot of the data with the outliers labelled.
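
The procedure described above can be sketched in base R. This is an illustrative sketch, not the package's implementation: it substitutes Mahalanobis depth for the depthTools depth so that it runs without extra packages, and the names mahalanobis_depth and depth_outliers_sketch, as well as the choice to average the bootstrap cutoffs into one threshold, are assumptions made for the example.

```r
# Illustrative sketch only; NOT the code used by depthout.
# Assumption: Mahalanobis depth stands in for the depthTools depth.
mahalanobis_depth <- function(x) {
  d2 <- mahalanobis(x, colMeans(x), cov(x))
  1 / (1 + d2)  # larger value = more central observation
}

# Hypothetical helper mirroring the cutoff and boottimes arguments.
depth_outliers_sketch <- function(x, cutoff = 0.05, boottimes = 100) {
  x <- as.matrix(x)
  n <- nrow(x)
  depth <- mahalanobis_depth(x)

  # For each bootstrap resample, record the depth at the `cutoff` quantile.
  boot_cut <- replicate(boottimes, {
    idx <- sample.int(n, n, replace = TRUE)
    quantile(mahalanobis_depth(x[idx, , drop = FALSE]),
             probs = cutoff, names = FALSE)
  })

  # Assumption: the final threshold is the mean of the bootstrap cutoffs.
  threshold <- mean(boot_cut)
  loc <- which(depth < threshold)

  # Share of bootstrap cutoffs that each flagged point falls below.
  prob <- vapply(loc, function(i) mean(depth[i] < boot_cut), numeric(1))

  list(observations = x[loc, , drop = FALSE],
       location = unname(loc),
       probability = prob)
}

set.seed(1)
res <- depth_outliers_sketch(iris[, 1:4], cutoff = 0.05, boottimes = 100)
```

Because a different depth function is used, the rows flagged by this sketch need not match those reported by depthout.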

Value

Outlier Observations: A matrix of outlier observations

Location of Outlier: Vector of serial numbers (row positions) of the outliers

Outlier Probability: Vector giving, for each outlier, the proportion of bootstrap samples in which it exceeds the local bootstrap cutoff
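
Because the element names contain spaces, they are easiest to access with [[ ]] or backticks. The sketch below does not call depthout; it builds a mock list with the element names and values shown in the example output on this page, so the variable names res and flagged are purely illustrative.

```r
# Mock of the returned list, copied from the example output on this page.
# In practice this would come from: res <- depthout(iris[, 1:4])
res <- list(
  `Outlier Observations` = iris[c(110, 118, 119, 132), 1:4],
  `Location of Outlier`  = c(110, 118, 119, 132),
  `Outlier Probability`  = c(1, 1, 1, 1)
)

# Combine location and probability into one summary table.
flagged <- data.frame(
  row         = res[["Location of Outlier"]],
  probability = res[["Outlier Probability"]]
)

# Backticks work as well as [[ ]] for names with spaces.
outlier_rows <- res$`Outlier Observations`
```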

Author(s)

Vinay Tiwari, Akanksha Kashikar

References

Johnson, T., Kwok, I., and Ng, R.T. 1998. Fast computation of 2-dimensional depth contours. In Proc. Int. Conf. on Knowledge Discovery and Data Mining (KDD), New York, NY. Kno

Examples

# Load the package that provides depthout
library(OutlierDetection)
# Create dataset
X <- iris[, 1:4]
# Outlier detection
depthout(X, cutoff = 0.05)

Example output

Warning messages:
1: In rgl.init(initValue, onlyNULL) : RGL: unable to open X11 display
2: 'rgl.init' failed, running with 'rgl.useNULL = TRUE'. 
$`Outlier Observations`
    Sepal.Length Sepal.Width Petal.Length Petal.Width
110          7.2         3.6          6.1         2.5
118          7.7         3.8          6.7         2.2
119          7.7         2.6          6.9         2.3
132          7.9         3.8          6.4         2.0

$`Location of Outlier`
[1] 110 118 119 132

$`Outlier Probability`
[1] 1 1 1 1

$`3Dplot`

Warning messages:
1: `arrange_()` is deprecated as of dplyr 0.7.0.
Please use `arrange()` instead.
See vignette('programming') for more help
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
2: `line.width` does not currently support multiple values. 
3: `line.width` does not currently support multiple values. 

OutlierDetection documentation built on June 16, 2019, 1:03 a.m.