Detects outliers based on a probability model.
HDoutliers(data, maxrows=10000, radius=NULL, alpha=0.05, transform=TRUE)
data |
A vector, matrix, or data frame consisting of numeric and/or categorical variables. |
maxrows |
If the number of observations is greater than |
radius |
Threshold for determining membership in the exemplars' lists
(used only when the number of observations is greater than maxrows).
An observation is added to an exemplar's list if its distance
to that exemplar is less than |
alpha |
Threshold for determining the cutoff for outliers. Observations are considered outliers if they fall in the (1 - alpha) tail of the distribution of the nearest-neighbor distances between exemplars. |
transform |
A logical variable indicating whether or not the data needs to be
transformed to conform to Wilkinson's specifications before outlier
detection. The default is to transform the data using function
|
Wilkinson replaces categorical variables with the leading component from
correspondence analysis, and maps the data to the unit square. This is
done as a preprocessing step if transform = TRUE
(the default).
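The scaling part of this preprocessing can be illustrated with a minimal base R sketch. It covers numeric columns only and does not attempt the correspondence-analysis step for categorical variables. The helper name unit_scale is hypothetical and is not part of the HDoutliers package; the package's own transformation may differ in detail.

```r
# Hypothetical helper (not part of HDoutliers): min-max scale each numeric
# column to [0, 1]. Constant columns are mapped to 0 to avoid dividing by zero.
unit_scale <- function(x) {
  x <- as.matrix(x)
  apply(x, 2, function(col) {
    rng <- range(col, na.rm = TRUE)
    width <- rng[2] - rng[1]
    if (width == 0) rep(0, length(col)) else (col - rng[1]) / width
  })
}

# Two columns on very different scales
m <- cbind(a = c(4, 10, 7, 13, 1), b = c(100, 150, 200, 125, 175))
scaled <- unit_scale(m)
apply(scaled, 2, range)  # each column now spans 0 to 1
```

Scaling every variable to the same range keeps one large-valued column from dominating the distance computations that follow.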
If the number of observations exceeds maxrows,
the data is first partitioned into lists associated with exemplars
and their members within radius of each exemplar,
to reduce the number of nearest-neighbor computations required for
outlier detection.
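One way to picture this partitioning is a single-pass "leader" style scan, sketched below in base R. The function leader_partition is hypothetical and is not the package's implementation (see getHDmembers for that). In particular, the sketch assigns each row to the first exemplar found within radius, which is an assumption made for brevity.

```r
# Hypothetical sketch of exemplar partitioning (not the HDoutliers code).
# Each row joins the first exemplar closer than `radius`; if none is close
# enough, the row becomes a new exemplar.
leader_partition <- function(x, radius) {
  x <- as.matrix(x)
  exemplars <- integer(0)  # row indexes of the exemplars
  members <- list()        # members[[k]] holds the rows assigned to exemplar k
  for (i in seq_len(nrow(x))) {
    assigned <- FALSE
    for (k in seq_along(exemplars)) {
      d <- sqrt(sum((x[i, ] - x[exemplars[k], ])^2))  # Euclidean distance
      if (d < radius) {
        members[[k]] <- c(members[[k]], i)
        assigned <- TRUE
        break
      }
    }
    if (!assigned) {
      exemplars <- c(exemplars, i)
      members[[length(exemplars)]] <- i
    }
  }
  list(exemplars = exemplars, members = members)
}

pts <- rbind(c(0, 0), c(0.01, 0), c(1, 1), c(1, 0.99), c(0.5, 0.5))
part <- leader_partition(pts, radius = 0.1)
part$exemplars  # rows 1, 3 and 5 become exemplars
```

Nearest-neighbor distances then only need to be computed among the exemplars rather than among all rows.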
An exponential distribution is then fitted to the upper tail of the
nearest-neighbor distances between exemplars.
Observations are considered
outliers if they fall in the (1 - alpha) tail of the fitted CDF.
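The idea of fitting an exponential to the upper tail and cutting at the (1 - alpha) quantile can be sketched as follows. This is a simplified illustration under stated assumptions, not the fitting procedure used by HDoutliers or described by Wilkinson: the function flag_by_exp_tail, the tail_frac parameter, and the choice to fit by the mean excess over an empirical quantile are all hypothetical.

```r
# Hypothetical sketch (not the HDoutliers fitting procedure).
# nn:        vector of nearest-neighbor distances
# tail_frac: assumed fraction of the largest distances treated as the upper tail
flag_by_exp_tail <- function(nn, alpha = 0.05, tail_frac = 0.5) {
  threshold <- quantile(nn, probs = 1 - tail_frac, names = FALSE)
  excess <- nn[nn > threshold] - threshold   # distances above the tail threshold
  rate <- 1 / mean(excess)                   # maximum-likelihood exponential rate
  cutoff <- threshold + qexp(1 - alpha, rate = rate)
  which(nn > cutoff)                         # indexes beyond the fitted cutoff
}

# Synthetic distances: 100 exponential-like values plus one very large value
nn <- c(qexp(ppoints(100)), 50)
flagged <- flag_by_exp_tail(nn, alpha = 0.05)
flagged  # index of the single extreme distance
```

A smaller alpha pushes the cutoff further into the tail, so fewer observations are flagged.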
The indexes of the observations determined to be outliers.
Wilkinson, L. (2016). Visualizing Outliers.
getHDmembers, getHDoutliers, dataTrans
data(dots)
out.W <- HDoutliers(dots$W)
## Not run:
plotHDoutliers(dots$W,out.W)
## End(Not run)
data(ex2D)
out.ex2D <- HDoutliers(ex2D)
## Not run:
plotHDoutliers(ex2D,out.ex2D)
## End(Not run)
## Not run:
n <- 100000 # number of observations
set.seed(3)
x <- matrix(rnorm(2*n),n,2)
nout <- 10 # number of outliers
x[sample(1:n,size=nout),] <- 10*runif(2*nout,min=-1,max=1)
out.x <- HDoutliers(x)
## End(Not run)