getHDmembers: Partitioning Stage of the _hdoutliers_ Algorithm

Description

Implements the first stage of the HDoutliers algorithm, in which the data are partitioned into exemplars and their associated lists of members.

Usage

getHDmembers(data, maxrows = 10000, radius = NULL) 

Arguments

data

A vector, matrix, or data frame consisting of numeric and/or categorical variables.

maxrows

If the number of observations is greater than maxrows, HDoutliers reduces the number used in nearest-neighbor computations to a set of exemplars. The default value is 10000.

radius

Threshold for determining membership in the exemplars' lists (used only when the number of observations is greater than maxrows). An observation is added to an exemplar's list if its distance to that exemplar is less than radius. The default value is 0.1/(log n)^(1/p), where n is the number of observations and p is the dimension of the data.
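
As a rough illustration of how the default radius behaves, the sketch below evaluates the formula above in base R. The helper name default_radius is hypothetical and is not part of the package, and the sketch assumes that "log" means the natural logarithm.

```r
# Hypothetical helper (not a package function): evaluates the documented
# default radius, 0.1 / (log n)^(1/p), assuming the natural logarithm.
default_radius <- function(n, p) {
  0.1 / log(n)^(1 / p)
}

# Same number of observations, two different dimensions
r2 <- default_radius(n = 100000, p = 2)  # roughly 0.029
r5 <- default_radius(n = 100000, p = 5)  # larger than r2
```

For a fixed n, the default radius grows with the dimension p, so higher-dimensional data gets a wider membership threshold.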

Details

If the number of observations exceeds maxrows, the data is partitioned into lists, each consisting of an exemplar and the members that lie within radius of it. This reduces the number of nearest-neighbor computations required for outlier detection.
When the number of observations does not exceed maxrows, the result is a list whose elements are the individual observations (each observation is an exemplar, with no other members).
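
The partitioning idea can be sketched in base R as follows. This is a simplified, single-pass illustration under stated assumptions, not the package's implementation: the function name leader_partition is hypothetical, distances are plain Euclidean on the raw values, and each observation joins its nearest existing exemplar if that exemplar is closer than radius, otherwise it becomes a new exemplar.

```r
# Hypothetical helper (not the package's code): simplified exemplar partition.
leader_partition <- function(x, radius) {
  x <- as.matrix(x)
  members <- list()        # each element: exemplar index, then member indexes
  exemplars <- integer(0)  # row indexes of the exemplars found so far
  for (i in seq_len(nrow(x))) {
    assigned <- FALSE
    if (length(exemplars) > 0) {
      # Euclidean distance from row i to every current exemplar
      target <- matrix(x[i, ], length(exemplars), ncol(x), byrow = TRUE)
      d <- sqrt(rowSums((x[exemplars, , drop = FALSE] - target)^2))
      j <- which.min(d)
      if (d[j] < radius) {
        members[[j]] <- c(members[[j]], i)  # join the nearest exemplar's list
        assigned <- TRUE
      }
    }
    if (!assigned) {
      exemplars <- c(exemplars, i)          # row i starts a new list
      members[[length(members) + 1]] <- i
    }
  }
  members
}

# Five points: two tight pairs and one isolated point
x <- rbind(c(0, 0), c(0.01, 0), c(1, 1), c(1, 1.02), c(0.5, 0.5))
parts <- leader_partition(x, radius = 0.05)
```

Here rows 1 and 2 share a list, rows 3 and 4 share a list, and row 5 is an exemplar with no other members.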

Value

A list in which each component is a vector of observation indexes. The first index in each component is the index of the exemplar defining that component, and any remaining indexes are the associated members, which lie within radius of the exemplar.
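
A list of this shape can be unpacked with base R. In the sketch below, mem is a hand-built, hypothetical list that only mimics the structure described above; it is not actual output of getHDmembers.

```r
# Hypothetical membership list with the documented structure:
# first index = exemplar, remaining indexes = its members.
mem <- list(c(3L, 7L, 12L), 5L, c(9L, 1L))

# Index of the exemplar that defines each component
exemplars <- vapply(mem, function(m) m[1], integer(1))

# Number of members attached to each exemplar (excluding the exemplar itself)
n_members <- lengths(mem) - 1L
```

A component of length one, such as the second one here, is an exemplar with no other members.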

References

Wilkinson, L. (2016). Visualizing Outliers. <https://www.cs.uic.edu/~wilkinson/Publications/outliers.pdf>.

See Also

HDoutliers, getHDoutliers

Examples

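data(dots)
# Stage 1: partition the data into exemplars and their members
mem.W <- getHDmembers(dots$W)
# Stage 2: detect outliers using the membership lists
out.W <- getHDoutliers(dots$W, mem.W)

data(ex2D)
mem.ex2D <- getHDmembers(ex2D)
out.ex2D <- getHDoutliers(ex2D, mem.ex2D)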

## Not run: 
n <- 100000 # number of observations
set.seed(3)
x <- matrix(rnorm(2 * n), n, 2)
nout <- 10 # number of outliers
x[sample(1:n, size = nout), ] <- 10 * runif(2 * nout, min = -1, max = 1)

mem.x <- getHDmembers(x)
out.x <- getHDoutliers(x, mem.x)
## End(Not run)

HDoutliers documentation built on Feb. 11, 2022, 5:10 p.m.