The MVCH Algorithm

Share:

Description

The function determines the multivariate mode by iteratively selecting minimum volume convex subsets.

Usage

1
MVCH(data, ps=0.75, pf=0.2, k=1000, a.poi=2, del.poi=1)

Arguments

data

At least a two-dimensional data matrix is required. Number of observations needs to be greater than number of dimensions.

ps

A numeric value between 0 and 1. Fraction of points to be retained in each iteration. Default is set to 0.7. See Details for more information.

pf

A numeric value between 0 and 1. Fraction of points determining the size of the final subset. Default is set to 0.2. See Details for more information.

k

The maximum number of iterations. Default is set to 1000. See Details for more information.

a.poi

An integer a.poi ≥ 1. Number of points added when searching for minimum volume subsets. Default is set to 2. See Details for more information.

del.poi

An integer 1 ≤ del.poi < a.poi. Number of points deleted when searching for minimum volume subsets. Default is set to 1. See Details for more information.

Details

The algorithm iteratively determines a sequence of subsets of certain size with minimum convex hull volume (i.e. minimum volume subsets) until a certain threshold is reached. In the first iteration a minimum volume subset of size n_1=floor(n*ps) is sought. In the second iteration, out of the subset found in iteration 1, a subset of size n_2=floor(n_1*ps) is determined. The procedure continues until the threshold is reached: ceil(n*pf) where n is the number of observations in data. The mode is calculated as the arithmetic mean of the observations in the final subset. Hence, the combination of ps and pf determines the running time and robustness of the procedure. Highest robustness (in terms of maximum breakdown point) is achieved for ps=floor((n+d+1)/2). Small values of pf guarantee an accurate mode estimation also for asymmetric data sets but running times increase.

To find a minimum volume subset, in each iteration in.subs atomic subsets (consisting of d+1 observations) are constructed. Each of these atomic subsets is iteratively expanded by adding the a.poi closest points and deleting del.poi. All three values determine the accuracy of the subset identification (and, hence, the estimate) as well as the running time of the algorithm. Small values of in.subs reduce running time. Choosing similar values for a.poi and del.poi increases running time and algorithm accuracy.

For more details on the algorithm see the reference.

Value

A list with following entries:

mode

The mode estimate.

set

The final subset used for mode calculation.

vol

The convex hull volume of the final subset.

set.1

The subset identified after the first iteration (outlier-free subset).

Author(s)

Thomas Kirschstein <thomas.kirschstein@wiwi.uni-halle.de>

References

Kirschstein, T., Liebscher, S., Porzio, G., Ragozini, G. (2015): Minimum volume peeling: a robust non-parametric estimator of the multivariate mode, Computational Statistics and Data Analysis, DOI: 10.1016/j.csda.2015.04.012.

Examples

1
2
3
4
5
6
7
8
# maximum breakdown point estimation
# MVCH(halle, ps = floor((nrow(halle) + ncol(halle) + 1)/2), pf = 0.05)

# slower estimation
# MVCH(halle, ps = 0.75, pf = 0.05)

# quicker estimation
# MVCH(halle, ps = 0.25, pf = 0.05)