# MVCH: The MVCH Algorithm In restlos: Robust Estimation of Location and Scatter

## Description

The function determines the multivariate mode by iteratively selecting minimum volume convex subsets.

## Usage

 `1` ```MVCH(data, ps=0.75, pf=0.2, k=1000, a.poi=2, del.poi=1) ```

## Arguments

 `data` At least a two-dimensional data matrix is required. Number of observations needs to be greater than number of dimensions. `ps` A numeric value between 0 and 1. Fraction of points to be retained in each iteration. Default is set to `0.7`. See Details for more information. `pf` A numeric value between 0 and 1. Fraction of points determining the size of the final subset. Default is set to `0.2`. See Details for more information. `k` The maximum number of iterations. Default is set to `1000`. See Details for more information. `a.poi` An integer a.poi ≥ 1. Number of points added when searching for minimum volume subsets. Default is set to `2`. See Details for more information. `del.poi` An integer 1 ≤ del.poi < a.poi. Number of points deleted when searching for minimum volume subsets. Default is set to `1`. See Details for more information.

## Details

The algorithm iteratively determines a sequence of subsets of certain size with minimum convex hull volume (i.e. minimum volume subsets) until a certain threshold is reached. In the first iteration a minimum volume subset of size n_1=floor(n*ps) is sought. In the second iteration, out of the subset found in iteration 1, a subset of size n_2=floor(n_1*ps) is determined. The procedure continues until the threshold is reached: ceil(n*pf) where `n` is the number of observations in `data`. The mode is calculated as the arithmetic mean of the observations in the final subset. Hence, the combination of `ps` and `pf` determines the running time and robustness of the procedure. Highest robustness (in terms of maximum breakdown point) is achieved for ps=floor((n+d+1)/2). Small values of `pf` guarantee an accurate mode estimation also for asymmetric data sets but running times increase.

To find a minimum volume subset, in each iteration `in.subs` atomic subsets (consisting of `d+1` observations) are constructed. Each of these atomic subsets is iteratively expanded by adding the `a.poi` closest points and deleting `del.poi`. All three values determine the accuracy of the subset identification (and, hence, the estimate) as well as the running time of the algorithm. Small values of `in.subs` reduce running time. Choosing similar values for `a.poi` and `del.poi` increases running time and algorithm accuracy.

For more details on the algorithm see the reference.

## Value

A list with following entries:

 `mode` The mode estimate. `set` The final subset used for mode calculation. `vol` The convex hull volume of the final subset. `set.1` The subset identified after the first iteration (outlier-free subset).

## Author(s)

Thomas Kirschstein <thomas.kirschstein@wiwi.uni-halle.de>

## References

Kirschstein, T., Liebscher, S., Porzio, G., Ragozini, G. (2015): Minimum volume peeling: a robust non-parametric estimator of the multivariate mode, Computational Statistics and Data Analysis, DOI: 10.1016/j.csda.2015.04.012.

## Examples

 ```1 2 3 4 5 6 7 8``` ```# maximum breakdown point estimation # MVCH(halle, ps = floor((nrow(halle) + ncol(halle) + 1)/2), pf = 0.05) # slower estimation # MVCH(halle, ps = 0.75, pf = 0.05) # quicker estimation # MVCH(halle, ps = 0.25, pf = 0.05) ```

restlos documentation built on May 2, 2019, 2:45 p.m.