# BEM: BACON-EEM Algorithm for multivariate outlier detection in...

In package modi: Multivariate outlier detection and imputation for incomplete survey data

## Description

BEM starts from an initial subset of the data that is assumed to be free of outliers (the good data) and may contain missing values. It applies a version of the EM-algorithm to estimate the center and scatter of the good data, and then redefines the good data as the observations whose Mahalanobis distance lies below a threshold, which may add observations to the good data or remove them from it. This process iterates until the good data remain stable. Observations not among the good data are outliers.
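The iteration described above can be sketched for the special case of complete data. This is a simplified illustration under stated assumptions, not modi's implementation: the hypothetical helper `bacon_sketch()` replaces the EM-algorithm with the ordinary weighted mean and covariance from `stats::cov.wt()`, uses a plain (unstandardised) chi-square cutoff, and expects the starting good subset `good` as a logical vector.

```r
# Hypothetical sketch of the BACON iteration for complete data (no missing values).
# Not part of modi: stats::cov.wt() stands in for the EM-algorithm used by BEM,
# and the cutoff is a plain chi-square quantile without BEM's standardisation.
bacon_sketch <- function(x, good, w = rep(1, nrow(x)), alpha = 0.01, max.iter = 50) {
  x <- as.matrix(x)
  cutpoint <- qchisq(1 - alpha, df = ncol(x))  # threshold for squared distances
  for (iter in seq_len(max.iter)) {
    # center and scatter of the current good subset
    est <- cov.wt(x[good, , drop = FALSE], wt = w[good])
    # squared Mahalanobis distances of all observations
    d2 <- mahalanobis(x, center = est$center, cov = est$cov)
    new.good <- d2 < cutpoint
    if (identical(new.good, good)) break  # good subset is stable
    good <- new.good
  }
  list(good = good, center = est$center, scatter = est$cov,
       dist = d2, iterations = iter)
}

# Example: 95 standard normal points plus 5 shifted outliers
set.seed(1)
x <- matrix(rnorm(200), 100, 2)
x[1:5, ] <- x[1:5, ] + 10
res <- bacon_sketch(x, good = seq_len(100) %in% 6:20)
which(!res$good)  # rows flagged as outliers
```

With missing values, the `cov.wt()` step is where an EM-algorithm has to take over, which is what BEM does.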

## Usage

```r
BEM(data, weights, v = 2, c0 = 3, alpha = 0.01, md.type = "m",
    em.steps.start = 10, em.steps.loop = 5, better.estimation = FALSE,
    monitor = FALSE)
```

## Arguments

- `data`: a matrix or data frame. As usual, rows are observations and columns are variables.
- `weights`: a non-negative and non-zero vector of weights for each observation. Its length must equal the number of rows of the data. Default is `rep(1, nrow(data))`.
- `v`: an integer indicating the distance used to define the starting good subset: `v = 1` uses the Mahalanobis distance based on the weighted mean and covariance; `v = 2` uses the Euclidean distance from the componentwise median.
- `c0`: the size of the initial subset is `c0 * ncol(data)`.
- `alpha`: a small probability; the cutoff for good observations is the quantile at level `(1 - alpha)`.
- `md.type`: type of Mahalanobis distance: `"m"` marginal, `"c"` conditional.
- `em.steps.start`: number of iterations of the EM-algorithm for the starting good subset.
- `em.steps.loop`: number of iterations of the EM-algorithm for the good subset in the iteration loop.
- `better.estimation`: if `better.estimation = TRUE`, the EM-algorithm for the final good subset runs `em.steps.start` further iterations.
- `monitor`: if `TRUE`, verbose output is printed.
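The two choices of `v` for the starting subset can be illustrated on complete data. The helper `initial_subset()` below is hypothetical: it uses an unweighted componentwise median for `v = 2`, ignores missing values, and simply keeps the `c0 * ncol(data)` observations with the smallest distances. modi's internal code may differ in these details.

```r
# Hypothetical sketch (not modi's internal code) of the starting good subset:
# the m = c0 * ncol(x) observations closest to a center.
initial_subset <- function(x, w = rep(1, nrow(x)), v = 2, c0 = 3) {
  x <- as.matrix(x)
  m <- min(nrow(x), c0 * ncol(x))  # size of the starting subset
  if (v == 1) {
    # v = 1: Mahalanobis distance from the weighted mean and covariance
    est <- cov.wt(x, wt = w)
    d <- mahalanobis(x, center = est$center, cov = est$cov)
  } else {
    # v = 2: Euclidean distance from the (unweighted) componentwise median
    med <- apply(x, 2, median)
    d <- sqrt(rowSums(sweep(x, 2, med)^2))
  }
  # logical indicator of the m observations with the smallest distances
  seq_len(nrow(x)) %in% order(d)[seq_len(m)]
}

# Example: 7 points in 2 dimensions, one gross outlier in row 4
x <- matrix(c(0, 0, 1, 0, 0, 1, 100, 100, 1, 1, -1, 0, 0, -1),
            ncol = 2, byrow = TRUE)
initial_subset(x, v = 2, c0 = 3)  # keeps 6 rows, drops row 4
```

The median-based start is what makes `v = 2` resistant to the outlier: the center is not pulled toward row 4, so that row has by far the largest distance.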

## Details

The BACON algorithm with `v = 1` is not robust but is affine equivariant, while with `v = 2` it is robust but not affine equivariant. An observation is declared an outlier when its (squared) Mahalanobis distance exceeds a threshold given by a standardised chi-square quantile at level `(1 - alpha)`. For large data sets it may be better to use `alpha/n` instead of `alpha`, where `n` is the number of observations.
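To see how much stricter `alpha/n` is, the plain chi-square quantiles can be compared directly. This sketch assumes `p = 2` variables and `n = 100` observations purely for illustration, and it does not reproduce the standardisation that BEM applies to the quantile.

```r
# Plain chi-square cutoffs for squared Mahalanobis distances with p variables.
# BEM's standardisation of the quantile is not reproduced here.
p <- 2       # number of variables (illustrative)
n <- 100     # number of observations (illustrative)
alpha <- 0.01
cut.alpha   <- qchisq(1 - alpha, df = p)      # level 1 - alpha:   about 9.21
cut.alpha.n <- qchisq(1 - alpha / n, df = p)  # level 1 - alpha/n: about 18.42
c(cut.alpha, cut.alpha.n)
```

For `p = 2` the chi-square distribution is exponential, so the quantile is `-2 * log(alpha)`, and dividing `alpha` by `n = 100` exactly doubles the cutoff here.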

The internal function `.EM.normal` is usually called from `BEM`. It implements the EM-algorithm in such a way that part of the calculations can be saved and reused in the BEM algorithm. `.EM.normal` does not compute the observed sufficient statistics itself: these are computed in the main body of `BEM` and passed as parameters, together with the statistics on the missingness patterns.
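For readers unfamiliar with the EM-algorithm for incomplete multivariate normal data, a textbook version is sketched below. The function `em_normal_sketch()` is hypothetical and is not `.EM.normal`: it recomputes everything row by row in every iteration instead of reusing precomputed observed sufficient statistics and missingness-pattern statistics, which is exactly the saving `.EM.normal` is designed for. The E-step replaces missing values by their conditional means given the observed values and adds the conditional covariance to the cross-products; the M-step turns the expected weighted sufficient statistics into a new center and (maximum likelihood) scatter.

```r
# Hypothetical sketch: weighted EM for a multivariate normal with missing values.
# Not modi's .EM.normal; no statistics are precomputed or reused.
em_normal_sketch <- function(x, w = rep(1, nrow(x)), steps = 50) {
  x <- as.matrix(x)
  p <- ncol(x)
  # start from available-case means and a diagonal covariance
  mu <- colMeans(x, na.rm = TRUE)
  sigma <- diag(apply(x, 2, var, na.rm = TRUE), p)
  for (it in seq_len(steps)) {
    t1 <- numeric(p)        # expected weighted sum of x
    t2 <- matrix(0, p, p)   # expected weighted sum of x x'
    for (i in seq_len(nrow(x))) {
      xi <- x[i, ]
      obs <- !is.na(xi)
      mis <- !obs
      cvar <- matrix(0, p, p)  # conditional covariance of the missing part
      if (any(mis)) {
        if (any(obs)) {
          # regression of missing on observed variables
          b <- sigma[mis, obs, drop = FALSE] %*% solve(sigma[obs, obs, drop = FALSE])
          xi[mis] <- mu[mis] + b %*% (xi[obs] - mu[obs])
          cvar[mis, mis] <- sigma[mis, mis, drop = FALSE] -
            b %*% sigma[obs, mis, drop = FALSE]
        } else {
          # nothing observed: use the current center and scatter
          xi[mis] <- mu[mis]
          cvar[mis, mis] <- sigma[mis, mis]
        }
      }
      t1 <- t1 + w[i] * xi
      t2 <- t2 + w[i] * (tcrossprod(xi) + cvar)
    }
    # M-step: center and ML scatter from the expected sufficient statistics
    mu <- t1 / sum(w)
    sigma <- t2 / sum(w) - tcrossprod(mu)
  }
  list(center = mu, scatter = sigma)
}

# Example with three missing values in the first variable
set.seed(42)
x <- matrix(rnorm(60), 20, 3)
x[c(2, 5, 9), 1] <- NA
em_normal_sketch(x, steps = 20)
```

With complete data a single iteration already returns the weighted mean and the maximum likelihood covariance (divisor `n` rather than `n - 1`), which is a convenient sanity check.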

## Value

`BEM` returns a list whose first component is the sub-list `output` with the following components:

- `sample.size`: number of observations.
- `discarded.observations`: number of discarded observations.
- `number.of.variables`: number of variables.
- `significance.level`: the probability used for the cutpoint, i.e. `alpha`.
- `initial.basic.subset.size`: size of the initial good subset.
- `final.basic.subset.size`: size of the final good subset.
- `number.of.iterations`: number of iterations of the BACON step.
- `computation.time`: elapsed computation time.
- `center`: final estimate of the center.
- `scatter`: final estimate of the covariance matrix.
- `cutpoint`: the threshold Mahalanobis distance value beyond which observations are outliers.

The further components returned by `BEM` are:

- `outind`: outlier indicator.
- `dist`: final Mahalanobis distances.
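For incomplete observations, a Mahalanobis distance can only use the variables that were actually observed. The sketch below assumes that this is what the marginal type (`md.type = "m"`) refers to: it computes the squared distance on the observed sub-vector with the matching sub-matrix of the scatter. `marginal_md()` is a hypothetical helper, and BEM may additionally rescale such distances before comparing them with the cutpoint.

```r
# Hypothetical sketch: squared Mahalanobis distance of each row computed on its
# observed variables only (sub-vector of the center, sub-matrix of the scatter).
marginal_md <- function(x, center, scatter) {
  x <- as.matrix(x)
  apply(x, 1, function(xi) {
    obs <- !is.na(xi)
    if (!any(obs)) return(NA_real_)  # nothing observed: distance undefined
    mahalanobis(xi[obs], center[obs], scatter[obs, obs, drop = FALSE])
  })
}

# Example: diagonal scatter with variances 1 and 4
x <- rbind(c(1, 2), c(NA, 2), c(3, NA), c(NA, NA))
marginal_md(x, center = c(0, 0), scatter = diag(c(1, 4)))  # 2 1 9 NA
```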

## Note

BEM uses an adapted version of the EM-algorithm in the function `.EM.normal`.

## Author(s)

Beat Hulliger

## References

Béguin, C. and Hulliger, B. (2008). The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data. Survey Methodology, 34(1), 91-103.

Billor, N., Hadi, A.S. and Velleman, P.F. (2000). BACON: Blocked Adaptive Computationally Efficient Outlier Nominators. Computational Statistics and Data Analysis, 34(3), 279-298.

Schafer, J.L. (2000). Analysis of Incomplete Multivariate Data. Monographs on Statistics and Applied Probability 72. Chapman & Hall.

## Examples

```r
library(modi)
# Bushfire data set with 20% MCAR
data(bushfirem, bushfire.weights)
bem.res <- BEM(bushfirem, bushfire.weights, alpha = (1 - 0.01 / nrow(bushfirem)))
print(bem.res$output)
```

modi documentation built on May 31, 2017, 5 a.m.