| BEM | R Documentation | 
BEM starts from a set of uncontaminated data with possible
missing values, applies a version of the EM-algorithm to estimate
the center and scatter of the good data, then adds observations whose
Mahalanobis distance is below a threshold to the good data and deletes
those above it. This process iterates until the good data remain
stable. Observations not among the good data are outliers.
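The iteration described above can be illustrated with a minimal sketch for complete, unweighted data. It is not the package implementation: the EM step for missing values, the sampling weights and the standardisation of the threshold are all left out, and the name bacon_sketch and its arguments max_iter and x are made up for this illustration.

```r
# Simplified BACON-type loop (assumes complete data and equal weights).
bacon_sketch <- function(x, c0 = 3, alpha = 0.01, max_iter = 50) {
  x <- as.matrix(x)
  p <- ncol(x)
  # Starting subset: the c0 * p points closest (Euclidean distance)
  # to the componentwise median.
  med <- apply(x, 2, median)
  d0 <- sqrt(rowSums(sweep(x, 2, med)^2))
  good <- rank(d0, ties.method = "first") <= c0 * p
  # Plain chi-square quantile; the standardisation used by BEM is omitted.
  cutoff <- qchisq(1 - alpha, df = p)
  for (iter in seq_len(max_iter)) {
    # Estimate center and scatter from the current good data.
    center <- colMeans(x[good, , drop = FALSE])
    scatter <- cov(x[good, , drop = FALSE])
    # Squared Mahalanobis distances of all observations.
    md2 <- unname(mahalanobis(x, center, scatter))
    new_good <- md2 < cutoff
    stable <- all(new_good == good)
    good <- new_good
    if (stable) break  # the good data did not change
  }
  list(outind = !good, dist = sqrt(md2), center = center,
       scatter = scatter, iterations = iter)
}

# Simulated data: 100 clean points plus 5 points shifted far away.
set.seed(1)
clean <- matrix(rnorm(200), ncol = 2)
shifted <- matrix(rnorm(10, mean = 10, sd = 0.1), ncol = 2)
res <- bacon_sketch(rbind(clean, shifted))
which(res$outind)
```

The sketch stops as soon as the set of good observations is unchanged between two passes, which mirrors the stability criterion in the description.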
BEM(
  data,
  weights,
  v = 2,
  c0 = 3,
  alpha = 0.01,
  md.type = "m",
  em.steps.start = 10,
  em.steps.loop = 5,
  better.estimation = FALSE,
  monitor = FALSE
)
| data | a matrix or data frame. As usual, rows are observations and columns are variables. | 
| weights | a non-negative and non-zero vector of weights for each
observation. Its length must equal the number of rows of the data.
Default is  | 
| v | an integer indicating the distance for the definition of the
starting good subset:  | 
| c0 | the size of initial subset is  | 
| alpha | a small probability indicating the level  | 
| md.type | type of Mahalanobis distance:  | 
| em.steps.start | number of iterations of EM-algorithm for starting good subset. | 
| em.steps.loop | number of iterations of EM-algorithm for good subset. | 
| better.estimation | if  | 
| monitor | if  | 
The BACON algorithm with v = 1 is affine equivariant but not robust,
while v = 2 is robust but not affine equivariant. The threshold for
the (squared) Mahalanobis distances, beyond which an observation is an
outlier, is a standardised chi-square quantile at (1 - alpha). For
large data sets it may be better to choose alpha / n instead, where n
is the number of observations. The
internal function EM.normal is usually called from BEM.
EM.normal implements the EM-algorithm in such a way that
part of the calculations can be saved and reused in the BEM
algorithm. EM.normal does not compute the
observed sufficient statistics itself; they are computed in the main
program of BEM and passed as parameters, together with the statistics
on the missingness patterns.
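To give a feel for the threshold mentioned above, the following sketch compares the plain chi-square quantile at (1 - alpha) with the one at (1 - alpha / n). The values of p and n are hypothetical, and the standardisation that BEM applies to the quantile is not reproduced here.

```r
p <- 5         # hypothetical number of variables
n <- 1000      # hypothetical number of observations
alpha <- 0.01

# Quantile of the chi-square distribution with p degrees of freedom
cut_alpha   <- qchisq(1 - alpha, df = p)
cut_alpha_n <- qchisq(1 - alpha / n, df = p)

c(alpha = cut_alpha, alpha_over_n = cut_alpha_n)
```

The quantile at (1 - alpha / n) is larger, so fewer observations exceed it, which limits the number of false alarms when n is large.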
BEM returns a list whose first component output is a
sublist with the following components:
| sample.size | Number of observations |
| discarded.observations | Number of discarded observations |
| number.of.variables | Number of variables |
| significance.level | The probability used for the cutpoint, i.e. alpha |
| initial.basic.subset.size | Size of initial good subset |
| final.basic.subset.size | Size of final good subset |
| number.of.iterations | Number of iterations of the BACON step |
| computation.time | Elapsed computation time |
| center | Final estimate of the center |
| scatter | Final estimate of the covariance matrix |
| cutpoint | The threshold MD-value for the cut-off of outliers |
The further components returned by BEM are:
| outind | Indicator of outliers |
| dist | Final Mahalanobis distances |
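As a rough illustration of how the components outind and dist might be used together, the sketch below lists the flagged observations, largest distance first. It assumes that outind is a logical (or 0/1) vector and dist a numeric vector, both in the row order of the data. The helper list_outliers and the object mock are hypothetical; mock stands in for a real BEM result and its values are made up.

```r
# Hypothetical helper: table of flagged observations, largest distance first.
list_outliers <- function(res) {
  flagged <- which(as.logical(res$outind))
  tab <- data.frame(observation = flagged, dist = res$dist[flagged])
  tab[order(tab$dist, decreasing = TRUE), , drop = FALSE]
}

# Mock object using the documented component names (made-up values).
mock <- list(
  outind = c(FALSE, TRUE, FALSE, TRUE),
  dist   = c(1.2, 5.4, 0.8, 7.9)
)
list_outliers(mock)
```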
BEM uses an adapted version of the EM-algorithm in function
EM.normal.
Beat Hulliger
Béguin, C. and Hulliger, B. (2008). The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data. Survey Methodology, 34(1), 91-103.
Billor, N., Hadi, A.S. and Velleman, P.F. (2000). BACON: Blocked Adaptive Computationally Efficient Outlier Nominators. Computational Statistics and Data Analysis, 34(3), 279-298.
Schafer, J.L. (2000). Analysis of Incomplete Multivariate Data. Monographs on Statistics and Applied Probability 72, Chapman & Hall.
# Bushfire data set with 20% MCAR
data(bushfirem, bushfire.weights)
# alpha / n, as suggested above for larger data sets
bem.res <- BEM(bushfirem, bushfire.weights,
               alpha = 0.01 / nrow(bushfirem))
print(bem.res$output)