wBACON | R Documentation |
wBACON is an iterative method for the computation of multivariate location and scatter (under the assumption of a Gaussian distribution).
wBACON(x, weights = NULL, alpha = 0.05, collect = 4, version = c("V2", "V1"),
na.rm = FALSE, maxiter = 50, verbose = FALSE, n_threads = 2)
distance(x)
## S3 method for class 'wbaconmv'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'wbaconmv'
summary(object, ...)
center(object)
## S3 method for class 'wbaconmv'
vcov(object, ...)
x |
|
weights |
|
alpha |
|
collect |
determines the size |
version |
|
na.rm |
|
maxiter |
|
verbose |
|
n_threads |
|
digits |
|
... |
additional arguments passed to the method. |
object |
object of class wbaconmv. |
The algorithm is initialized from a set of uncontaminated data. The subset is then refined iteratively: observations are added to the subset if their Mahalanobis distance is below some threshold, and observations are removed from the subset if their distance is larger than the threshold. This process iterates until the set of good data remains stable. Observations not among the good data are outliers; see Billor et al. (2000). The weighted BACON algorithm is due to Béguin and Hulliger (2008).
The threshold for the (squared) Mahalanobis distances is defined as the standardized chi-square (1 - alpha) quantile. All observations whose squared Mahalanobis distance is larger than the threshold are regarded as outliers.
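The refinement loop and the threshold rule described above can be illustrated with a minimal base R sketch. It is not the implementation used by wBACON: it is unweighted, the initialization (the collect * p observations closest to the coordinate-wise median) is an assumption, and the cutoff is the plain chi-square quantile without any standardization. The function name bacon_sketch and the simulated data are hypothetical, and the sketch will generally not reproduce wBACON's results.

```r
# Simplified, unweighted BACON-style iteration (illustrative sketch only).
bacon_sketch <- function(x, alpha = 0.05, collect = 4, maxiter = 50) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)

  # Assumed initialization: the collect * p observations closest (in
  # Euclidean norm) to the coordinate-wise median.
  med <- apply(x, 2, median)
  d0 <- sqrt(rowSums(sweep(x, 2, med)^2))
  m <- min(collect * p, n)
  subset <- seq_len(n) %in% order(d0)[seq_len(m)]

  # Assumed threshold: plain chi-square (1 - alpha) quantile with p degrees
  # of freedom (no standardization or correction factor).
  cutoff <- qchisq(1 - alpha, df = p)

  converged <- FALSE
  for (iter in seq_len(maxiter)) {
    # Location and scatter estimated from the current subset only
    ctr <- colMeans(x[subset, , drop = FALSE])
    scatter <- cov(x[subset, , drop = FALSE])
    # Squared Mahalanobis distances of ALL observations
    d2 <- mahalanobis(x, ctr, scatter)
    new_subset <- unname(d2 < cutoff)
    if (identical(new_subset, subset)) {
      converged <- TRUE
      break
    }
    subset <- new_subset
  }
  list(center = ctr, cov = scatter, dist = sqrt(d2), subset = subset,
       outlier = !subset, iter = iter, converged = converged)
}

# Simulated data: 200 Gaussian observations, the first 5 shifted far away
set.seed(1)
x <- matrix(rnorm(200 * 3), ncol = 3)
x[1:5, ] <- x[1:5, ] + 10
fit <- bacon_sketch(x)
which(fit$outlier)
```

Because the sketch omits the standardization of the threshold, it may also flag some clean observations that the package would keep.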
If the sampling weights weights are not explicitly specified (i.e., weights = NULL), they are taken to be 1.0.
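As a quick check of this default, the following sketch fits the model once without weights and once with explicit unit weights, using the swiss data from the examples. It assumes that the wbacon package is installed; the object names m0 and m1 are only illustrative.

```r
library(wbacon)

data(swiss)
dt <- swiss[, c("Fertility", "Agriculture", "Examination", "Education",
                "Infant.Mortality")]

# weights = NULL (the default) versus explicit unit weights
m0 <- wBACON(dt)
m1 <- wBACON(dt, weights = rep(1, nrow(dt)))

# According to the description above, both fits should agree
same_center <- all.equal(center(m0), center(m1))
same_center
```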
The wBACON method cannot deal with missing values. In contrast, the function BEM in package modi implements the BACON-EEM algorithm of Béguin and Hulliger (2008), which is tailored to work with outlying and missing values.
If the argument na.rm is set to TRUE, the method behaves like na.omit.
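To make the na.omit behaviour concrete, the base R sketch below shows which rows survive listwise deletion. The toy data frame is made up for illustration.

```r
# Toy data with missing values (hypothetical)
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c(2, 5, NA, 8))

# na.omit drops every row that contains at least one NA
clean <- na.omit(df)
nrow(clean)  # 2 (rows 2 and 3 are dropped)

# With na.rm = TRUE, wBACON is described as behaving the same way,
# i.e., as operating on the complete rows only.
```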
The BACON algorithm assumes that the non-outlying data have (roughly) an elliptically contoured distribution (this includes the Gaussian distribution as a special case). "Although the algorithms will often do something reasonable even when these assumptions are violated, it is hard to say what the results mean." (Billor et al., 2000, p. 289)
In line with Billor et al. (2000, p. 290), we use the term outlier "nomination" rather than "detection" to highlight that algorithms should not go beyond nominating observations as potential outliers; see also Béguin and Hulliger (2008). It is left to the analyst to finally label outlying observations as such.
Diagnostic plots are available through the plot method.
The methods center and vcov return, respectively, the estimated center/location and covariance matrix.
The distance method returns the robust Mahalanobis distances.
The function is_outlier returns a vector of logicals that flags the nominated outliers.
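The accessor functions described above might be used as follows. This is a short sketch that assumes the wbacon package is installed and reuses the swiss data from the examples; the names d and flagged are only illustrative.

```r
library(wbacon)

data(swiss)
dt <- swiss[, c("Fertility", "Agriculture", "Examination", "Education",
                "Infant.Mortality")]
m <- wBACON(dt)

# estimated center/location and covariance matrix
center(m)
vcov(m)

# robust Mahalanobis distances, one per observation
d <- distance(m)

# logical vector flagging the nominated outliers
flagged <- is_outlier(m)
dt[flagged, ]
```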
An object of class wbaconmv with slots
x |
see function arguments |
weights |
see function arguments |
center |
estimated center of the data |
dist |
Mahalanobis distances |
n |
number of observations |
p |
number of variables |
alpha |
see function arguments |
subset |
final subset of outlier-free data |
cutoff |
see function arguments |
maxiter |
number of iterations until convergence |
version |
see function arguments |
collect |
see function arguments |
cov |
covariance matrix |
converged |
logical that indicates whether the algorithm converged |
call |
the matched call |
Billor N., Hadi A.S. and Velleman P.F. (2000). BACON: Blocked Adaptive Computationally efficient Outlier Nominators. Computational Statistics and Data Analysis, 34, pp. 279–298. doi:10.1016/S0167-9473(99)00101-2
Béguin C. and Hulliger B. (2008). The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data. Survey Methodology, 34, pp. 91–103. https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X200800110616
Schoch, T. (2021). wbacon: Weighted BACON algorithms for multivariate outlier nomination (detection) and robust linear regression. Journal of Open Source Software, 6 (62), 3238. doi:10.21105/joss.03238
plot and is_outlier
data(swiss)
dt <- swiss[, c("Fertility", "Agriculture", "Examination", "Education",
"Infant.Mortality")]
m <- wBACON(dt)
m
# indicator vector of potential outliers
is_outlier(m)
# names of the potential outliers
is_outlier(m, names = TRUE)