BACON: BACON for Regression or Multivariate Covariance Estimation

Description

BACON, short for ‘Blocked Adaptive Computationally-Efficient Outlier Nominators’, is a somewhat robust set of algorithms, implemented here for regression and for multivariate covariance estimation.

BACON() applies the multivariate (covariance estimation) algorithm, using mvBACON(x) in any case, and when y is not NULL adds a regression iteration phase, using the auxiliary .lmBACON() function.

Usage

BACON(x, y = NULL, intercept = TRUE,
      m = min(collect * p, n * 0.5),
      init.sel = c("Mahalanobis", "dUniMedian", "random", "manual"),
      man.sel, init.fraction = 0, collect = 4,
      alpha = 0.95, maxsteps = 100, verbose = TRUE)

## *Auxiliary* function:
.lmBACON(x, y, intercept = TRUE,
         init.dis, init.fraction = 0, collect = 4,
         alpha = 0.95, maxsteps = 100, verbose = TRUE)

Arguments

x

a multivariate matrix of dimension [n x p] considered as containing no missing values.

y

the response (a vector of length n) in the case of regression, or NULL for the multivariate case, where just the result of mvBACON() is returned.

intercept

logical indicating if an intercept has to be used for the regression.

m

integer in 1:n specifying the size of the initial basic subset; used only when init.sel is not "manual"; see mvBACON.

init.sel

character string, specifying the initial selection mode; see mvBACON.

man.sel

only when init.sel == "manual", the indices of observations determining the initial basic subset (and m <- length(man.sel)).

init.dis

the distances of the x matrix used for the initial subset determined by mvBACON.

init.fraction

if this parameter is > 0, the tedious steps of selecting the initial subset are skipped and an initial subset of size n * init.fraction is chosen (the observations with the smallest distances, dis).

collect

numeric factor chosen by the user to define the size of the initial subset (p * collect)

alpha

significance level.

maxsteps

the maximal number of iteration steps (to prevent infinite loops)

verbose

logical indicating if messages are printed which trace progress of the algorithm.
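
As a rough illustration of how the size arguments interact, the following base R sketch reproduces the arithmetic of the default m from the Usage section and the subset size implied by init.fraction. The values of n and p are those of the starsCYG example (47 observations, one predictor). The variable names, the toy distances, and the use of floor() for n * init.fraction are assumptions made for illustration, not robustX internals.

```r
## Illustration only: sizes of the initial basic subset (not package code)
n <- 47; p <- 1; collect <- 4

## Default from the Usage section: m = min(collect * p, n * 0.5)
m.default <- min(collect * p, n * 0.5)
m.default                                  # 4

## With init.fraction > 0 the subset size is n * init.fraction;
## rounding down with floor() is an assumption of this sketch.
init.fraction <- 0.25
m.frac <- floor(n * init.fraction)         # 11

## Pick the m.frac observations with the smallest (toy) distances
set.seed(1)
dis <- abs(rnorm(n))                       # hypothetical initial distances
init.subset <- order(dis)[seq_len(m.frac)]
```

The default of 4 agrees with the "chosen m = 4" line in the example output further down.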

Details

For the initial selection mode, init.sel, in particular, see its description in the mvBACON arguments list.
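
To give a feel for what an initial selection might involve, here is a minimal base R sketch of two ranking schemes in the spirit of the "Mahalanobis" and "dUniMedian" modes: ordering observations by Mahalanobis distance from the sample mean, or by Euclidean distance from the coordinate-wise median. This is an assumption-laden illustration based on the general idea in Billor et al. (2000), not the code used by mvBACON(); all object names are hypothetical, and the package may adjust m further (for example through the rank check visible in the example output).

```r
## Illustration only: two ways to rank observations for an initial subset
set.seed(42)
n <- 30; p <- 2
x <- matrix(rnorm(n * p), n, p)
x[1:3, ] <- x[1:3, ] + 8                   # plant three gross outliers
m <- min(4 * p, n * 0.5)                   # default size with collect = 4

## (a) Mahalanobis distances from the (non-robust) mean and covariance
d.mah <- mahalanobis(x, center = colMeans(x), cov = cov(x))
sub.mah <- order(d.mah)[seq_len(m)]

## (b) Euclidean distances from the coordinate-wise medians
med <- apply(x, 2, median)
d.med <- sqrt(rowSums(sweep(x, 2, med)^2))
sub.med <- order(d.med)[seq_len(m)]

sub.mah
sub.med
```

The median-based ranking does not depend on the contaminated mean and covariance, which is why a median-based start is often considered the more robust choice.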

Value

BACON(x,y,..) (for regression) returns a list with components

subset

a logical vector of length n (see the example output) marking the “good”, supposedly outlier-free observations.

tis

the t[i](y[m],X[m]) of eq (6) in the reference; the clean “basic subset” in the algorithm is defined as the observations i with the smallest |t[i]|, and the t[i] can be regarded as scaled prediction errors.

mv.dis

the (final) discrepancies or distances of mvBACON().

mv.subset

the “good” subset from mvBACON(), used to start the regression iterations.
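
The tis component can be made more concrete with a small base R sketch. It computes scaled residuals for observations inside a basic subset and scaled prediction errors for those outside it, following my reading of eq (6) in the reference: the scaling uses sqrt(1 - h[i]) inside the subset and sqrt(1 + h[i]) outside, with h[i] = x[i]' (X[m]' X[m])^(-1) x[i]. The data, the choice of subset, and all object names are hypothetical; this is not the package's GiveTis() code.

```r
## Illustration only: t[i] relative to a given basic subset (not package code)
set.seed(1)
n <- 20
X <- cbind(1, rnorm(n))                    # design matrix with intercept
y <- 1 + 2 * X[, 2] + rnorm(n, sd = 0.5)
in.sub <- seq_len(n) <= 10                 # hypothetical basic subset

Xm <- X[in.sub, , drop = FALSE]
ym <- y[in.sub]
XtXinv <- solve(crossprod(Xm))             # (Xm' Xm)^(-1)
beta.m <- XtXinv %*% crossprod(Xm, ym)     # least squares fit on the subset

res <- drop(y - X %*% beta.m)              # residuals / prediction errors
sigma.m <- sqrt(sum(res[in.sub]^2) / (sum(in.sub) - ncol(X)))
h <- rowSums((X %*% XtXinv) * X)           # x[i]' (Xm' Xm)^(-1) x[i]

## 1 - h for points in the subset, 1 + h for points outside it
tis <- res / (sigma.m * sqrt(ifelse(in.sub, 1 - h, 1 + h)))

## A next, larger subset: the observations with the smallest |t[i]|
next.sub <- order(abs(tis))[seq_len(sum(in.sub) + 1)]
```

For observations inside the subset these values coincide with the standardized residuals that rstandard() returns for an lm() fit on the subset alone.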

Note

“BACON” was also chosen in honor of Francis Bacon:

Whoever knows the ways of Nature will more easily notice her deviations; and, on the other hand, whoever knows her deviations will more accurately describe her ways.
Francis Bacon (1620), Novum Organum II 29.

Author(s)

Ueli Oetliker, Swiss Federal Statistical Office, for S-plus 5.1; 25.05.2001; modified six times till 17.6.2001.

Port to R, testing etc, by Martin Maechler. Daniel Weeks (at pitt.edu) proposed a fix to a long-standing buglet in GiveTis() computing the t[i], which was further improved by Maechler, for robustX version 1.2-3 (Feb. 2019).

References

Billor, N., Hadi, A. S., and Velleman, P. F. (2000). BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators; Computational Statistics and Data Analysis 34, 279–298. doi: 10.1016/S0167-9473(99)00101-2

See Also

mvBACON, the multivariate version of the BACON algorithm.

Examples

data(starsCYG, package = "robustbase")
## Plot simple data and fitted lines
plot(starsCYG)
lmST <- lm(log.light ~ log.Te, data = starsCYG)
abline(lmST, col = "gray") # least squares line
str(B.ST <- with(starsCYG,  BACON(x = log.Te, y = log.light)))
## 'subset': A good set of points (to determine regression):
colB <- adjustcolor(2, 1/2)
points(log.light ~ log.Te, data = starsCYG, subset = B.ST$subset,
       pch = 19, cex = 1.5, col = colB)
## A BACON-derived line:
lmB <- lm(log.light ~ log.Te, data = starsCYG, subset = B.ST$subset)
abline(lmB, col = colB, lwd = 2)

require(robustbase)
(RlmST <- lmrob(log.light ~ log.Te, data = starsCYG))
abline(RlmST, col = "blue")

Example output

rank(ordered.x[1:m,] >= p  ==> chosen m =  4 
MV-BACON (subset no. 1): 4 of 47 (8.51 %)
MV-BACON (subset no. 2): 5 of 47 (10.64 %)
MV-BACON (subset no. 3): 5 of 47 (10.64 %)
Reg-BACON (init subset no. 0): 8 of 47 (17.02 %)
Reg-BACON (init subset no. 0): 3 of 47 (6.38 %)
Reg-BACON (init subset no. 1): 4 of 47 (8.51 %)
Reg-BACON (init subset no. 2): 5 of 47 (10.64 %)
Reg-BACON (init subset no. 3): 6 of 47 (12.77 %)
Reg-BACON (init subset no. 4): 7 of 47 (14.89 %)
Reg-BACON (init subset no. 5): 8 of 47 (17.02 %)
Reg-BACON (subset no. 1): 8 of 47 (17.02 %)
List of 5
 $ subset   : logi [1:47] FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ tis      : num [1:47] 7.2 3.39 8.26 3.39 11.38 ...
 $ mv.subset: logi [1:47] FALSE FALSE FALSE FALSE TRUE FALSE ...
 $ mv.dis   : num [1:47] 17.44 59.93 7.16 59.93 1.79 ...
 $ steps    : Named int [1:2] 3 1
  ..- attr(*, "names")= chr [1:2] "mv" "lm"
Loading required package: robustbase

Call:
lmrob(formula = log.light ~ log.Te, data = starsCYG)
 \--> method = "MM"
Coefficients:
(Intercept)       log.Te  
     -4.969        2.253  

robustX documentation built on May 2, 2019, 5:16 p.m.