# huberize: Huberization - Bringing Outliers In (robustbase: Basic Robust Statistics)

 huberize R Documentation

## Huberization – Bringing Outliers In

### Description

Huberization (originally named after Peter Huber's M-estimation algorithm for location) replaces outlying values in a sample `x` by their respective boundary: when x_j < c_1 it is replaced by c_1, and when x_j > c_2 it is replaced by c_2. Consequently, values inside the interval [c_1, c_2] remain unchanged.

Here, c_1, c_2 = M ± c*s, where s := s(x) is the robust scale estimate `Qn(x)` if that is positive and, by default, M is the robust Huber estimate of location μ (with tuning constant k).

In the degenerate case where `Qn(x) == 0`, trimmed means of `abs(x - M)` are tried as scale estimate s, with decreasing trimming proportions specified by the decreasing `trim` vector.
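The clamping step described above can be sketched in base R. This is only an illustration, not the robustbase implementation: the helper name `clamp_huber` is hypothetical, and the center `M` and scale `s` are passed in directly instead of being computed via `huberM()` and `Qn()`.

```r
# Hypothetical helper (not part of robustbase): clamp x to [M - c*s, M + c*s].
# M (center) and s (scale) are supplied by the caller here; huberize() itself
# derives them from huberM() and Qn() as described above.
clamp_huber <- function(x, M, s, c = 1.5) {
  stopifnot(is.numeric(x), length(M) == 1, length(s) == 1, s > 0, c > 0)
  lower <- M - c * s  # c_1
  upper <- M + c * s  # c_2
  pmin(pmax(x, lower), upper)
}

x <- c(-10, -1, 0, 0.5, 2, 25)
clamp_huber(x, M = 0, s = 1, c = 2)
# values below -2 become -2, values above 2 become 2; the rest are unchanged
```

Using `pmax()` followed by `pmin()` keeps the operation vectorized and leaves every value inside the interval untouched.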

### Usage

```
huberize(x, M = huberM(x, k = k)$mu, c = k,
         trim = (5:1)/16,
         k = 1.5,
         warn0 = getOption("verbose"), saveTrim = TRUE)
```

### Arguments

| Argument | Description |
|----------|-------------|
| `x` | numeric vector which is to be huberized. |
| `M` | a number, the huberization center; defaults to `huberM(x, k = k)$mu`, the robust Huber M-estimate of location. |
| `c` | a positive number, the tuning constant for huberization of the sample `x`; defaults to `k`. |
| `trim` | a decreasing vector of trimming proportions in [0, 0.5], only used to trim the absolute deviations from `M` in case `Qn(x)` is zero. |
| `k` | tuning constant passed to `huberM()`; used only if the huberization center `M` is not specified, in which case `M` defaults to Huber's M-estimate `huberM(x, k = k)$mu`. Also the default for `c`. |
| `warn0` | `logical` indicating if a warning should be signalled in case `Qn(x)` is zero and the trimmed means for all trimming proportions `trim` are zero as well. |
| `saveTrim` | a `logical` indicating if the last tried `trim[j]` value should be stored if `Qn(x)` was zero. |

### Details

• In regular cases, `s = Qn(x)` is positive and used to huberize values of `x` outside `[M - c*s, M + c*s]`.

• In degenerate cases where `Qn(x) == 0`, we search for an s > 0 by trying the trimmed mean `s := mean(abs(x-M), trim = trim[j])` with less and less trimming (as the trimming proportions `trim[]` must decrease). If even the last, `trim[length(trim)]`, leads to s = 0, a warning is printed when `warn0` is true.
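The fallback search in the degenerate case can be sketched as follows. This is a minimal base R illustration under stated assumptions, not the actual robustbase code: the helper `fallback_scale` and its return format are hypothetical, and `M` is supplied by the caller.

```r
# Hypothetical helper (not part of robustbase): find a positive scale s by
# trying trimmed means of |x - M| with decreasing trimming proportions.
fallback_scale <- function(x, M, trim = (5:1)/16) {
  stopifnot(length(trim) >= 1, all(diff(trim) < 0))  # trim must decrease
  d <- abs(x - M)
  for (tr in trim) {
    s <- mean(d, trim = tr)
    if (s > 0) return(list(s = s, trim = tr))  # first positive scale wins
  }
  list(s = 0, trim = trim[length(trim)])  # every trimmed mean was zero
}

# A sample in which most values coincide, the kind of degenerate case above
x <- c(rep(0, 14), 1, 50)
fallback_scale(x, M = 0)
# the first four proportions give 0; trim = 1/16 gives s = 1/14
```

Less trimming lets more of the non-constant values into the mean, which is why the proportions are tried in decreasing order.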

### Value

a numeric vector like `x`; in case `Qn(x)` was zero and `saveTrim` is true, it also carries the (last) `trim` proportion used (to compute the scale s) as attribute `"trim"` (see `attr()`, `attributes`).

### Note

For use in `mc()` and similar cases where mainly numerical stabilization is necessary, a large `c = 1e12` will lead to no huberization, i.e., all `y == x` for `y <- huberize(x, c = 1e12)` for typical non-degenerate samples.

### Author(s)

Martin Maechler

### See Also

`huberM` and `mc`, which is now stabilized by default via something like `huberize(*, c=1e11)`.

### Examples

```
## For non-degenerate data and large c, nothing is huberized,
## as there are *no* really extreme outliers :
set.seed(101)
x <- rnorm(1000)
stopifnot(all.equal(x, huberize(x, c=100)))
## OTOH, the "extremes" are shrunken towards the boundaries for smaller c:
xh <- huberize(x, c = 2)
table(x != xh)
## 45 out of 1000:
table(xh[x != xh])# 26 on the left boundary -2.098 and 19 on the right = 2.081
## visualization:
stripchart(x); text(0,1, "x {original}", pos=3); yh <- 0.9
stripchart(xh, at = yh, add=TRUE, col=2)
text(0, yh, "huberize(x, c=2)",   col=2, pos=1)
arrows( x[x!=xh], 1,