huberize  R Documentation 
Huberization (named after Peter Huber's Mestimation algorithm for
location originally) replaces outlying values in a sample x
by
their respective boundary: when x_j < c_1 it is replaced by c_1
and when x_j > c_2 it is replaced by c_2. Consequently,
values inside the interval [c_1, c_2] remain unchanged.
Here, c1,c2 = M +/ c*s where s := s(x) is
the robust scale estimate Qn(x)
if that is positive,
and by default, M is the robust huber estimate of location
μ (with tuning constant k).
In the degenerate case where Qn(x) == 0
, trimmed means of
abs(x  M)
are tried as scale estimate s, with decreasing
trimming proportions specified by the decreasing trim
vector.
huberize(x, M = huberM(x, k = k)$mu, c = k, trim = (5:1)/16, k = 1.5, warn0 = getOption("verbose"), saveTrim = TRUE)
x 
numeric vector which is to be huberized. 
M 
a number; defaulting to 
c 
a positive number, the tuning constant for huberization of the
sample 
trim 
a decreasing vector of trimming proportions in
[0, 0.5], only used to trim the absolute deviations from 
k 
used if 
warn0 

saveTrim 
a 
In regular cases, s = Qn(x)
is positive and used to
huberize values of x
outside [M  c*s, M + c*s]
.
In degenerate cases where Qn(x) == 0
, we search for
an s > 0 by trying the trimmed mean s := mean(abs(xM), trim =
trim[j])
with less and less trimming (as the trimming
proportions trim[]
must decrease).
If even the last, trim[length(trim)]
, leads to s = 0, a
warning is printed when warn0
is true.
a numeric vector as x
; in case Qn(x)
was zero and
saveTrim
is true, also containing the (last) trim
proportion used (to compute the scale s) as attribute "trim"
(see attr()
, attributes
).
For the use in mc()
and similar cases where mainly numerical
stabilization is necessary, a large c = 1e12
will lead to no
huberization, i.e., all y == x
for y < huberize(x, c)
for typical nondegenerate samples.
Martin Maechler
huberM
and mc
which is now stabilized by
default via something like huberize(*, c=1e11)
.
## For nondegenerate data and large c, nothing is huberized, ## as there are *no* really extreme outliers : set.seed(101) x < rnorm(1000) stopifnot(all.equal(x, huberize(x, c=100))) ## OTOH, the "extremes" are shrunken towards the boundaries for smaller c: xh < huberize(x, c = 2) table(x != xh) ## 45 out of a 1000: table(xh[x != xh])# 26 on the left boundary 2.098 and 19 on the right = 2.081 ## vizualization: stripchart(x); text(0,1, "x {original}", pos=3); yh < 0.9 stripchart(xh, at = yh, add=TRUE, col=2) text(0, yh, "huberize(x, c=2)", col=2, pos=1) arrows( x[x!=xh], 1, xh[x!=xh], yh, length=1/8, col=adjustcolor("pink", 1/2))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.