outliers | R Documentation |
Standard cutoff-based methods for detecting outliers with price relatives.
quartile_method(x, cu = 2.5, cl = cu, a = 0, type = 7)
resistant_fences(x, cu = 2.5, cl = cu, a = 0, type = 7)
robust_z(x, cu = 2.5, cl = cu)
fixed_cutoff(x, cu = 2.5, cl = 1/cu)
tukey_algorithm(x, cu = 2.5, cl = cu, type = 7)
hb_transform(x)
x |
A strictly positive numeric vector of price relatives. These can be
made with, e.g., |
cu , cl |
A numeric vector, or something that can be coerced into one,
giving the upper and lower cutoffs for each element of |
a |
A numeric vector, or something that can be coerced into one,
between 0 and 1 giving the scale factor for the median to establish the
minimum dispersion between quartiles for each element of |
type |
See |
Each of these functions constructs an interval of the form
[b_l(x) - c_l \times l(x), b_u(x) + c_u \times u(x)] and flags a value in x as
TRUE if that value does not belong to the interval, and FALSE otherwise. The
methods differ in how they construct the values b_l(x), b_u(x), l(x), and
u(x). Any missing values in x are ignored when calculating the cutoffs, but
return NA in the result.
The fixed cutoff method is the simplest, and just uses the interval
[c_l, c_u].
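As a rough illustration of the interval rule, the following base R sketch flags values that fall outside a given interval and applies it to the fixed cutoff case. The helper name flag_outside() is hypothetical and introduced only for this example; it is not a function in the package.

```r
# Hypothetical helper (not part of the package): TRUE for values outside
# [lower, upper], FALSE for values inside, NA for missing values.
flag_outside <- function(x, lower, upper) {
  x < lower | x > upper
}

x <- c(0.2, 0.9, 1, 1.1, NA, 3)
cu <- 2.5
cl <- 1 / cu  # mirrors the default cl = 1/cu of fixed_cutoff()

# Fixed cutoff: the interval is simply [cl, cu] = [0.4, 2.5].
flag_outside(x, cl, cu)
```

Only 0.2 and 3 lie outside [0.4, 2.5], and the missing value stays NA.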
The quartile method and Tukey algorithm are described in paragraphs 5.113 to
5.135 of the CPI manual (2020), as well as by Rais (2008) and Hutton (2008).
The resistant fences method is an alternative to the quartile method, and is
described by Rais (2008) and Hutton (2008). Quantile-based methods often
identify too many price relatives as outliers because the distribution is
tightly concentrated around 1; setting a > 0 puts a floor on the minimum
dispersion between quantiles as a fraction of the median. See the references
for more details.
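The two quantile-based intervals can be sketched in base R as follows. This is one reading of the methods as described in the references, not the package's implementation: the function names quartile_sketch() and fences_sketch() are hypothetical, and the exact form of the floor (a times the median) is an assumption based on the description of a above.

```r
# Hypothetical sketches (illustrative names, not the package's functions).

# Quartile method: distances are measured from the median to each quartile.
quartile_sketch <- function(x, cu = 2.5, cl = cu, a = 0, type = 7) {
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE, na.rm = TRUE, type = type)
  floor_disp <- abs(a * q[2])  # assumed floor: a fraction of the median
  upper <- q[2] + cu * max(q[3] - q[2], floor_disp)
  lower <- q[2] - cl * max(q[2] - q[1], floor_disp)
  x < lower | x > upper
}

# Resistant fences: distances are measured from the quartiles using the IQR.
fences_sketch <- function(x, cu = 2.5, cl = cu, a = 0, type = 7) {
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE, na.rm = TRUE, type = type)
  iqr <- max(q[3] - q[1], abs(a * q[2]))
  x < q[1] - cl * iqr | x > q[3] + cu * iqr
}

# Relatives tightly concentrated around 1, with two extreme values.
x <- c(0.5, 0.98, 0.99, 1, 1, 1.01, 1.02, 1.5)
quartile_sketch(x)           # flags 0.5 and 1.5
quartile_sketch(x, a = 0.3)  # the floor widens the interval; nothing is flagged
fences_sketch(x)             # flags 0.5 and 1.5
```

In this sketch the fences interval always contains the quartile interval for the same cutoffs, so it can never flag more values.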
The robust Z-score is the usual method to identify relatives in the (asymmetric) tails of the distribution, simply replacing the mean with the median, and the standard deviation with the median absolute deviation.
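A minimal base R sketch of a robust Z-score follows. The name robust_z_sketch() is hypothetical, and the use of stats::mad() with its default scaling constant (1.4826) is an assumption; the package may scale the median absolute deviation differently.

```r
# Hypothetical sketch (not the package's function): Z-score with the median
# in place of the mean and the MAD in place of the standard deviation.
robust_z_sketch <- function(x, cu = 2.5, cl = cu) {
  med <- median(x, na.rm = TRUE)
  s <- mad(x, na.rm = TRUE)  # stats::mad() scales by 1.4826 by default
  z <- (x - med) / s
  z > cu | z < -cl
}

x <- c(0.9, 0.95, 1, 1.05, 1.1, 2)
robust_z_sketch(x)  # only the last value (2) is flagged
```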
These methods often assume that price relatives are symmetrically
distributed (if not Gaussian). As the distribution of price relatives often
has a long right tail, the natural logarithm can be used to transform price
relatives before identifying outliers (sometimes under the assumption that
price relatives are log-normally distributed). The Hidiroglou-Berthelot
transformation is another approach, described in the CPI manual (par.
5.124). (Sometimes the transformed price relatives are multiplied by
\max(p_1, p_0)^u, for some 0 \le u \le 1, so that products with a larger price
get flagged as outliers (par. 5.128).)
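The following base R sketch shows the usual form of the Hidiroglou-Berthelot transformation alongside the log transformation. The name hb_sketch() is hypothetical, and the formula is the standard one from the literature as understood here; it is not taken from the package source.

```r
# Hypothetical sketch (not the package's function) of the
# Hidiroglou-Berthelot transformation: relatives below the median m are
# mapped to 1 - m / x, and relatives at or above it to x / m - 1.
hb_sketch <- function(x) {
  m <- median(x, na.rm = TRUE)
  ifelse(x < m, 1 - m / x, x / m - 1)
}

x <- c(0.5, 1, 2)
hb_sketch(x)  # -1, 0, 1: halving and doubling are equally far from 0
log(x)        # the log transformation is also symmetric around 0 here
```

Either transformed series can then be passed to one of the cutoff-based methods in place of the raw relatives.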
A logical vector, the same length as x, that is TRUE if the
corresponding element of x is identified as an outlier, FALSE otherwise.
Hutton, H. (2008). Dynamic outlier detection in price index surveys. Proceedings of the Survey Methods Section: Statistical Society of Canada Annual Meeting.
IMF, ILO, Eurostat, UNECE, OECD, and World Bank. (2020). Consumer Price Index Manual: Concepts and Methods. International Monetary Fund.
Rais, S. (2008). Outlier detection for the Consumer Price Index. Proceedings of the Survey Methods Section: Statistical Society of Canada Annual Meeting.
grouped() to make each of these functions operate on grouped data.
back_period()/base_period() for a simple utility function to turn prices
in a table into price relatives.
The HBmethod() function in the univOutl package for the
Hidiroglou-Berthelot method for identifying outliers.
set.seed(1234)
x <- rlnorm(10)
fixed_cutoff(x)
robust_z(x)
quartile_method(x)
resistant_fences(x) # never identifies more outliers than quartile_method()
tukey_algorithm(x)
log(x)
hb_transform(x)
# Works the same for grouped data
f <- c("a", "b", "a", "a", "b", "b", "b", "a", "a", "b")
grouped(quartile_method)(x, group = f)