LocScaleB | R Documentation |
This function identifies outliers in the tails of a distribution by detecting the observations outside the bounds built using a robust estimate of both location and scale parameters.
LocScaleB(x, k=3, method='MAD', weights=NULL, id=NULL, exclude=NA, logt=FALSE, return.dataframe=FALSE)
x |
Numeric vector that will be searched for outliers. |
k |
Nonnegative constant that determines the extension of bounds. Commonly used values are 2, 2.5 and 3 (default). |
method |
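character identifying how to estimate the scale of the distribution. Available choices include method='MAD' (default), method='IQR', method='IDR', method='dQ', method='dD' and method='AdjOut'. Further robust scale estimators (see the functions listed under See Also) are also supported.
When method='dQ', the left and right scales are estimated separately from the quartiles (see Details). When method='dD', they are estimated separately from the 10th, 50th and 90th percentiles. Finally, when method='AdjOut', the scale estimates are derived from the fences of the adjusted boxplot (see Details). |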
weights |
Optional numeric vector that provides weights associated with the observations. Only nonnegative weights are allowed. Note that weights can only be used when method='IQR', method='IDR', method='dQ', method='dD' or method='AdjOut' (see Details). |
id |
Optional numeric or character vector with the identifiers of the units in x. When id=NULL (default), units are identified by their position in x. |
exclude |
Values of x to be excluded from the outlier search. By default (exclude=NA), missing values are excluded. |
logt |
Logical, if TRUE the values of x are log-transformed before searching for outliers (default FALSE). |
return.dataframe |
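Logical, if TRUE the output includes dataframes with the analysed and the excluded observations (see Value). Default is FALSE. |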
The interval is built around the median Q2, used as a robust estimate of location, together with a robust estimate of scale chosen via method:
[Q2 - k*s_L; Q2 + k*s_R]
where s_L and s_R are robust estimates of the scale of the left and right tail, respectively.
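A minimal base-R sketch of this interval for the symmetric case (s_L = s_R), assuming the scale is the normal-consistent MAD returned by stats::mad(). locscale_bounds_mad() is a hypothetical helper written for illustration, not a function of the package, and it ignores weights, exclude and logt.

```r
# Hypothetical helper: bounds [Q2 - k*s; Q2 + k*s] with s = MAD (s_L = s_R)
locscale_bounds_mad <- function(x, k = 3) {
  x <- x[!is.na(x)]   # drop missing values
  q2 <- median(x)     # robust location estimate
  s <- mad(x)         # MAD, multiplied by 1.4826 by default
  c(lower = q2 - k * s, upper = q2 + k * s)
}

# Same toy data as in the Examples
set.seed(333)
x <- rnorm(30, 50, 1)
x[10] <- 1
x[20] <- 100
b <- locscale_bounds_mad(x, k = 3)
b
which(x < b["lower"] | x > b["upper"])  # positions outside the bounds
```

Observations lying exactly on a bound are not flagged, since only values outside the interval are treated as outliers.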
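With most methods s_L = s_R; the exceptions are method='dQ' and method='dD' (and method='AdjOut', described below), where, respectively:
s_L = (Q2 - Q1)/0.6745 and s_R = (Q3 - Q2)/0.6745
and
s_L = (P50 - P10)/1.2816 and s_R = (P90 - P50)/1.2816
The constants 0.6745 and 1.2816 are the 0.75 and 0.90 quantiles of the standard normal distribution, so under normality both estimates are consistent for the standard deviation.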
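The asymmetric 'dQ' and 'dD' estimates can be sketched in base R as follows. asym_bounds() is a hypothetical helper; it computes quantiles with R's default quantile() definition (type 7), which is an assumption, since the package may use a different one.

```r
# Hypothetical helper: asymmetric bounds [Q2 - k*s_L; Q2 + k*s_R]
asym_bounds <- function(x, k = 3, method = c("dQ", "dD")) {
  method <- match.arg(method)
  x <- x[!is.na(x)]
  # 'dQ' uses quartiles, 'dD' uses the 10th, 50th and 90th percentiles
  p <- if (method == "dQ") c(0.25, 0.50, 0.75) else c(0.10, 0.50, 0.90)
  cst <- if (method == "dQ") 0.6745 else 1.2816  # approx. qnorm(0.75), qnorm(0.90)
  q <- quantile(x, p, names = FALSE)             # type 7 quantiles (assumption)
  s_L <- (q[2] - q[1]) / cst                     # left-tail scale
  s_R <- (q[3] - q[2]) / cst                     # right-tail scale
  c(lower = q[2] - k * s_L, upper = q[2] + k * s_R)
}

# Right-skewed toy data: the upper bound lies farther from the median
y <- c(1, 2, 3, 4, 10, 20, 30, 40, 50)
asym_bounds(y, k = 3, method = "dQ")
asym_bounds(y, k = 3, method = "dD")
```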
Note that when method='dQ' or method='dD' the function calculates and prints Bowley's coefficient of skewness, which uses Q1, Q2 and Q3 (replaced respectively by P10, P50 and P90 when method='dD').
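Bowley's coefficient compares the distances of the outer quantiles from the median, ((Q3 - Q2) - (Q2 - Q1))/(Q3 - Q1). A minimal sketch follows; bowley_skew() is a hypothetical helper, and its quantile definition (R's default type 7) may differ from the one used by the package.

```r
# Bowley's skewness; deciles = TRUE swaps Q1, Q2, Q3 for P10, P50, P90
bowley_skew <- function(x, deciles = FALSE) {
  p <- if (deciles) c(0.10, 0.50, 0.90) else c(0.25, 0.50, 0.75)
  q <- quantile(x, p, names = FALSE, na.rm = TRUE)
  ((q[3] - q[2]) - (q[2] - q[1])) / (q[3] - q[1])
}

bowley_skew(1:9)                             # symmetric sample: 0
bowley_skew(c(1, 2, 3, 4, 10, 20, 30, 40, 50))  # right skew: positive
```

The coefficient lies in [-1, 1] and is 0 for a sample whose quartiles are symmetric around the median.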
With method='AdjOut' the following estimates are considered:
s_L = (Q2 - fL) and s_R = (fR - Q2)
where fL and fR are derived from the fences of the adjusted boxplot (Hubert and Vandervieren, 2008; see adjboxStats). In addition, the medcouple (mc) measure of skewness is calculated and printed on the screen.
When weights are available (passed via the argument weights), they are used in the computation of the quantiles; in particular, the quantiles are derived using the function wtd.quantile in the package Hmisc. Note that weights are allowed only with method='IQR', method='IDR', method='dQ', method='dD' or method='AdjOut'.
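As a dependency-free illustration of weighted quantiles, the sketch below inverts the weighted empirical distribution function. wquantile() is a hypothetical helper and a simplified stand-in: its results can differ from those of Hmisc::wtd.quantile, which handles interpolation differently.

```r
# Weighted quantile: the smallest x whose cumulative weight share reaches p
wquantile <- function(x, w, probs) {
  stopifnot(length(x) == length(w), all(w >= 0))  # nonnegative weights only
  o <- order(x)
  x <- x[o]
  cw <- cumsum(w[o]) / sum(w)  # weighted ECDF at each sorted value
  vapply(probs, function(p) x[which(cw >= p)[1]], numeric(1))
}

x <- c(1, 2, 3, 4, 5)
wquantile(x, rep(1, 5), c(0.25, 0.50, 0.75))          # equal weights: 2 3 4
wquantile(x, c(1, 1, 1, 1, 10), c(0.25, 0.50, 0.75))  # heavy weight on 5: 4 5 5
```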
The ‘score’ variable reported in the data dataframe when return.dataframe=TRUE is the standardized score (x - Median)/scale.
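For the symmetric MAD case, the score can be sketched as follows. mad_score() is a hypothetical helper that assumes scale = stats::mad(). Since the scale s is positive, |score| > k holds exactly when x lies outside [Q2 - k*s; Q2 + k*s], so the score and the bounds flag the same observations. Which scale enters the score for the asymmetric methods is not covered by this sketch.

```r
# Standardized score (x - median)/MAD, ignoring missing values
mad_score <- function(x) (x - median(x, na.rm = TRUE)) / mad(x, na.rm = TRUE)

# Same toy data as in the Examples
set.seed(333)
x <- rnorm(30, 50, 1)
x[10] <- 1
x[20] <- 100
s <- mad_score(x)
which(abs(s) > 3)  # same positions as those outside the k = 3 MAD bounds
```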
A list whose components depend on the return.dataframe argument. When return.dataframe=FALSE, only the following components are provided:
pars |
Vector with estimated median and scale parameters |
bounds |
The bounds of the interval, values outside the interval are considered outliers. |
excluded |
The positions or identifiers of the values of x excluded from the analysis (see argument exclude). |
outliers |
The positions or identifiers of the values of x identified as outliers. |
lowOutl |
The identifiers or positions (when id=NULL) of the outliers in the lower tail, i.e. below the lower bound. |
upOutl |
The identifiers or positions (when id=NULL) of the outliers in the upper tail, i.e. above the upper bound. |
When return.dataframe=TRUE the latter two components are replaced by two dataframes:
excluded |
A dataframe with the subset of observations excluded. |
data |
A dataframe with the observations that were not excluded, whose columns include ‘id’ (units' identifiers), ‘x’, ‘log.x’ (only if logt=TRUE) and ‘score’ (the standardized score described in Details). |
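The split of outliers into lowOutl and upOutl, with positions used when id=NULL, can be sketched in base R as below. split_outliers() is a hypothetical helper; it treats values exactly on a bound as non-outliers, consistent with only values outside the interval being flagged, and skips missing values.

```r
# Hypothetical helper: classify values against given bounds
split_outliers <- function(x, lower, upper, id = NULL) {
  if (is.null(id)) id <- seq_along(x)  # fall back to positions in x
  low <- !is.na(x) & x < lower         # below the lower bound
  up <- !is.na(x) & x > upper          # above the upper bound
  list(outliers = id[low | up], lowOutl = id[low], upOutl = id[up])
}

x <- c(48, 50, 51, 1, 49, 100)
split_outliers(x, lower = 45, upper = 55)                          # positions
split_outliers(x, lower = 45, upper = 55, id = paste0("u", 1:6))  # identifiers
```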
Marcello D'Orazio mdo.statmatch@gmail.com
Hubert, M. and Van der Veeken, S. (2008) ‘Outlier Detection for Skewed Data’, Journal of Chemometrics, 22, pp. 235-246.
Hubert, M. and Vandervieren, E. (2008) ‘An Adjusted Boxplot for Skewed Distributions’, Computational Statistics & Data Analysis, 52, pp. 5186-5201.
Maronna, R.A. and Zamar, R.H. (2002) ‘Robust Estimates of Location and Dispersion for High-Dimensional Datasets’, Technometrics, 44, pp. 307-317.
Rousseeuw, P.J. and Croux, C. (1993) ‘Alternatives to the Median Absolute Deviation’, Journal of the American Statistical Association, 88, pp. 1273-1283.
mad (stats), scaleTau2, Qn and Sn (robustbase), GiniMd (Hmisc)
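library(univOutl)

# toy data: 30 normal values with two planted outliers
set.seed(333)
x <- rnorm(30, 50, 1)
x[10] <- 1
x[20] <- 100

# bounds based on the median and the MAD
out <- LocScaleB(x = x, k = 3, method='MAD')
out$pars
out$bounds
out$outliers
x[out$outliers]

# same analysis, returning the data as a dataframe
out <- LocScaleB(x = x, k = 3, method='MAD', return.dataframe = TRUE)
head(out$data)

# bounds derived from the adjusted boxplot
out <- LocScaleB(x = x, k = 3, method='AdjOut')
out$outliers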