boxB | R Documentation |
Identifies univariate outliers using boxplot-based methods
boxB(x, k=1.5, method='asymmetric', weights=NULL, id=NULL, exclude=NA, logt=FALSE)
x |
Numeric vector that will be searched for outliers. |
k |
Nonnegative constant that determines the extension of the 'whiskers'. Commonly used values are 1.5 (default), 2, or 3.
Note that when |
method |
Character, identifies the method to be used: |
weights |
Optional numeric vector with units' weights associated to the observations in |
id |
Optional vector with identifiers of units in |
exclude |
Values of |
logt |
Logical, if |
When method="resistant" the outlying observations are those outside the interval:
[Q1 - k*IQR; Q3 + k*IQR]
where Q1 and Q3 are respectively the 1st and the 3rd quartile of x, while IQR=(Q3-Q1) is the Inter-Quartile Range. The value k=1.5 (the so-called 'inner fences') is commonly used when drawing a boxplot. Values k=2 and k=3 provide middle and outer fences, respectively.
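The resistant rule can be sketched in base R as follows. This is an illustration only: the helper resistant_fences is hypothetical, and it assumes the quartiles come from quantile() with its default settings, which may differ from what boxB computes internally.

```r
# Sketch of the "resistant" fences [Q1 - k*IQR; Q3 + k*IQR].
# Assumption: quartiles from base quantile() (default type);
# boxB() may estimate quartiles differently.
resistant_fences <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
f <- resistant_fences(x, k = 1.5)
f

# Positions of the values falling outside the fences
flagged <- which(x < f[["lower"]] | x > f[["upper"]])
flagged
```

Raising k to 2 or 3 widens the interval, so fewer observations are flagged.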
When method="asymmetric" the outlying observations are those outside the interval:
[Q1 - 2k*(Q2-Q1); Q3 + 2k*(Q3-Q2)]
where Q2 is the median; this modification accounts for slight skewness of the distribution.
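A minimal base R sketch of the asymmetric fences is given below. The helper asymmetric_fences is hypothetical and, as an assumption, relies on quantile() with its default settings rather than on the quartile estimator used by boxB.

```r
# Sketch of the "asymmetric" fences
# [Q1 - 2k*(Q2-Q1); Q3 + 2k*(Q3-Q2)], with Q2 the median.
# Assumption: quartiles from base quantile() (default type).
asymmetric_fences <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  c(lower = q[1] - 2 * k * (q[2] - q[1]),
    upper = q[3] + 2 * k * (q[3] - q[2]))
}

# A right-skewed toy sample: Q3 is farther from the median than Q1
x <- c(1, 2, 3, 4, 5, 7, 10, 15, 25, 60)
fa <- asymmetric_fences(x, k = 1.5)
fa

# Positions of the values falling outside the fences
flagged <- which(x < fa[["lower"]] | x > fa[["upper"]])
flagged
```

When the median lies exactly midway between Q1 and Q3, 2*(Q2-Q1) and 2*(Q3-Q2) both equal the IQR, so these fences coincide with the resistant ones.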
Finally, when method="adjbox" the outlying observations are identified using the method proposed by Hubert and Vandervieren (2008), which is based on the medcouple measure of skewness; in practice the bounds are:
[Q1 - 1.5*exp(a*M)*IQR; Q3 + 1.5*exp(b*M)*IQR]
where M is the medcouple; when M>0 (positive skewness) a=-4 and b=3, whereas a=-3 and b=4 for negative skewness (M<0). According to Hubert and Vandervieren (2008), this adjustment of the boxplot works with moderate skewness (-0.6 <= M <= 0.6). The bounds of the adjusted boxplot are derived by applying the function adjboxStats in the package robustbase.
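The effect of the medcouple on the bounds can be seen with a small sketch that applies the formula above to given quartiles. The helper adjbox_fences and the input values are hypothetical; M is passed in as a plain number here, whereas in practice it would be estimated from the data (for instance with mc in robustbase), and boxB itself relies on adjboxStats.

```r
# Sketch of the adjusted-boxplot bounds
# [Q1 - 1.5*exp(a*M)*IQR; Q3 + 1.5*exp(b*M)*IQR].
# M (the medcouple) is supplied by the caller; it is NOT estimated here.
adjbox_fences <- function(q1, q3, M) {
  iqr <- q3 - q1
  if (M >= 0) {
    a <- -4; b <- 3   # positive skewness
  } else {
    a <- -3; b <- 4   # negative skewness
  }
  c(lower = q1 - 1.5 * exp(a * M) * iqr,
    upper = q3 + 1.5 * exp(b * M) * iqr)
}

f0 <- adjbox_fences(q1 = 10, q3 = 20, M = 0)     # no skewness
fp <- adjbox_fences(q1 = 10, q3 = 20, M = 0.3)   # right-skewed
fn <- adjbox_fences(q1 = 10, q3 = 20, M = -0.3)  # left-skewed
rbind(f0, fp, fn)
```

With M = 0 the bounds reduce to the usual 1.5*IQR fences; with M > 0 the upper fence moves outwards and the lower fence moves towards Q1, and the reverse happens for M < 0.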
When weights are available (passed via the argument weights) they are used in the computation of the quartiles. In particular, the quartiles are derived using the function wtd.quantile in the package Hmisc.
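As an intuition for how weights shift the quartiles, the sketch below treats integer weights as replication counts and uses base R only. This is an assumption made for illustration: wtd.quantile may use a different estimator, so the numbers need not match what boxB returns.

```r
# Intuition only: integer weights read as "this unit represents w units".
# Assumption: replication + base quantile(); not the Hmisc estimator.
x <- c(10, 20, 30, 40)
w <- c(1, 1, 1, 5)
p <- c(0.25, 0.5, 0.75)

q_unweighted <- quantile(x, probs = p, names = FALSE)
q_weighted   <- quantile(rep(x, times = w), probs = p, names = FALSE)

rbind(q_unweighted, q_weighted)
```

Because the largest value carries most of the weight, all three quartiles move upwards, and so do any fences derived from them.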
Remember that when requesting a log transformation (argument logt=TRUE) all the estimates (quartiles, etc.) will refer to log(x+1).
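The following base R sketch shows what working on the log(x+1) scale implies. It uses the resistant rule with quantile() defaults as an assumption, and the back-transformation with exp(.) - 1 is added here only to help read the fences on the original scale; it is not described as part of boxB's output.

```r
# Sketch: fences computed on log(x + 1) rather than on x.
# Assumption: resistant rule with base quantile() defaults.
x  <- c(0, 1, 3, 7, 15, 31, 1000)
lx <- log(x + 1)   # equivalent to log1p(x); defined also for x = 0

q   <- quantile(lx, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
fences_log <- c(lower = q[1] - 1.5 * iqr, upper = q[2] + 1.5 * iqr)

# Outliers are judged on the log scale
flagged <- which(lx < fences_log[["lower"]] | lx > fences_log[["upper"]])

# Optional: express the fences on the original scale
fences_orig <- exp(fences_log) - 1

fences_log
fences_orig
flagged
```

Quartiles and fences obtained this way are not directly comparable with the raw values of x unless they are back-transformed.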
The output is a list containing the following components:
quartiles |
The quartiles of |
fences |
The bounds of the interval, values outside the interval are detected as outliers. |
excluded |
The identifiers or positions (when |
outliers |
The identifiers or positions (when |
lowOutl |
The identifiers or positions (when |
upOutl |
The identifiers or positions (when |
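To show how components of this kind relate to each other, the sketch below builds analogous objects by hand in base R. It is not boxB's implementation: the object names, the use of positions instead of identifiers, the resistant rule and the quantile() defaults are all assumptions made for illustration.

```r
# Hand-built analogue of the output components (illustration only).
x <- c(-50, 4, 5, 6, NA, 7, 8, 120)

keep     <- !is.na(x)        # NA is excluded from the computations
excluded <- which(!keep)     # positions of excluded units

q   <- quantile(x[keep], probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
fences <- c(lower = q[1] - 1.5 * iqr, upper = q[2] + 1.5 * iqr)

lowOutl  <- which(keep & x < fences[["lower"]])  # below the lower fence
upOutl   <- which(keep & x > fences[["upper"]])  # above the upper fence
outliers <- sort(c(lowOutl, upOutl))             # all flagged positions

list(fences = fences, excluded = excluded,
     outliers = outliers, lowOutl = lowOutl, upOutl = upOutl)
```

Positions refer to the original vector, so they can be used directly for subsetting, for example x[outliers].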
Marcello D'Orazio mdo.statmatch@gmail.com
McGill, R., Tukey, J. W. and Larsen, W. A. (1978) ‘Variations of box plots’. The American Statistician, 32, pp. 12-16.
Hubert, M., and Vandervieren, E. (2008) ‘An Adjusted Boxplot for Skewed Distributions’, Computational Statistics and Data Analysis, 52, pp. 5186-5201.
adjboxStats, wtd.quantile
set.seed(321)
x <- rnorm(30, 50, 10)
x[10] <- 1
x[20] <- 100

out <- boxB(x = x, k = 1.5, method = 'asymmetric')
out$fences
out$outliers
x[out$outliers]

out <- boxB(x = x, k = 1.5, method = 'adjbox')
out$fences
out$outliers
x[out$outliers]

x[24] <- NA
x.ids <- paste0('obs',1:30)
out <- boxB(x = x, k = 1.5, method = 'adjbox', id = x.ids)
out$excluded
out$fences
out$outliers

set.seed(111)
w <- round(runif(n = 30, min=1, max=10))
out <- boxB(x = x, k = 1.5, method = 'adjbox', id = x.ids, weights = w)
out$excluded
out$fences
out$outliers