| boxB | R Documentation |
Identifies univariate outliers using methods based on boxplots.
boxB(x, k=1.5, method='asymmetric', weights=NULL, id=NULL,
exclude=NA, logt=FALSE)
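| x | Numeric vector that will be searched for outliers. |
| k | Nonnegative constant that determines the extension of the 'whiskers'. Commonly used values are 1.5 (default), 2, or 3. Note that when method="adjbox" the bounds use the constant 1.5 (see the adjusted-boxplot bounds below). |
| method | Character, identifies the method to be used: "resistant", "asymmetric" (default) or "adjbox". The three methods are described below. |
| weights | Optional numeric vector with the units' weights associated with the observations in x. |
| id | Optional vector with the identifiers of the units in x; when id=NULL (default) the units are identified by their positions in x. |
| exclude | Values of x that are excluded from the analysis; by default (exclude=NA) missing values are excluded. |
| logt | Logical; if TRUE, outliers are searched for on log(x+1) rather than on x (default FALSE). |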
When method="resistant" the outlying observations are those outside the interval:
[Q1 - k*IQR; Q3 + k*IQR]
where Q1 and Q3 are the 1st and 3rd quartiles of x, and IQR = Q3 - Q1 is the inter-quartile range. The value k=1.5 (giving the so-called 'inner fences') is the one commonly used when drawing a boxplot; k=2 and k=3 give the middle and outer fences, respectively.
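The resistant interval can be sketched in a few lines of base R. This is a minimal illustration, not the code of boxB: the helper resistant_fences() is hypothetical, and it assumes the quartiles come from quantile() with its default type = 7, which boxB may not use.

```r
# Hypothetical helper: resistant fences [Q1 - k*IQR; Q3 + k*IQR].
# Assumption: quartiles from base R quantile() with its default type = 7.
resistant_fences <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE, na.rm = TRUE)
  iqr <- q[2] - q[1]                      # inter-quartile range
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
f <- resistant_fences(x, k = 1.5)
f                                         # lower = -3.5, upper = 14.5
out_vals <- x[x < f["lower"] | x > f["upper"]]
out_vals                                  # 100
```

Here Q1 = 3.25 and Q3 = 7.75, so IQR = 4.5 and the fences are 3.25 - 6.75 and 7.75 + 6.75.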
When method="asymmetric" the outlying observations are those outside the interval:
[Q1 - 2k*(Q2-Q1); Q3 + 2k*(Q3-Q2)]
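where Q2 is the median. Since Q2-Q1 and Q3-Q2 may differ, the two whiskers can have different lengths, which accounts for slight skewness of the distribution; for symmetric data (Q2-Q1 = Q3-Q2 = IQR/2) the interval reduces to that of the resistant method.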
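A small comparison of the asymmetric and resistant intervals on right-skewed toy data may help. As before, this is a sketch under the assumption that quartiles come from quantile() with type = 7; asymmetric_fences() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: asymmetric fences [Q1 - 2k*(Q2-Q1); Q3 + 2k*(Q3-Q2)].
asymmetric_fences <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE, na.rm = TRUE)
  c(lower = q[1] - 2 * k * (q[2] - q[1]),   # lower whisker driven by Q2 - Q1
    upper = q[3] + 2 * k * (q[3] - q[2]))   # upper whisker driven by Q3 - Q2
}

x <- c(1, 2, 2, 3, 3, 3, 4, 6, 9, 15)       # right-skewed toy data
asymmetric_fences(x)                        # lower = 0, upper = 13

# resistant fences on the same data, for comparison
q <- quantile(x, c(0.25, 0.75), names = FALSE)
c(q[1] - 1.5 * diff(q), q[2] + 1.5 * diff(q))   # -2.625, 10.375
```

With Q1 = 2.25, Q2 = 3 and Q3 = 5.5 the upper whisker is stretched (13 instead of 10.375) while the lower one is shortened, following the asymmetry between Q3-Q2 and Q2-Q1.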
Finally, when method="adjbox" the outlying observations are identified using the method proposed by Hubert and Vandervieren (2008), which is based on the medcouple measure of skewness; in practice the bounds are:
[Q1 - 1.5*exp(a*M)*IQR; Q3 + 1.5*exp(b*M)*IQR]
where M is the medcouple. When M>0 (positive skewness) a=-4 and b=3; for negative skewness (M<0) a=-3 and b=4. When M=0 both choices give exp(0)=1, i.e. the resistant fences with k=1.5. According to Hubert and Vandervieren (2008), this adjustment of the boxplot works for moderate skewness (-0.6 <= M <= 0.6). The bounds of the adjusted boxplot are computed with the function adjboxStats in the package robustbase.
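The role of the sign of M can be shown with a short sketch that takes the quartiles and the medcouple as inputs. The helper adjbox_fences() is hypothetical; in practice M would be computed elsewhere (for instance with robustbase::mc()), and boxB itself relies on adjboxStats rather than on code like this.

```r
# Hypothetical helper: adjusted-boxplot bounds of Hubert and Vandervieren (2008)
# [Q1 - 1.5*exp(a*M)*IQR; Q3 + 1.5*exp(b*M)*IQR], given Q1, Q3 and medcouple M.
adjbox_fences <- function(q1, q3, M) {
  iqr <- q3 - q1
  # (a, b) = (-4, 3) for M >= 0 and (-3, 4) for M < 0
  if (M >= 0) {
    a <- -4; b <- 3
  } else {
    a <- -3; b <- 4
  }
  c(lower = q1 - 1.5 * exp(a * M) * iqr,
    upper = q3 + 1.5 * exp(b * M) * iqr)
}

adjbox_fences(q1 = 10, q3 = 20, M = 0)     # plain 1.5*IQR fences: -5 and 35
adjbox_fences(q1 = 10, q3 = 20, M = 0.2)   # right skew: upper fence moves out, lower moves in
adjbox_fences(q1 = 10, q3 = 20, M = -0.2)  # left skew: the mirror image
```

Swapping (a, b) when M < 0 makes the rule symmetric: the distance of the lower fence from Q1 for +M equals the distance of the upper fence from Q3 for -M.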
When weights are supplied via the argument weights, they are used in computing the quartiles; in particular, the weighted quartiles are derived with the function wtd.quantile in the package Hmisc.
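To see how weights shift the quartiles, the sketch below uses a deliberately simple rule: the p-th weighted quantile is the first sorted value whose cumulative weight share reaches p. This rule is an assumption chosen for illustration and may give different results from wtd.quantile, which is what boxB actually calls; weighted_quartiles() is a hypothetical name.

```r
# Hypothetical helper: simple weighted quartiles (no interpolation).
weighted_quartiles <- function(x, w, probs = c(0.25, 0.5, 0.75)) {
  o <- order(x)
  x <- x[o]
  w <- w[o]
  cw <- cumsum(w) / sum(w)                      # cumulative weight share
  # first sorted value whose cumulative share reaches p
  sapply(probs, function(p) x[which(cw >= p)[1]])
}

x <- c(10, 20, 30, 40)
weighted_quartiles(x, w = c(1, 1, 1, 1))        # 10 20 30
weighted_quartiles(x, w = c(1, 1, 1, 5))        # 20 40 40
```

Giving a large weight to the largest observation pulls the median and Q3 towards it, which in turn changes the fences.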
Note that when a log transformation is requested (argument logt=TRUE), all the estimates (quartiles, fences, etc.) refer to log(x+1).
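The effect of the transformation on skewed data can be sketched by applying the same resistant rule on the original scale and on log(x+1). The helper tukey_out() is hypothetical and uses only the resistant method with quantile() type 7; log1p() computes log(1 + x).

```r
# Hypothetical helper: positions outside the resistant fences on the scale given.
tukey_out <- function(v, k = 1.5) {
  q <- quantile(v, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  which(v < q[1] - k * iqr | v > q[2] + k * iqr)
}

x <- c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89)   # strongly right-skewed data
raw_pos <- tukey_out(x)         # fences on the original scale
log_pos <- tukey_out(log1p(x))  # fences on log(x + 1), as with logt = TRUE
raw_pos                         # 10
log_pos                         # integer(0)
```

On the original scale 89 exceeds the upper fence (71.625), while on the log scale the values are spread evenly enough that nothing is flagged.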
The output is a list containing the following components:
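| quartiles | The quartiles of x (of log(x+1) when logt=TRUE). |
| fences | The bounds of the interval; values outside the interval are detected as outliers. |
| excluded | The identifiers or positions (when id=NULL) of the observations in x excluded from the analysis, according to the argument exclude. |
| outliers | The identifiers or positions (when id=NULL) of the observations in x detected as outliers. |
| lowOutl | The identifiers or positions (when id=NULL) of the outliers in the lower tail, i.e. below the lower fence. |
| upOutl | The identifiers or positions (when id=NULL) of the outliers in the upper tail, i.e. above the upper fence. |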
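How the components relate to one another can be sketched with a hypothetical function that returns a list with the same names. It is not the package's source: it covers only the resistant method, ignores weights and logt, and lists lower-tail outliers before upper-tail ones, an ordering that boxB may not follow.

```r
# Hypothetical sketch returning the same components as boxB (resistant method only).
box_sketch <- function(x, k = 1.5, id = NULL, exclude = NA) {
  if (is.null(id)) id <- seq_along(x)        # positions when no identifiers
  drop <- x %in% exclude                     # %in% also matches NA
  xx <- x[!drop]
  ii <- id[!drop]
  q <- quantile(xx, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  f <- c(q[1] - k * iqr, q[3] + k * iqr)
  low <- ii[xx < f[1]]                       # below the lower fence
  up <- ii[xx > f[2]]                        # above the upper fence
  list(quartiles = q, fences = f, excluded = id[drop],
       outliers = c(low, up), lowOutl = low, upOutl = up)
}

x <- c(-40, 2, 3, 4, NA, 6, 7, 8, 9, 100)
res <- box_sketch(x, id = paste0("obs", 1:10))
res$fences      # -4.5 15.5
res$excluded    # "obs5"
res$outliers    # "obs1" "obs10"
```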
Marcello D'Orazio mdo.statmatch@gmail.com
McGill, R., Tukey, J. W. and Larsen, W. A. (1978) ‘Variations of box plots’. The American Statistician, 32, pp. 12-16.
Hubert, M. and Vandervieren, E. (2008) ‘An Adjusted Boxplot for Skewed Distributions’, Computational Statistics and Data Analysis, 52, pp. 5186-5201.
adjboxStats, wtd.quantile
library(univOutl)
# simulated data with two contaminated values
set.seed(321)
x <- rnorm(30, 50, 10)
x[10] <- 1
x[20] <- 100
# asymmetric boxplot fences
out <- boxB(x = x, k = 1.5, method = 'asymmetric')
out$fences
out$outliers
x[out$outliers]
# adjusted boxplot (Hubert and Vandervieren, 2008)
out <- boxB(x = x, k = 1.5, method = 'adjbox')
out$fences
out$outliers
x[out$outliers]
# add a missing value and unit identifiers; NA is excluded by default
x[24] <- NA
x.ids <- paste0('obs', 1:30)
out <- boxB(x = x, k = 1.5, method = 'adjbox', id = x.ids)
out$excluded
out$fences
out$outliers
# same analysis, supplying units' weights
set.seed(111)
w <- round(runif(n = 30, min = 1, max = 10))
out <- boxB(x = x, k = 1.5, method = 'adjbox', id = x.ids, weights = w)
out$excluded
out$fences
out$outliers