# dixon.outliers: Determines outliers using Dixon's Q Test method

In referenceIntervals: Reference Intervals

## Description

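Identifies outliers in a dataset by calculating Dixon's Q statistic and comparing it to a standardized table of critical values. The method applies only to datasets of size 3 <= n <= 30; for any other size the function returns the data unchanged with an empty vector of outliers. Only the single most extreme value (lowest or highest) is tested. This function requires the outliers package.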
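As a rough illustration of the calculation described above, the sketch below computes the Q ratio (the gap between the suspect value and its nearest neighbour, divided by the full range) for a small made-up sample. The helper name `dixon_q`, the sample `x`, and the critical value `q_crit` are assumptions for illustration and are not part of the package; 0.710 is the value commonly tabulated for n = 5 at 95% confidence, and it should be checked against the table you actually use.

```r
# Hypothetical helper (not part of referenceIntervals): returns the Q ratio
# for whichever end of the sorted sample has the larger gap.
dixon_q <- function(x) {
  d <- sort(x)
  n <- length(d)
  stopifnot(n >= 3)
  spread <- d[n] - d[1]                 # full range of the sample
  if (spread == 0) stop("all values are identical; Q is undefined")
  gap_low  <- d[2] - d[1]               # gap at the low end
  gap_high <- d[n] - d[n - 1]           # gap at the high end
  if (gap_high >= gap_low) {            # ties are resolved toward the high end
    list(q = gap_high / spread, end = "high", suspect = d[n])
  } else {
    list(q = gap_low / spread, end = "low", suspect = d[1])
  }
}

x <- c(10.1, 10.2, 10.3, 10.4, 15.0)    # made-up sample, n = 5
res <- dixon_q(x)

q_crit <- 0.710                         # assumed critical value for n = 5
is_outlier <- res$q > q_crit            # TRUE means the suspect value is flagged
```

Here the high-end gap is 4.6 and the range is 4.9, so Q is about 0.94 and the value 15.0 would be flagged under the assumed critical value. Resolving ties toward the high end is a choice made for this sketch only; the definition printed in the Examples section assigns Q only when the two gaps differ.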

## Usage

```
dixon.outliers(data)
```

## Arguments

 `data` A vector of data points.

## Value

Returns a list containing a vector of outliers and a vector of the cleaned data (subset).

 `outliers` A vector of outliers from the data set.

 `subset` A vector containing the remaining data, cleaned of outliers.
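A brief usage sketch for working with the returned list, assuming the referenceIntervals package is installed and that its bundled `set20` data (used in the Examples section) is available once the package is loaded:

```r
library(referenceIntervals)   # provides dixon.outliers() and the set20 data

res <- dixon.outliers(set20)

res$outliers                  # values flagged as outliers (may be empty)
res$subset                    # remaining data with any outlier removed

# Every input value ends up in exactly one of the two components
n_total <- length(res$outliers) + length(res$subset)
```

Because at most one value is tested per call, `res$outliers` has length zero or one for data without tied extremes.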

## Author(s)

Daniel Finnegan

## References

Statistical treatment for rejection of deviant values: critical values of Dixon's "Q" parameter and related subrange ratios at the 95% confidence level. Anal. Chem., 1991, 63 (2), pp 139-146. DOI: 10.1021/ac00002a010. Publication Date: January 1991

One-sided and Two-sided Critical Values for Dixon's Outlier Test for Sample Sizes up to n = 30. Economic Quality Control, Vol 23(2008), No. 1, 5-13.

## Examples

```
library(referenceIntervals)  # provides dixon.outliers() and the set20 data

dixon.outliers(set20)
summary(dixon.outliers(set20)$subset)

## The function is currently defined as
function (data)
{
    if (length(data) >= 3 & length(data) <= 30) {
        d = sort(data)
        gap_high = abs(d[length(data)] - d[length(data) - 1])
        gap_low = abs(d[2] - d[1])
        range = d[length(d)] - d[1]
        end = NULL
        if (gap_high > gap_low) {
            end = "high"
            Q = gap_high/range
        }
        if (gap_low > gap_high) {
            end = "low"
            Q = gap_low/range
        }
        dixonNum = subset(dixonTableValues, Size == length(data))$Q95
        sub = data
        out = as.numeric(c())
        if (Q > dixonNum & end == "high") {
            sub = subset(data, data < d[length(d)])
            out = d[length(d)]
        }
        if (Q > dixonNum & end == "low") {
            sub = subset(data, data > d[1])
            out = d[1]
        }
        return(list(outliers = out, subset = sub))
    }
    else {
        return(list(outliers = as.numeric(c()), subset = data))
    }
}
```

### Example output

```
$outliers
numeric(0)

$subset
 [1] 36.57288 33.81564 32.43447 45.00967 41.08073 41.70147 43.55727 49.14652
 [9] 35.85684 49.95742 39.73881 42.38647 48.66275 40.99573 39.70225 40.24723
[17] 43.84033 41.34600 38.83987 27.83745

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  27.84   38.27   41.04   40.64   43.63   49.96
```

referenceIntervals documentation built on May 30, 2017, 3:08 a.m.