Tests used for outlier detection often rely on tabulated critical values. Decisions to reject or not reject a potential outlier can be dependent on the source of the critical values.
Where ever possible, we tried to omit using fixed tables and rather implemented critical value calculation ab initio. However, to be transparent, we detail the mathematical approach for each test in the following.
Note! Several test functions were implemented based on the functions in
the R package outliers
by Lukasz Komsta. Modifications were made to allow testing
up to n=100 laboratories (was restricted to n=30 in package outliers
).
The critical value is calculated internally using formula
eCerto:::qcochran
eCerto:::qcochran
where parameters p, n and k define the desired alpha level, the number of replicates
per group and the number of groups respectively. stats::qf()
is the quantile
function of the f-distribution as implemented in the R stats
package. Likewise,
stats::pf()
is used to compute the Cochran P-value via function
eCerto:::pcochran
eCerto:::pcochran
For comparison with critical values tabulated elsewhere you can generate a respective
table in R.
Show code
ps <- c(0.01, 0.05) ns <- c(3:9, seq(10,20,5)) ks <- c(2:14, seq(15,30,5)) out <- lapply(ps, function(p) { sapply(ns, function(n) { sapply(ks, function(k) { eCerto:::qcochran(p=p, n=n, k=k) }) }) }) names(out) <- paste0("alpha=", ps) lapply(out, function(x) { colnames(x) <- paste0("n=", ns) rownames(x) <- paste0("k=", ks) round(x, 4) })
Show Cochran critical values table
ps <- c(0.01, 0.05) ns <- c(3:9, seq(10,20,5)) ks <- c(2:14, seq(15,30,5)) out <- lapply(ps, function(p) { sapply(ns, function(n) { sapply(ks, function(k) { eCerto:::qcochran(p=p, n=n, k=k) }) }) }) names(out) <- paste0("alpha=", ps) lapply(out, function(x) { colnames(x) <- paste0("n=", ns) rownames(x) <- paste0("k=", ks) round(x, 4) })
The test is originally based on:
"Snedecor, G.W., Cochran, W.G. (1980). Statistical Methods (seventh edition). Iowa State University Press, Ames, Iowa".
The Dixon test statistic is calculated depending on the number of labs (n) using
the function
eCerto:::dixon.test
eCerto:::dixon.test
In eCerto the one-sided version of the Dixon test is applied consecutively
at the lower and upper end. Critical values are stored in tabulated form internally
(eCerto::cvals_Dixon
) and have been combined in a single table from
this source.
Show Dixon critical values table
eCerto::cvals_Dixon[,1:8]
Values for non-tabulated combinations of p and n are derived by interpolation
using function
eCerto:::qtable
eCerto:::qtable
The test is originally based on:
"Dixon, W.J. (1950). Analysis of extreme values. Ann. Math. Stat. 21, 4, 488-506"
"Dean, R.B.; Dixon, W.J. (1951). Simplified statistics for small numbers of observations. Anal.Chem. 23, 636-638"
"Dixon, W.J. (1953). Processing data for outliers. J. Biometrics. 9, 74-89"
The Grubbs test statistic is calculated either for a single or a double outlier at
both ends of the distribution consecutively using
eCerto:::grubbs.test
eCerto:::grubbs.test
Critical values for the single Grubbs test are calculated internally using
eCerto:::qgrubbs
eCerto:::qgrubbs
For comparison with critical values tabulated elsewhere you can generate a respective
table in R.
Show code
ps <- c(0.01, 0.025, 0.05, 0.1) ns <- c(3:10, seq(20,100,40)) out <- sapply(ps, function(p) { sapply(ns, function(n) { eCerto:::qgrubbs(p=p, n=n) }) }) colnames(out) <- paste0("a=", ps) rownames(out) <- paste0("n=", ns) round(out, 4)
Show Grubbs (single) critical values table
ps <- c(0.01, 0.025, 0.05, 0.1) ns <- c(3:10, seq(20,100,40)) out <- sapply(ps, function(p) { sapply(ns, function(n) { eCerto:::qgrubbs(p=p, n=n) }) }) colnames(out) <- paste0("a=", ps) rownames(out) <- paste0("n=", ns) round(out, 4)
Critical values for the double Grubbs test are stored in tabulated form internally
(eCerto::cvals_Grubbs2
) and P-values are derived by comparison.
Show Grubbs (double) critical values table
ps <- c(0.01, 0.025, 0.05, 0.1) ns <- c(4:10, seq(20,100,40)) out <- eCerto::cvals_Grubbs2 round(out[rownames(out)%in%ns, colnames(out)%in%ps], 4)
Note! Critical values for double Grubbs are obtained from the outliers
package
for 4≤n≤30 and are estimated for 31≤n≤100 based on the approximation described in
this paper.
The test is originally based on:
"Grubbs, F.E. (1950). Sample Criteria for testing outlying observations. Ann. Math. Stat. 21, 1, 27-58".
The Scheffe multiple comparison test is a re-implementation of the respective function
published in the agricolae
package. It is calculated on the result of stats::lm()
.
eCerto:::scheffe.test
eCerto:::scheffe.test
The test is originally based on:
"Steel, R.; Torri, J.; Dickey, D. (1997) Principles and Procedures of Statistics: A Biometrical Approach. pp189"
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.