ctbi.outlier | R Documentation |
Please cite the following companion paper if you're using the ctbi
package: Ritter, F.: Technical note: A procedure to clean, decompose, and aggregate time series, Hydrol. Earth Syst. Sci., 27, 349–361, https://doi.org/10.5194/hess-27-349-2023, 2023.
Outliers in a univariate dataset y are flagged using an enhanced box plot rule (called Logbox, input: coeff.outlier) that is adapted to non-Gaussian data and keeps the type I error at \frac{0.1}{\sqrt{n}} % (percentage of erroneously flagged outliers).
The box plot rule flags data points as outliers if they are below L or above U using the sample quantile q:
L = q(0.25)-α \times (q(0.75)- q(0.25))
U = q(0.75)+α \times (q(0.75)- q(0.25))
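The classic rule above can be sketched in a few lines of base R. This is a minimal illustration only: boxplot_rule is a hypothetical helper (not a function of the ctbi package), and it assumes R's default quantile definition (type 7), which may differ from the one used internally by ctbi.

```r
# Hypothetical helper illustrating the classic box plot rule.
# Assumption: sample quantiles use R's default definition (type = 7).
boxplot_rule <- function(y, alpha = 1.5) {
  q <- quantile(y, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]            # q(0.75) - q(0.25)
  L <- q[1] - alpha * iqr       # lower threshold
  U <- q[2] + alpha * iqr       # upper threshold
  # Flag non-missing points that fall outside [L, U]
  list(L = L, U = U, outlier = !is.na(y) & (y < L | y > U))
}

# Small usage example: one obviously extreme value appended to 1..20
y <- c(1:20, 100)
res <- boxplot_rule(y)
which(res$outlier)
```

With a fixed alpha, the share of flagged points depends on both the sample size and the shape of the distribution, which is the limitation the sample-size-dependent alpha is meant to address.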
Logbox replaces the original α = 1.5 constant of the box plot rule with α = A \times \log(n)+B+\frac{C}{n}. The variable n ≥ 9 is the sample size, C = 36 corrects biases emerging in small samples, and A and B are automatically calculated from a predictor of the maximum tail weight defined as m_{*} = \max(m_{-},m_{+})-0.6165.
The two functions (m_{-},m_{+}) are defined as:
m_{-} = \frac{q(0.875)- q(0.625)}{q(0.75)- q(0.25)}
m_{+} = \frac{q(0.375)- q(0.125)}{q(0.75)- q(0.25)}
And finally, A = f_{A}(m_{*}) and B = f_{B}(m_{*}) with m_{*} restricted to [0,2]. The functions (f_{A},f_{B}) are defined as:
f_{A}(x) = 0.2294\exp(2.9416x-0.0512x^{2}-0.0684x^{3})
f_{B}(x) = 1.0585+15.6960x-17.3618x^{2}+28.3511x^{3}-11.4726x^{4}
Both functions have been calibrated on the Generalized Extreme Value and Pearson families.
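To make the chain of formulas concrete, the sketch below computes m_{*}, A, B and the resulting α for a sample. It is written directly from the equations in this page and is not the ctbi implementation: logbox_alpha is a hypothetical helper, the quantiles use R's default type 7, "restricted to [0,2]" is assumed to mean clamping, and the interquartile range is assumed to be non-zero.

```r
# Hypothetical helper: Logbox coefficient alpha from the formulas above.
# Assumptions: default quantile type (7), m_star clamped to [0, 2],
# and a non-zero interquartile range.
logbox_alpha <- function(y, C = 36) {
  y <- y[!is.na(y)]
  n <- length(y)
  stopifnot(n >= 9)             # the rule is defined for n >= 9
  q <- quantile(y, probs = c(0.125, 0.25, 0.375, 0.625, 0.75, 0.875),
                names = FALSE)
  iqr <- q[5] - q[2]            # q(0.75) - q(0.25)
  m_a <- (q[6] - q[4]) / iqr    # (q(0.875) - q(0.625)) / IQR
  m_b <- (q[3] - q[1]) / iqr    # (q(0.375) - q(0.125)) / IQR
  # Only the maximum of the two tail measures enters the predictor
  m_star <- max(m_a, m_b) - 0.6165
  m_star <- min(max(m_star, 0), 2)
  A <- 0.2294 * exp(2.9416 * m_star - 0.0512 * m_star^2 - 0.0684 * m_star^3)
  B <- 1.0585 + 15.6960 * m_star - 17.3618 * m_star^2 +
    28.3511 * m_star^3 - 11.4726 * m_star^4
  alpha <- A * log(n) + B + C / n
  c(A = A, B = B, C = C, m_star = m_star, n = n, alpha = alpha)
}

# Usage example on evenly spaced (light-tailed) data
coefs <- logbox_alpha(1:101)
coefs
```

For light-tailed data such as this evenly spaced sample, the tail measure falls below 0.6165, so m_{*} is clamped to 0 and A and B reduce to their leading constants.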
ctbi.outlier(y, coeff.outlier = "auto")
y | univariate data (numeric vector) |
coeff.outlier | one of |
A list that contains:
xy, a two-column data frame that contains the clean data (first column) and the outliers (second column)
summary.outlier, a vector that contains A, B, C, m_{*}, the size of the residuals (n), and the lower and upper outlier thresholds
x <- runif(30)
x[c(5,10,20)] <- c(-10,15,30)
example1 <- ctbi.outlier(x)