The Forward Search algorithm is an iterative algorithm for multiple (time series) regression suggested by Hadi and Simonoff (1993) and developed further by Atkinson and Riani (2000). The algorithm starts with a robust estimate of the regression parameters and a sub-sample of size m_0 and iterates with a sequence of least squares steps. The asymptotic theory developed by Johansen and Nielsen (2013, 2014) is implemented.
A common choice for the initial robust estimator is the Least Trimmed Squares estimator of Rousseeuw (1984).
The algorithm is initiated by computing the absolute residuals of all n observations from the initial robust estimate. The initial sub-sample consists of the m_0 observations with the smallest absolute residuals. We then run a regression on those m_0 observations and compute the absolute residuals of all n observations. The m_0+1 observations with the smallest absolute residuals are then selected. The (m_0+1)th smallest absolute residual is the forward residual. A new regression is performed on these m_0+1 observations. This is then iterated. Eventually the least squares estimator based on all n observations is computed.
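The iteration described above can be sketched in base R. This is only an illustration, not the package's implementation: the function name forward_search_sketch and its arguments are hypothetical, the initial estimate beta0 is supplied by the caller (standing in for a robust estimator such as Least Trimmed Squares), the regressors are assumed to have full column rank, and the forward residuals are left unscaled.

```r
# Sketch of the Forward Search iteration in base R.
# Hypothetical helper, not part of the ForwardSearch package.
# x: n x p regressor matrix, y: response vector,
# beta0: initial (robust) estimate, m0: initial sub-sample size.
forward_search_sketch <- function(x, y, beta0, m0) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(m0 >= ncol(x), m0 < n)
  # Indexed by m; entries m0, ..., n-1 are filled in below
  forward_res <- rep(NA_real_, n - 1)
  # Absolute residuals of all n observations from the initial estimate
  abs_res <- abs(as.vector(y - x %*% beta0))
  # Initial sub-sample: the m0 smallest absolute residuals
  keep <- order(abs_res)[seq_len(m0)]
  for (m in m0:(n - 1)) {
    # Least squares on the current sub-sample of size m
    beta <- lm.fit(x[keep, , drop = FALSE], y[keep])$coefficients
    # Absolute residuals of all n observations
    abs_res <- abs(as.vector(y - x %*% beta))
    ranked <- order(abs_res)
    # Forward residual: the (m+1)th smallest absolute residual
    forward_res[m] <- abs_res[ranked[m + 1]]
    # Next sub-sample: the m+1 smallest absolute residuals
    keep <- ranked[seq_len(m + 1)]
  }
  # Final step: least squares on all n observations
  beta_full <- lm.fit(x, y)$coefficients
  list(forward.residuals = forward_res, beta.full = beta_full)
}

# Simulated data with three shifted observations (made-up example)
set.seed(1)
n <- 50
x <- cbind(1, rnorm(n))
y <- as.vector(x %*% c(1, 2) + rnorm(n))
y[1:3] <- y[1:3] + 10
# The true coefficients are used as a stand-in for a robust initial estimate
fs <- forward_search_sketch(x, y, beta0 = c(1, 2), m0 = 10)
```

In this sketch the forward residuals are stored by sub-sample size m, so entries below m_0 are NA. The package additionally scales the forward residuals by a variance estimate, which is omitted here.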
The algorithm results in a sequence of forward residuals indexed by the sub-sample size m running from m_0 to n-1. The idea is to monitor the plot of these and stop when the forward residuals become "large". Johansen and Nielsen (2013, 2014) have developed, respectively, pointwise and simultaneous confidence bands for estimators and forward residuals. These are implemented in the package.
The ForwardSearch package can be used as follows.
Execute the full Forward Search using ForwardSearch.fit.
Create the forward plot of the forward residuals using ForwardSearch.plot.
This requires the output from above and a choice of reference distribution.
The plot shows the scaled forward residuals from above along with simultaneous confidence bands.
The user has to choose a "gauge", which is the expected fraction of falsely
detected outliers that are tolerable when in fact there are no outliers. For
instance, a "gauge" of 0.01 indicates that in a sample of n=110 observations
1.1 outliers are found on average when there are none.
The simultaneous confidence bands are calibrated so that the Forward Search
stops the first time the scaled forward residuals exceed the chosen confidence bands.
This is a stopping time.
The theory for this is given in Johansen and Nielsen (2014).
Get the estimates of the stopped Forward Search using ForwardSearch.stopped.
The user has to input the estimated stopping time.
This also gives the ranks of the selected and non-selected observations.
These are the "good" and the "bad" observations.
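The gauge arithmetic and the stopping rule can be illustrated with a few lines of base R. This is a sketch under stated assumptions: the helper first_exceedance is hypothetical, the residuals and the constant band are made-up numbers, and the actual bands in the package depend on the sub-sample size and the chosen gauge.

```r
# Expected number of falsely detected outliers for a given gauge
gauge <- 0.01   # tolerated fraction of false detections
n <- 110        # sample size
expected_false <- gauge * n
expected_false  # 1.1

# Hypothetical helper: first sub-sample size m at which the scaled forward
# residual exceeds the band. Returns NA if the band is never crossed.
first_exceedance <- function(m, scaled_res, band) {
  stopifnot(length(m) == length(scaled_res), length(m) == length(band))
  hit <- which(scaled_res > band)
  if (length(hit) == 0) NA_integer_ else m[hit[1]]
}

# Made-up scaled forward residuals and a made-up constant band
m <- 100:109
scaled_res <- c(1.9, 2.0, 2.1, 2.0, 2.2, 2.3, 2.4, 3.6, 3.9, 4.5)
band <- rep(3.5, length(m))
stop_m <- first_exceedance(m, scaled_res, band)
stop_m  # 107
```

A value found this way plays the role of the estimated stopping time that the user passes on when extracting the estimates of the stopped Forward Search.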
Bent Nielsen, 9 Sep 2014
Atkinson, A.C. and Riani, M. (2000) Robust Diagnostic Regression Analysis. New York: Springer.
Hadi, A.S. and Simonoff, J.S. (1993) Procedures for the Identification of Multiple Outliers in Linear Models. Journal of the American Statistical Association 88, 1264-1272.
Johansen, S. and Nielsen, B. (2013) Asymptotic analysis of the Forward Search. Nuffield DP.
Johansen, S. and Nielsen, B. (2014) Outlier detection algorithms for least squares time series. Nuffield DP.
Rousseeuw, P.J. (1984) Least median of squares regression. Journal of the American Statistical Association 79, 871-880.
Forward Search can alternatively be done by the package forward.
Version 1.0.3 of forward includes functions for the analysis suggested in, e.g.,
Atkinson and Riani (2000), but does not include the asymptotic theory of
Johansen and Nielsen (2013, 2014).
Matlab code for Forward Search is also available from
#####################
#   EXAMPLE 1
#   using Fulton Fish data,
#   see Johansen and Nielsen (2014).

#   Call package
library(ForwardSearch)

#   Call data
data(Fulton)
mdata   <- as.matrix(Fulton)
n       <- nrow(mdata)

#   Identify variable to reproduce Johansen and Nielsen (2014)
q       <- mdata[2:n, 9]
q_1     <- mdata[1:(n-1), 9]
s       <- mdata[2:n, 6]
x.q.s   <- cbind(q_1, s)
colnames(x.q.s) <- c("q_1", "stormy")

#   Fit Forward Search
FS95    <- ForwardSearch.fit(x.q.s, q, psi.0=0.95)
FS80    <- ForwardSearch.fit(x.q.s, q, psi.0=0.80)

#   Forward plot of forward residuals scaled by variance estimate
#   Note the variance estimate is not bias corrected
#   This is taken into account in asymptotic theory
ForwardSearch.plot(FS95)
ForwardSearch.plot(FS80)

#   Based on the plot of e.g. FS95 it is decided to stop at m=107
ForwardSearch.stopped(FS95, 107)

#   Alternatively use the file inst/extdata/Fulton.txt
#   Data <- read.table(data/Fulton.txt,header=TRUE)