#' Simulate an (Augmented) Dickey-Fuller Test to Assess Unit Root in a Time Series
#'
#' @description \code{sadf_test()} provides a simulation approach to assessing
#' unit root in a time series by way of the (Augmented) Dickey-Fuller test. It
#' takes a vector and performs three (Augmented) Dickey-Fuller tests (no drift,
#' no trend; drift, no trend; drift and trend) and calculates tau statistics as
#' one normally would. Rather than interpolate or approximate a *p*-value, it
#' simulates some user-specified number of (Augmented) Dickey-Fuller tests of
#' either a known, non-stationary time series or a known, white-noise time series
#' matching the length of the time series the user provides. This allows the
#' user to assess non-stationarity or stationarity by way of simulation rather
#' than by approximation against critical values received from books or tables
#' that are some years out of date.
#'
#' @details
#'
#' The Dickey-Fuller test and its "augmented" corollary are curious statistical
#' procedures, even if the underlying concept is straightforward. I have seen
#' various implementations of these procedures use slightly different
#' terminology to describe the procedure, though this particular implementation
#' imposes a nomenclature in which the classic Dickey-Fuller procedure, which
#' assumes just an AR(1) process, is the one in which `n_lags` is 0. The
#' addition of lags (of first differences) is what makes the Dickey-Fuller
#' procedure "augmented."
#'
#' By default, the function employs the lower bound suggested by Schwert (1989)
#' for the number of lagged first differences to include in this procedure:
#' take the length of the series, divide it by 100, raise that number to the
#' power of 1/4, multiply it by 4, and round down to the nearest integer.
#' (Schwert's upper bound multiplies by 12 rather than 4.) There are other
#' suggested defaults you can consider. \code{adf.test} in \pkg{aTSA} takes the
#' length of the series, divides it by 100, raises it to the power of 2/9,
#' multiplies that by 4, and floors the result. \code{adf.test} in \pkg{tseries}
#' subtracts 1 from the length of the series before raising it to the power of
#' 1/3 (flooring that result as well). The Examples section will show you how
#' you can use these alternatives.
#'
#' This function specifies three different types of tests: 1) no drift, no
#' trend, 2) drift, no trend, and 3) drift and trend. In the language of the
#' `lm()` function, the first is `lm(dy ~ ly - 1)`, where `dy` is the first
#' difference of `y` and `ly` is the first-order lag of its level. The second
#' test is `lm(dy ~ ly)`, intuitively suggesting the *y*-intercept in this
#' equation is the "drift". The third is `lm(dy ~ ly + t)`, with `t` being a
#' simple integer that increases by 1 for each observation (i.e. a time trend).
#' In all three, tau is the t-statistic on `ly`.
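#'
#' As a rough sketch of what these three regressions look like under the hood,
#' consider a simulated random walk (the seed and series below are purely
#' illustrative):
#'
#' ```r
#' set.seed(8675309)          # arbitrary seed, purely for illustration
#' y  <- cumsum(rnorm(100))   # a known random walk (non-stationary by construction)
#' dy <- diff(y)              # first differences
#' ly <- y[-length(y)]        # first-order lag of the level
#' t  <- seq_along(dy)        # a simple time trend
#'
#' M1 <- lm(dy ~ ly - 1)      # no drift, no trend
#' M2 <- lm(dy ~ ly)          # drift, no trend
#' M3 <- lm(dy ~ ly + t)      # drift and trend
#'
#' summary(M2)$coefficients["ly", 3]  # tau: the t-statistic on the lagged level
#' ```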
#'
#' None of this is meant to discourage the use of Fuller (1976) or its various
#' reproductions for the sake of diagnosing stationarity or non-stationarity, and
#' I will confess their expertise on these matters outpaces mine. Consider the
#' justification for this function to be largely philosophical and/or
#' experimental. Why not simulate it? It's not like time or computing power are
#' huge issues anymore.
#'
#' This is always awkwardly stated, but it's a good reminder that the classic
#' Dickey-Fuller statistics are mostly intended to come back negative. That's
#' not always the case, to be clear, but it is the intended case. You assess the
#' statistic by "how negative" it is. Stationary time series will produce test
#' statistics more negative ("smaller") than those produced by non-stationary
#' time series. In a way, this makes the hypotheses implicitly one-tailed (to
#' use that language).
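#'
#' For instance, assuming `y` is the series of interest, one rough way to
#' situate the observed tau statistics against the simulated distribution
#' (under the default `sim_hyp = "nonstationary"`) might look something like
#' this:
#'
#' ```r
#' A <- sadf_test(y, n_sims = 100)
#' A$stats   # the observed tau statistics
#' # 5th, 50th, and 95th percentiles of the simulated taus, by specification
#' aggregate(tau ~ cat, data = A$sims, FUN = quantile, probs = c(.05, .5, .95))
#' ```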
#'
#' This function removes missing (and otherwise non-finite) values from the
#' vector before calculating test statistics.
#'
#' @return \code{sadf_test()} returns a list of length 3. The first element
#' in the list is a matrix of tau statistics calculated by the test. The second
#' element is a data frame of the simulated tau statistics of either a known
#' white-noise time series or three different non-stationary time series
#' (pure random walk, random walk with drift, random walk with drift and trend).
#' The third element contains some attributes about the procedure for
#' post-processing.
#'
#' @author Steven V. Miller
#'
#' @param x a vector
#' @param n_lags defaults to NULL, but must be 0 or a positive integer if
#' specified. This argument determines the number of lagged first differences
#' to include in the estimation procedure. Recall that the test statistic (tau)
#' is still the t-statistic for the *level* value of the vector at t-1, whether
#' the constant (drift) and time trend are included or not. If this value is 0,
#' the procedure is the classic Dickey-Fuller test. If this value is greater
#' than 0, this is the "augmented" Dickey-Fuller test, so-called because it is
#' "augmented" by the number of lagged first differences to assess higher-order
#' AR processes. If no argument is specified, the default lag is Schwert's
#' suggested lower bound. The \code{lag_suggests} data provides more
#' information about these suggested lags.
#' @param n_sims the number of simulations for calculating an interval or
#' distribution of test statistics for assessing stationarity or
#' non-stationarity. Defaults to 1,000.
#' @param sim_hyp can be either "stationary" or "nonstationary". If
#' "stationary", the function runs (A)DF tests on simulated stationary
#' (pure white noise) data. This allows the user to assess the
#' compatibility/plausibility of the test statistics against a distribution of
#' test statistics that are known to come from pure white noise (in
#' expectation). If "nonstationary" (the default), the function generates three
#' different data sets: a pure random walk, a random walk with drift, and a
#' random walk with drift and trend. It then runs (A)DF tests on all three.
#' This allows the user to assess the compatibility/plausibility of their test
#' statistics with data that are known to be non-stationary in some form.
#'
#' @references
#'
#' Fuller, Wayne A. 1976. *Introduction to Statistical Time Series*. New York:
#' John Wiley and Sons.
#'
#' Schwert, G. William. 1989. "Tests for Unit Roots: A Monte Carlo Investigation."
#' *Journal of Business & Economic Statistics* 7(2): 147--159.
#'
#' @examples
#'
#' y <- na.omit(USDSEK[1:500,])$close # there is one missing value here. n = 499.
#'
#'
#' sadf_test(y, n_sims = 25) # Doing 25, just to make it quick
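#'
#' # Alternative lag choices, mirroring the defaults discussed in Details
#' # (illustrations only, not endorsements):
#' floor(12*(length(y)/100)^(1/4)) # Schwert's upper bound
#' floor(4*(length(y)/100)^(2/9))  # the aTSA-style default
#' floor((length(y) - 1)^(1/3))    # the tseries-style default
#'
#' sadf_test(y, n_lags = floor((length(y) - 1)^(1/3)), n_sims = 25)
#'
#' # Simulate against known white noise instead of known random walks:
#' sadf_test(y, n_sims = 25, sim_hyp = "stationary")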
#'
#'
#' @importFrom stats embed
#' @importFrom stats lm
#' @importFrom stats resid
#' @importFrom stats rnorm
#' @importFrom stats arima.sim
#' @export
#'
sadf_test <- function(x, n_lags = NULL, n_sims = 1000, sim_hyp = "nonstationary") {
  if (!is.null(n_lags) && (n_lags %% 1 != 0 || n_lags < 0)) {
    stop("n_lags must be 0 or a positive integer.")
  }
if(!sim_hyp %in% c("stationary", "nonstationary")) {
stop("The 'sim_hyp' argument must be 'stationary' or 'nonstationary'.")
}
  x <- x[is.finite(x)] # drop missing/non-finite values; the series must be complete.

  if (is.null(n_lags)) {
    # Default to Schwert's (1989) suggested lower bound: floor(4*(T/100)^(1/4))
    n_lags <- floor(4*(length(x)/100)^(.25))
  }

# This calculates the DF/ADF stat on the actual data. ----
nlp1 <- n_lags + 1
diff_x <- diff(x)
n_diff_x <- length(diff_x)
m <- embed(diff_x, nlp1)
d_x_t <- m[,1]
l1_x <- x[nlp1:n_diff_x]
time <- 1:length(d_x_t)
if(n_lags > 0) {
adf_diff_lags <- m[, 2:nlp1]
M1 <- lm(d_x_t ~ l1_x - 1 + adf_diff_lags) # no drift, no trend
M2 <- lm(d_x_t ~ l1_x + adf_diff_lags) # drift, no trend
M3 <- lm(d_x_t ~ l1_x + time + adf_diff_lags) # drift and trend
} else { # classic DF, where n_lags == 0
M1 <- lm(d_x_t ~ l1_x - 1) # no drift, no trend
M2 <- lm(d_x_t ~ l1_x ) # drift, no trend
M3 <- lm(d_x_t ~ l1_x + time) # drift and trend
}
  Stats <- rbind(summary(M1)$coefficients[1, 3],
                 summary(M2)$coefficients[2, 3],
                 summary(M3)$coefficients[2, 3])
  # Start the Sims -----
  # Simulated series are pure white noise when sim_hyp is "stationary" and some
  # form of random walk when sim_hyp is "nonstationary".
  wn <- sim_hyp == "stationary"
  Sims <- data.frame()

  for (i in 1:n_sims) {
    if (n_lags > 0) {
      fM1 <- sim_df_mod(x, ts_type = "ndnt", df_lags = n_lags, wn = wn) # no drift, no trend
      fM2 <- sim_df_mod(x, ts_type = "dnt", df_lags = n_lags, wn = wn)  # drift, no trend
      fM3 <- sim_df_mod(x, ts_type = "dt", df_lags = n_lags, wn = wn)   # drift and trend
    } else { # classic DF, where n_lags == 0
      fM1 <- sim_df_mod(x, ts_type = "ndnt", classic_df = TRUE, wn = wn)
      fM2 <- sim_df_mod(x, ts_type = "dnt", classic_df = TRUE, wn = wn)
      fM3 <- sim_df_mod(x, ts_type = "dt", classic_df = TRUE, wn = wn)
    }
    fakeStats <- rbind(summary(fM1)$coefficients[1, 3],
                       summary(fM2)$coefficients[2, 3],
                       summary(fM3)$coefficients[2, 3])
    fakeStats <- data.frame(fakeStats)
    names(fakeStats) <- c("tau")
    fakeStats$sim <- i
    fakeStats$cat <- c("No Drift, No Trend", "Drift, No Trend",
                       "Drift and Trend")
    Sims <- rbind(Sims, fakeStats)
  }
attatt <- data.frame(lags = n_lags,
sim_hyp = sim_hyp,
n_sims = n_sims,
n = length(x),
test = "adf")
output <- list("stats" = Stats,
"sims" = Sims,
"attributes" = attatt)
class(output) <- c("sadf_test")
return(output)
} # end function sadf_test()