sadf_test: Simulate a (Augmented) Dickey-Fuller Test to Assess Unit Root...

View source: R/sadf_test.R

sadf_testR Documentation

Simulate a (Augmented) Dickey-Fuller Test to Assess Unit Root in a Time Series

Description

sadf_test() provides a simulation approach to assessing unit root in a time series by way of the (Augmented) Dickey-Fuller test. It takes a vector and performs three (Augmented) Dickey-Fuller tests (no drift, no trend; drift, no trend; drift and trend) and calculates tau statistics as one normally would. Rather than interpolate or approximate a p-value, it simulates some user-specified number of (Augmented) Dickey-Fuller tests of either a known, non-stationary time series or a known, white-noise time series matching the length of the time series the user provides. This allows the user to make assessments of non-stationarity or stationarity by way of simulation rather than approximation from received critical values by way of books or tables some years out of date.

Usage

sadf_test(x, n_lags = NULL, n_sims = 1000, sim_hyp = "nonstationary")

Arguments

x

a vector

n_lags

defaults to NULL, but must be 0 or a positive integer. This argument determines the number of lagged first differences to include in the estimation procedure. Recall that the test statistic (tau) is still the t-statistic for the level value of the vector at t-1, whether the constant (drift) and time trend is included or not. If this value is 0, the procedure is the classic Dickey-Fuller test. If this value is greater than 0, this is the "augmented" Dickey-Fuller test, so-called because it is "augmented" by the number of lagged first differences to assess higher-order AR processes. If no argument is specified, the default lag is Schwert's suggested lower bound. The lag_suggests data provides more information about these suggested lags.

n_sims

the number of simulations for calculating an interval or distribution of test statistics for assessing stationarity or non-stationarity. Defaults to 1,000.

sim_hyp

can be either "stationary" or "nonstationary". If "stationary", the function runs (A)DF tests on simulated stationary (pure white noise) data. This allows the user to assess compatibility/plausibility of the test statistic against a distribution of test statistics that are known to be pure white noise (in expectation). If "nonstationary" (default), the function generates three different data sets of a pure random walk, a random walk with a drift, and a random walk with a drift and trend. It then runs (A)DF tests on all those. This allows the user to assess the compatibility/plausibility of their test statistics with data that are known to be nonstationary in some form.

Details

The Dickey-Fuller and its "augmented" corollary are curious statistical procedures, even if the underlying concept is straightforward. I have seen various implementations of these procedures use slightly different terminology to describe its procedure, though this particular implementation will impose nomenclature in which the classic Dickey-Fuller procedure that assumes just the AR(1) process is one in which n_lags is 0. The addition of lags (of first differences) is what ultimately makes the Dickey-Fuller procedure to be "augmented."

The function employs the default suggested by Schwert (1989) for the number of lagged first differences to include in this procedure. Schwert (1989) recommends taking the length of the series and dividing it by 100 before raising that number to the power of 1/4. Thereafter, multiply it by 12 and round down the number to the nearest integer. There are other suggested defaults you can consider. adf.test in aTSA takes the length of the series, divides it by 100 and raises it to the power of 2/9. It multiplies that by 4 and floors the result. adf.test in tseries subtracts 1 from the length of the series before raising it to the power of 1/3 (flooring that result as well). The Examples section will show you how you can do this.

This function specifies three different types of tests: 1) no drift, no trend, 2) drift, no trend, and 3) drift and trend. In the language of the lm() function, the first is lm(y ~ ly - 1) where y is the value of y and ly is its first-order lag. The second test is lm(y ~ ly), intuitively suggesting the y-intercept in this equation is the "drift". The third would be lm(y ~ ly + t) with t being a simple integer that increases by 1 for each observation (i.e. a time-trend).

None of this is meant to discourage the use of Fuller (1976) or its various reproductions for the sake of diagnosing stationarity or non-stationary, and I will confess their expertise on these matters outpaces mine. Consider the justification for this function to be largely philosophical and/or experimental. Why not simulate it? It's not like time or computing power are huge issues anymore.

This is always awkwardly stated, but it's a good reminder that the classic Dickey-Fuller statistics are mostly intended to come back negative. That's not always the case, to be clear, but it is the intended case. You assess the statistic by "how negative" it is. Stationary time series will produce test statistics more negative ("smaller") than those produced by non-stationary time series. In a way, this makes the hypotheses implicitly one-tailed (to use that language).

This function removes missing values from the vector before calculating test statistics.

Value

sadf_test() returns a list of length 3. The first element in the list is a matrix of tau statistics calculated by the test. The second element is a data frame of the simulated tau statistics of either a known white-noise time series or three different non-stationary time series (pure random walk, random walk with drift, random walk with drift and trend). The third element contains some attributes about the procedure for post-processing.

Author(s)

Steven V. Miller

References

Schwert, G. William. 1989. "Tests for Unit Roots: A Monte Carlo Investigation." Journal of Business & Economic Statistics 7(2): 147–159.

Examples


y <- na.omit(USDSEK[1:500,])$close # there is one missing value here. n = 499.


sadf_test(y, n_sims = 25) # Doing 25, just to make it quick



sTSD documentation built on April 3, 2025, 7:37 p.m.