sadf_test | R Documentation |
sadf_test() provides a simulation approach to assessing a unit root in a time
series by way of the (Augmented) Dickey-Fuller test. It takes a vector,
performs three (Augmented) Dickey-Fuller tests (no drift, no trend; drift, no
trend; drift and trend), and calculates tau statistics as one normally would.
Rather than interpolate or approximate a p-value, it simulates some
user-specified number of (Augmented) Dickey-Fuller tests of either a known,
non-stationary time series or a known, white-noise time series matching the
length of the time series the user provides. This allows the user to assess
non-stationarity or stationarity by way of simulation rather than by
approximation from critical values received from books or tables that are
some years out of date.
sadf_test(x, n_lags = NULL, n_sims = 1000, sim_hyp = "nonstationary")
x | a vector |
n_lags | defaults to NULL, but must be 0 or a positive integer. This argument determines the number of lagged first differences to include in the estimation procedure. Recall that the test statistic (tau) is still the t-statistic for the level value of the vector at t-1, whether or not the constant (drift) and time trend are included. If this value is 0, the procedure is the classic Dickey-Fuller test. If this value is greater than 0, this is the "augmented" Dickey-Fuller test, so-called because it is "augmented" by the number of lagged first differences to assess higher-order AR processes. If no argument is specified, the default lag is Schwert's suggested lower bound. |
n_sims | the number of simulations for calculating an interval or distribution of test statistics for assessing stationarity or non-stationarity. Defaults to 1,000. |
sim_hyp | can be either "stationary" or "nonstationary". If "stationary", the function runs (A)DF tests on simulated stationary (pure white noise) data. This allows the user to assess the compatibility/plausibility of the test statistic against a distribution of test statistics from data known to be pure white noise (in expectation). If "nonstationary" (default), the function generates three different data sets: a pure random walk, a random walk with a drift, and a random walk with a drift and trend. It then runs (A)DF tests on all of those. This allows the user to assess the compatibility/plausibility of their test statistics with data that are known to be nonstationary in some form. |
The Dickey-Fuller test and its "augmented" corollary are curious statistical
procedures, even if the underlying concept is straightforward. I have seen
various implementations of these procedures use slightly different
terminology to describe what they do, though this particular implementation
will impose nomenclature in which the classic Dickey-Fuller procedure that
assumes just the AR(1) process is one in which n_lags is 0. The addition of
lags (of first differences) is what ultimately makes the Dickey-Fuller
procedure "augmented."
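As a minimal sketch of what that "augmentation" looks like in a regression, the code below uses simulated data and base R only. It is an illustration under my own assumptions (first difference as the outcome, the variable names, and n_lags fixed at 2), not a copy of what sadf_test() does internally.

```r
set.seed(8675309)
x <- cumsum(rnorm(200))   # simulated pure random walk (illustrative data)
n_lags <- 2               # this sketch assumes n_lags >= 1

# embed() lines up each first difference with its n_lags lagged values
dx  <- diff(x)
emb <- embed(dx, n_lags + 1)             # col 1: d_t; cols 2..: d_{t-1}, d_{t-2}
d   <- emb[, 1]                          # outcome: current first difference
ld  <- emb[, -1, drop = FALSE]           # the "augmenting" lagged first differences
ly  <- x[(n_lags + 1):(length(x) - 1)]   # level at t-1, aligned with d

adf <- lm(d ~ ly + ld)                   # drift, no trend, plus lagged differences
tau_adf <- coef(summary(adf))["ly", "t value"]
tau_adf
```

Dropping the ld term from the formula gives the classic Dickey-Fuller regression, which is the n_lags = 0 case in this nomenclature.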
The function employs the default suggested by Schwert (1989) for the number
of lagged first differences to include in this procedure. Schwert (1989)
recommends taking the length of the series and dividing it by 100 before
raising that number to the power of 1/4. Thereafter, multiply it by 12 and
round down the number to the nearest integer. There are other suggested
defaults you can consider. adf.test in aTSA takes the length of the series,
divides it by 100 and raises it to the power of 2/9. It multiplies that by 4
and floors the result. adf.test in tseries subtracts 1 from the length of the
series before raising it to the power of 1/3 (flooring that result as well).
The Examples section will show you how you can do this.
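The three rules described above can be written out in a few lines of base R. The helper names here are hypothetical (they are not part of this package, aTSA, or tseries). Only the arithmetic comes from the text.

```r
# Hypothetical helpers; n is the length of the series
schwert_lags <- function(n) floor(12 * (n / 100)^(1 / 4))  # rule described for this function
atsa_lags    <- function(n) floor(4 * (n / 100)^(2 / 9))   # rule described for aTSA
tseries_lags <- function(n) floor((n - 1)^(1 / 3))         # rule described for tseries

n <- 499  # e.g. a series of 499 observations
lags <- c(schwert = schwert_lags(n), aTSA = atsa_lags(n), tseries = tseries_lags(n))
lags
```

Any of these could then be supplied through the n_lags argument, e.g. sadf_test(y, n_lags = tseries_lags(length(y))).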
This function specifies three different types of tests: 1) no drift, no trend,
2) drift, no trend, and 3) drift and trend. In the language of the lm()
function, the first is lm(y ~ ly - 1) where y is the series and ly is its
first-order lag. The second test is lm(y ~ ly), intuitively suggesting the
y-intercept in this equation is the "drift". The third would be
lm(y ~ ly + t) with t being a simple integer that increases by 1 for each
observation (i.e. a time-trend).
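Here is a rough sketch of those three regressions on simulated data, with no lagged first differences. One assumption to flag loudly: I use the first difference of the series as the outcome, which is the common textbook form in which tau is simply the t-statistic on ly. The data and object names are illustrative and not taken from the function's source.

```r
set.seed(8675309)
x <- cumsum(rnorm(200))                  # simulated pure random walk

dat <- data.frame(
  d  = diff(x),                          # first difference (assumed outcome)
  ly = x[-length(x)],                    # level at t-1
  t  = seq_len(length(x) - 1)            # simple time trend
)

m1 <- lm(d ~ ly - 1, data = dat)         # 1) no drift, no trend
m2 <- lm(d ~ ly, data = dat)             # 2) drift, no trend
m3 <- lm(d ~ ly + t, data = dat)         # 3) drift and trend

# tau is the t-statistic on the lagged level in each model
tau <- sapply(list(m1, m2, m3),
              function(m) coef(summary(m))["ly", "t value"])
names(tau) <- c("no drift, no trend", "drift, no trend", "drift and trend")
tau
```

Each of the three statistics is then compared against its own reference distribution, since the presence of a drift or trend term changes how negative tau tends to be.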
None of this is meant to discourage the use of Fuller (1976) or its various reproductions for the sake of diagnosing stationarity or non-stationarity, and I will confess their expertise on these matters outpaces mine. Consider the justification for this function to be largely philosophical and/or experimental. Why not simulate it? It's not like time or computing power are huge issues anymore.
This is always awkwardly stated, but it's a good reminder that the classic Dickey-Fuller statistics are mostly intended to come back negative. That's not always the case, to be clear, but it is the intended case. You assess the statistic by "how negative" it is. Stationary time series will produce test statistics more negative ("smaller") than those produced by non-stationary time series. In a way, this makes the hypotheses implicitly one-tailed (to use that language).
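To make the "how negative" point concrete, here is a small simulation sketch in base R. It is only an illustration of the idea, assuming the drift, no trend regression with no lagged differences. The helper df_tau() and the observed value tau_obs are hypothetical, and none of this is the function's actual code.

```r
set.seed(8675309)
n <- 200       # length of each simulated series
n_sims <- 500  # number of simulated series per scenario

# tau for a "drift, no trend" Dickey-Fuller regression (hypothetical helper)
df_tau <- function(x) {
  d  <- diff(x)
  ly <- x[-length(x)]
  coef(summary(lm(d ~ ly)))["ly", "t value"]
}

tau_rw <- replicate(n_sims, df_tau(cumsum(rnorm(n))))  # known random walks
tau_wn <- replicate(n_sims, df_tau(rnorm(n)))          # known white noise

summary(tau_rw)  # non-stationary: statistics cluster closer to zero
summary(tau_wn)  # stationary: statistics are far more negative

# One-tailed comparison for a made-up observed statistic
tau_obs <- -3.1
mean(tau_rw <= tau_obs)  # share of random-walk taus at least this negative
```

A small share on that last line would suggest the observed statistic is hard to reconcile with a pure random walk of the same length.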
This function removes missing values from the vector before calculating test statistics.
sadf_test() returns a list of length 3. The first element in the list is a
matrix of tau statistics calculated by the test. The second element is a data
frame of the simulated tau statistics of either a known white-noise time
series or three different non-stationary time series (pure random walk,
random walk with drift, random walk with drift and trend). The third element
contains some attributes about the procedure for post-processing.
Steven V. Miller
Schwert, G. William. 1989. "Tests for Unit Roots: A Monte Carlo Investigation." Journal of Business & Economic Statistics 7(2): 147–159.
y <- na.omit(USDSEK[1:500,])$close # there is one missing value here. n = 499.
sadf_test(y, n_sims = 25) # Doing 25, just to make it quick