#' Performs unit root tests (by period) on the difference between \emph{actual}
#' inflation and \emph{expected} inflation derived from econometric models.
#'
#' The function \code{diagnostic_ur} perform unit root tests to validate three
#' econometric models proposed by \insertCite{Fama;textual}{bindr}. One set of
#' unit root tests is applied to the in-sample forecasting errors, \emph{i.e.}
#' the \strong{difference} between \emph{actual} and \emph{expected} inflation.
#' The second set of unit root tests is applied to \strong{fitted differences},
#' or equivalently, the residuals obtained from regressing \emph{actual}
#' inflation against \emph{expected} inflation, which is derived from an
#' econometric model. Absence of a unit root (\emph{i.e.} rejecting the null
#' hypothesis of a unit root in the augmented Dickey-Fuller test) would suggest
#' a long-term connection between \emph{actual} and \emph{expected} inflation.
#'
#' The measured difference between \emph{actual} and \emph{expected} inflation
#' is captured by the general expression \eqn{I(t) - \alpha - \beta E[I(t-1)]},
#' where \eqn{I(t)} is the \emph{actual} inflation and \eqn{E[I(t-1)]} stands
#' for the \emph{expected} inflation derived from three different econometric
#' models found in \insertCite{Fama;textual}{bindr}. The first set of unit root
#' tests (\code{y_plot = 'difference'}) concerns the in-sample forecasting
#' errors, \emph{i.e.} the \strong{difference} between \emph{actual} and
#' \emph{expected} inflation, and can be viewed as imposing the dual constraint
#' \eqn{\alpha = 0, \beta = 1}. The second set of unit root tests (\code{y_plot
#' = 'fitted difference'}) is applied on the residuals obtained from regressing
#' \emph{actual} inflation against a constant and the \emph{expected} inflation.
#' The residuals in this test can be viewed as a \strong{fitted difference}
#' \eqn{I(t) - \alpha - \beta E[I(t-1)]}, where \eqn{\alpha} and \eqn{\beta} are
#' estimated.
#'
#' Unit root tests are motivated by the findings of
#' \insertCite{Granger;textual}{bindr} who demonstrated how \strong{spurious
#' regression results} emerge when time series display dependence at very long
#' range, including the most extreme case of Brownian process (\emph{i.e. unit
#' root}) where dependence persists at any horizon. Unreliable results arise as
#' the uncertainty about the level of an economic series grows larger
#' indefinitely as one forecasts further into the future. Thus for a series
#' containing a unit root, the \emph{long-run} variance is not a meaningful concept
#' and the assumptions underpinning the standard regression inference are no
#' longer valid. For instance, \insertCite{Fama;textual}{bindr} (\emph{cf.}
#' Table 2, p.334) contain regression results of \emph{actual} inflation against
#' \emph{expected} inflation derived from three econometric models which merit
#' further examination in the current context. Indeed,
#' \insertCite{Mishkin;textual}{bindr} found that inflation is characterized by
#' a unit root, even after adjusting for a finite sample effect and possible
#' test misspecification as documented in
#' \insertCite{Schwert:1987,Schwert:1989;textual}{bindr}. Independent unit root
#' tests on expected inflation performed while developing this package yielded
#' similar conclusions.
#'
#' Despite this statistical quandary, some simple adjustments can be made in
#' specific contexts (\emph{e.g.} \insertCite{Hjalmarsson;textual}{bindr} for
#' overlapping data) to allow for meaningful inference in a regression framework
#' - \emph{but this is not the case in this particular context}. The other
#' avenue is to consider a measure of association which applies in the long-run,
#' such as co-integration. Tests for co-integration involve estimating a
#' regression of \emph{actual} inflation against \emph{expected} inflation using
#' ordinary least squares and then conducting unit root tests on the residuals.
#' In other words, the co-integration of \eqn{I(t)} and \eqn{E[I(t-1)]} implies
#' that a linear combination of these variables is stationary in the long-run.
#' See \insertCite{Mishkin;textual}{bindr} for an application and the Appendix
#' in \insertCite{Neely;textual}{bindr} showing how co-integration tests are
#' closely related to unit root tests.
#'
#' The model specification \code{adf_type = c("none", "drift", "trend")} and its
#' corresponding critical values for the augmented Dickey-Fuller (\strong{ADF})
#' test are derived in \insertCite{Hamilton;textual}{bindr}, Chap. 17. In the
#' current context, the recommended value is \emph{none} as the test is
#' performed on residuals. See \insertCite{Neely;textual}{bindr}, pp. 640-641.
#' The argument \code{adf_lags} is the input to \eqn{l = int(adf_lags *
#' (T/100)^(0.25))}, where \eqn{l} is the number of lags in the augmented
#' Dickey-Fuller test formulation. Extensive numerical simulations by
#' \insertCite{Schwert:1987,Schwert:1989;textual}{bindr} show that ADF tests
#' with \code{adf_lags = 12} on the regression \emph{t} test (as opposed to a
#' test on \eqn{T(\rho - 1)}) are the most robust to misspecification from
#' MA-type disturbances, even in sample size as small as 25 observations. Hence,
#' every ADF unit root tests are performed on regression \emph{t} test and
#' \code{adf_lags = 12} is the recommended value.
#'
#' Note that the function also returns at least one list-column (\emph{data}).
#' List-columns are used in nested data frames and are stored as a column vector
#' of a data frame. They are part of the \emph{tidyverse} framework, and
#' specifically documented in the package \pkg{tidyr}. List-columns are
#' typically created by calling \code{group_by(...)} on a tibble/data frame
#' object followed by \code{nest()}. See \pkg{tidyr} package documentation
#' and/or \pkg{purrr} cheat sheet for details.
#'
#' @param metrics The '\emph{metrics}' 'tibble' object in the list returned by
#' \code{inflation(operation = 'metrics', ...)}.
#' @param diag_type A string controlling the time intervals: \emph{period}
#' defines non-overlapping period of \emph{wnd_sz} months in length,
#' whereas \emph{rolling} generates rolling windows of width \emph{wnd_sz}.
#' @param wnd_sz A integer, strictly greater than 0, defining period length or
#' rolling windows width. The unit should match the underlying series
#' frequency. See details.
#' @param show_plot Logical (default is FALSE). Plot the results if set to TRUE
#' @param y_plot A string identifying which measure will be tested (and
#' displayed if \code{show_plot = TRUE}). See details.
#' @param truncate_at A double, between 0 and 1.0 (including boundary points).
#' Applicable only when \code{diag_type == 'period', show_plot == TRUE}.
#' Controls the minimum proportion of \emph{wnd_sz} required to include the
#' last period in the plot of results.
#' @param adf_conf_level A string, indicating the confidence level for the
#' augmented Dickey-Fuller unit root test.
#' @param adf_lags A integer, between 0 and 12 (inclusively) indicating the
#' number of lags applicable in the augmented Dickey-Fuller unit root test.
#' Default value is 12. See details.
#' @param adf_type A string, indicating the alternative model specification in
#' the augmented Dickey-Fuller unit root test. Default value is \emph{none}.
#' See details.
#'
#'
#'
#' @return \itemize{
#' \item \code{diag_type == 'period'}: A 'tibble' object containing
#' \itemize{\item Model type (Treasury Bills, Naive, Time-series). See
#' \insertCite{Fama;textual}{bindr}
#' \item End date of rolling window or sub-period label
#' \item List-column of tibble object (\emph{data}) similar to
#' \code{inflation(operation = 'metrics')}. See documentation.
#' \item List-column (\emph{adf} or \emph{adf_diff}), which can be
#' safely ignored.
#' \item Test name (\emph{e.g.} ADF)
#' \item Test model (no constant, drift or time trend).
#' \item Test lag is the input to determine the number of lags
#' included in the ADF. See details above.
#' \item The \emph{t}-statistic for the test of the null hypothesis
#' that the underlying process has a unit root
#' \item Test critical value
#' \item Test confidence level
#'
#' }
#' \item \code{diag_type == 'rolling'}: A 'tibble' object containing"
#' \itemize{\item Model type (Treasury Bills, Naive, Time-series). See
#' \insertCite{Fama;textual}{bindr}
#' \item List-column of tibble object (\emph{data}) similar to
#' \code{inflation(operation = 'metrics')}. See documentation.
#' \item End date of rolling window or sub-period label
#' \item Test name (\emph{e.g.} ADF)
#' \item Test model (no constant, drift or time trend).
#' \item Test lag is the input to determine the number of lags
#' included in the ADF. See details above.
#' \item The \emph{t}-statistic for the test of the null hypothesis
#' that the underlying process has a unit root
#' \item Test critical value
#' \item Test confidence level
#' }
#'}
#'
#' @references{\insertAllCited{}}
#'
# If dplyr::select is in a package, using .data also prevents R CMD check from
# giving a NOTE about undefined global variables (provided that @importFrom
# rlang .data is inserted). Using rlang::.data (without @importFrom rlang .data)
# is another option. See Wickham, Hadley, R Packages, O'Reilly, 1st Edition,
# 2015, p. 89 for details
#'
#' @importFrom rlang .data
#' @importFrom magrittr "%>%"
#' @export
diagnostic_ur <- function(metrics, diag_type = c('period', 'rolling'),
wnd_sz = 60, show_plot = F,
y_plot = c('difference', 'fitted difference'),
truncate_at = 0.75,
adf_conf_level = c('5pct', '1pct', '10pct'),
adf_lags = 12,
adf_type = c('none', 'drift', 'trend')) {
adf_conf_level <- match.arg(arg = adf_conf_level)
adf_type <- match.arg(arg = adf_type)
diag_type <- match.arg(arg = diag_type)
yplot <- match.arg(y_plot)
stopifnot(adf_lags %in% seq(from = 12, to = 0, by = -1))
stopifnot( wnd_sz > 0 )
stopifnot(truncate_at >= 0.0 & truncate_at <= 1.0)
metrics <- tidyr::drop_na(data = metrics)
# ----------------------------------------------------------------------------
# Functions to extract selected elements from an object of class ur.df
# returned by the augmented Dickey-Fuller test. These functions are
# 'rollified' when diag_type is set to 'rolling' and taken as is when
# diag_type is set to 'period'.
adf_diff <- function(shock){
N <- length(shock)
lags <- as.integer(adf_lags * ((N/100)^0.25) )
adf_ <- urca::ur.df(y = shock, type = adf_type,
lags = lags, selectlags = 'Fixed')
tbl <- tibble::tibble(test.name = adf_@test.name, test.model = adf_@model,
test.lag = adf_@lags,
test.stat = adf_@teststat[1,1],
test.cval = adf_@cval[1, adf_conf_level],
test.cl = adf_conf_level)
return(tbl)
}
adf_fitted_diff <- function(I_t, expected){
N <- length(I_t)
lags <- as.integer(adf_lags * ((N/100)^0.25) )
ols <- stats::lm(formula = I_t ~ expected)
adf_ <- urca::ur.df(y = ols$residuals, type = adf_type,
lags = lags, selectlags = 'Fixed')
tbl <- tibble::tibble(test.name = adf_@test.name, test.model = adf_@model,
test.lag = adf_@lags,
test.stat = adf_@teststat[1,1],
test.cval = adf_@cval[1, adf_conf_level],
test.cl = adf_conf_level)
return(tbl)
}
# ---------------------------------------------------------------------------
# Diagnostic by period
if( diag_type == 'period' ) {
# Period construction and validation
if( 'year_month' %in% names(metrics) == F ) {
stop("Variable 'year_month' is missing",call. = T)
}
stopifnot( tsibble::is_yearmonth(metrics$year_month) == T )
metrics <- with_sub_Period(obs = metrics,
interval_type = 'n_month', wnd_sz = wnd_sz)
nm <- c('model', 'sub_period')
if( !all( nm %in% names(metrics) ) ){
msg <- stringr::str_glue(
'Metrics object has missing variables.\n',
"'model' and 'sub_period' must be in the tibble.")
stop(msg, call. = )
}
# Retain only subp-eriods with at least [truncate_at * wnd_sz] observations
check <- dplyr::group_by(.data = metrics,
.data$sub_period, .data$model) %>%
dplyr::summarise(count = dplyr::n())
rm_Hdl <- dplyr::filter(
.data = check, .data$count < as.integer(truncate_at * wnd_sz))
rm_Hdl <- as.character(base::unique(rm_Hdl$sub_period))
diagnostic <- dplyr::filter(.data = metrics,
!(.data$sub_period %in% rm_Hdl))
# Compile unit root test
diagnostic <- dplyr::group_by(.data = diagnostic,
.data$sub_period, .data$model) %>%
tidyr::nest() %>%
dplyr::filter(.data$sub_period != 'Other') %>%
dplyr::mutate(adf_diff =
purrr::map(.x = .data$data,
~ adf_diff(shock = .x$shock))) %>%
dplyr::mutate(adf_fitted_diff =
purrr::map(.x = .data$data,
~ adf_fitted_diff(I_t = .x$I_t,
expected = .x$expected)))
if(y_plot == 'difference'){
diagnostic <- tidyr::unnest(data = diagnostic,
cols = c(.data$adf_diff))
} else {
diagnostic <- tidyr::unnest(data = diagnostic,
cols = c(.data$adf_fitted_diff))
}
if(show_plot == T){
plot_difference_ur(diag_obj = diagnostic,
which_ = y_plot, type = diag_type, wnd = wnd_sz)
}
}
# ---------------------------------------------------------------------------
# Diagnostic by rolling window
if( diag_type == 'rolling' ){
# Conversion to tibbletime object and validation
tt <- to_tibble_time(tbl = metrics)
nm <- c('model')
if( !all( nm %in% names(metrics) ) ){
msg <- stringr::str_glue(
'Metrics object has missing variables.\n',
"'model' and 'sub_period' must be in the tibble.")
stop(msg, call. = )
}
metrics_tt <- to_tibble_time(tbl = metrics)
# Define rolling functions
roll_date <- tibbletime::rollify(~ dplyr::last( .x))
roll_adf_diff <- tibbletime::rollify(.f = function(shock){
adf_diff(shock = shock)
}, window = wnd_sz, unlist = F)
roll_adf_fitted_diff <- tibbletime::rollify( function(I_t, expected){
adf_fitted_diff(I_t = I_t, expected = expected)
}, window = wnd_sz, unlist = F)
# Compile unit root test
if(y_plot == 'difference') {
diagnostic <- dplyr::group_by(.data = metrics_tt, .data$model) %>%
tidyr::nest(data = -.data$model) %>%
dplyr::mutate(date = purrr::map(.x = .data$data,
~ roll_date(.x$date)),
adf_diff = purrr::map(
.x = .data$data, ~ roll_adf_diff(.x$shock))) %>%
tidyr::unnest(cols = c(.data$date, .data$adf_diff)) %>%
dplyr::mutate_at(.vars = 'date', .funs = lubridate::as_date) %>%
tidyr::unnest(cols = c(.data$adf_diff)) %>%
dplyr::select(-.data$adf_diff)
} else {
diagnostic <- dplyr::group_by(.data = metrics_tt, .data$model) %>%
tidyr::nest(data = -.data$model) %>%
dplyr::mutate(date = purrr::map(.x = .data$data,
~ roll_date(.x$date)),
adf_fitted_diff = purrr::map(
.x = .data$data,
~ roll_adf_fitted_diff(.x$I_t, .x$expected))) %>%
tidyr::unnest(cols = c(.data$date, .data$adf_fitted_diff)) %>%
dplyr::mutate_at(.vars = 'date', .funs = lubridate::as_date) %>%
tidyr::unnest(cols = c(.data$adf_fitted_diff)) %>%
dplyr::select(-.data$adf_fitted_diff)
}
if(show_plot == T){
plot_difference_ur(diag_obj = diagnostic,
which_ = y_plot, type = diag_type, wnd = wnd_sz)
}
}
return(diagnostic)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.