bindr: Tools for Binding Raw Times-Series Hosted on Remote Site Into a Synchronized Factor Data Set

Documented in diagnostic_ur

#' Performs unit root tests (by period) on the difference between \emph{actual}
#' inflation and \emph{expected} inflation derived from econometric models.
#'
#' The function \code{diagnostic_ur} perform unit root tests to validate three
#' econometric models proposed by  \insertCite{Fama;textual}{bindr}. One set of
#' unit root tests is applied to the in-sample forecasting errors, \emph{i.e.}
#' the \strong{difference} between \emph{actual} and \emph{expected} inflation.
#' The second set of unit root tests is applied to \strong{fitted differences},
#' or equivalently, the residuals obtained from regressing \emph{actual}
#' inflation against \emph{expected} inflation, which is derived from an
#' econometric model. Absence of a unit root (\emph{i.e.} rejecting the null
#' hypothesis of a unit root in the augmented Dickey-Fuller test) would suggest
#' a long-term connection between \emph{actual} and \emph{expected} inflation.
#'
#' The measured difference between \emph{actual} and \emph{expected} inflation
#' is captured by the general expression \eqn{I(t) - \alpha - \beta E[I(t-1)]},
#' where \eqn{I(t)} is the \emph{actual} inflation and \eqn{E[I(t-1)]} stands
#' for the \emph{expected} inflation derived from three different econometric
#' models found in \insertCite{Fama;textual}{bindr}. The first set of unit root
#' tests (\code{y_plot = 'difference'}) concerns the in-sample forecasting
#' errors, \emph{i.e.} the \strong{difference} between \emph{actual} and
#' \emph{expected} inflation, and can be viewed as imposing the dual constraint
#' \eqn{\alpha = 0, \beta = 1}. The second set of unit root tests (\code{y_plot
#' = 'fitted difference'}) is applied on the residuals obtained from regressing
#' \emph{actual} inflation against a constant and the \emph{expected} inflation.
#' The residuals in this test can be viewed as a \strong{fitted difference}
#' \eqn{I(t) - \alpha - \beta E[I(t-1)]}, where \eqn{\alpha} and \eqn{\beta} are
#' estimated.
#'
#' Unit root tests are motivated by the findings of
#' \insertCite{Granger;textual}{bindr} who demonstrated how \strong{spurious
#' regression results} emerge when time series display dependence at very long
#' range, including the most extreme case of Brownian process (\emph{i.e. unit
#' root}) where dependence persists at any horizon. Unreliable results arise as
#' the uncertainty about the level of an economic series grows larger
#' indefinitely as one forecasts further into the future. Thus for a series
#' containing a unit root, the \emph{long-run} variance is not a meaningful concept
#' and the assumptions underpinning the standard regression inference are no
#' longer valid. For instance, \insertCite{Fama;textual}{bindr} (\emph{cf.}
#' Table 2, p.334) contain regression results of \emph{actual} inflation against
#' \emph{expected} inflation derived from three econometric models which merit
#' further examination in the current context. Indeed,
#' \insertCite{Mishkin;textual}{bindr} found that inflation is characterized by
#' a unit root, even after adjusting for a finite sample effect and possible
#' test misspecification as documented in
#' \insertCite{Schwert:1987,Schwert:1989;textual}{bindr}. Independent unit root
#' tests on expected inflation performed while developing this package yielded
#' similar conclusions.
#'
#' Despite this statistical quandary, some simple adjustments can be made in
#' specific contexts (\emph{e.g.} \insertCite{Hjalmarsson;textual}{bindr} for
#' overlapping data) to allow for meaningful inference in a regression framework
#' - \emph{but this is not the case in this particular context}. The other
#' avenue is to consider a measure of association which applies in the long-run,
#' such as co-integration. Tests for co-integration involve estimating a
#' regression of \emph{actual} inflation against \emph{expected} inflation using
#' ordinary least squares and then conducting unit root tests on the residuals.
#' In other words, the co-integration of \eqn{I(t)} and \eqn{E[I(t-1)]} implies
#' that a linear combination of these variables is stationary in the long-run.
#' See \insertCite{Mishkin;textual}{bindr} for an application and the Appendix
#' in \insertCite{Neely;textual}{bindr} showing how co-integration tests are
#' closely related to unit root tests.
#'
#' The model specification \code{adf_type = c("none", "drift", "trend")} and its
#' corresponding critical values for the augmented Dickey-Fuller (\strong{ADF})
#' test are derived in \insertCite{Hamilton;textual}{bindr}, Chap. 17. In the
#' current context, the recommended value is \emph{none} as the test is
#' performed on residuals. See \insertCite{Neely;textual}{bindr}, pp. 640-641.
#' The argument \code{adf_lags} is the input to \eqn{l = int(adf_lags *
#' (T/100)^(0.25))}, where \eqn{l} is the number of lags in the augmented
#' Dickey-Fuller test formulation. Extensive numerical simulations by
#' \insertCite{Schwert:1987,Schwert:1989;textual}{bindr} show that ADF tests
#' with \code{adf_lags = 12} on the regression \emph{t} test (as opposed to a
#' test on \eqn{T(\rho - 1)}) are the most robust to misspecification from
#' MA-type disturbances, even in sample size as small as 25 observations. Hence,
#' every ADF unit root tests are performed on regression \emph{t} test and
#' \code{adf_lags = 12} is the recommended value.
#'
#' Note that the function also returns at least one list-column (\emph{data}).
#' List-columns are used in nested data frames and are stored as a column vector
#' of a data frame. They are part of the \emph{tidyverse} framework, and
#' specifically documented in the package \pkg{tidyr}. List-columns are
#' typically created by calling \code{group_by(...)} on a tibble/data frame
#' object followed by \code{nest()}. See \pkg{tidyr} package documentation
#' and/or \pkg{purrr} cheat sheet for details.
#'
#' @param metrics The '\emph{metrics}' 'tibble' object in the list returned by
#'   \code{inflation(operation = 'metrics', ...)}.
#' @param diag_type A string controlling the time intervals: \emph{period}
#'   defines non-overlapping period of \emph{wnd_sz} months in length,
#'   whereas \emph{rolling} generates rolling windows of width \emph{wnd_sz}.
#' @param wnd_sz A integer, strictly greater than 0, defining period length or
#'   rolling windows width. The unit should match the underlying series
#'   frequency. See details.
#' @param show_plot Logical (default is FALSE). Plot the results if set to TRUE
#' @param y_plot A string identifying which measure will be tested (and
#'   displayed if \code{show_plot = TRUE}). See details.
#' @param truncate_at A double, between 0 and 1.0 (including boundary points).
#'   Applicable only when \code{diag_type == 'period', show_plot == TRUE}.
#'   Controls the minimum proportion of \emph{wnd_sz} required to include the
#'   last period in the plot of results.
#' @param adf_conf_level A string, indicating the confidence level for the
#'   augmented Dickey-Fuller unit root test.
#' @param adf_lags A integer, between 0 and 12 (inclusively) indicating the
#'   number of lags applicable in the augmented Dickey-Fuller unit root test.
#'   Default value is 12. See details.
#' @param adf_type A string, indicating the alternative model specification in
#'   the augmented Dickey-Fuller unit root test. Default value is \emph{none}.
#'   See details.
#'
#'
#'
#' @return \itemize{
#' \item \code{diag_type == 'period'}: A 'tibble' object containing
#'   \itemize{\item Model type (Treasury Bills, Naive, Time-series). See
#'                  \insertCite{Fama;textual}{bindr}
#'            \item End date of rolling window or sub-period label
#'            \item List-column of tibble object (\emph{data}) similar to
#'                  \code{inflation(operation = 'metrics')}. See documentation.
#'            \item List-column (\emph{adf} or \emph{adf_diff}), which can be
#'                  safely ignored.
#'            \item Test name (\emph{e.g.} ADF)
#'            \item Test model (no constant, drift or time trend).
#'            \item Test lag is the input to determine the number of lags
#'                  included in the ADF. See details above.
#'            \item The \emph{t}-statistic for the test of the null hypothesis
#'                  that the underlying process has a unit root
#'            \item Test critical value
#'            \item Test confidence level
#'
#'            }
#' \item \code{diag_type == 'rolling'}: A 'tibble' object containing"
#'   \itemize{\item Model type (Treasury Bills, Naive, Time-series). See
#'                  \insertCite{Fama;textual}{bindr}
#'            \item List-column of tibble object (\emph{data}) similar to
#'                  \code{inflation(operation = 'metrics')}. See documentation.
#'            \item End date of rolling window or sub-period label
#'            \item Test name (\emph{e.g.} ADF)
#'            \item Test model (no constant, drift or time trend).
#'            \item Test lag is the input to determine the number of lags
#'                  included in the ADF. See details above.
#'            \item The \emph{t}-statistic for the test of the null hypothesis
#'                  that the underlying process has a unit root
#'            \item Test critical value
#'            \item Test confidence level
#'            }
#'}
#'
#' @references{\insertAllCited{}}
#'
# If dplyr::select is in a package, using .data also prevents R CMD check from
# giving a NOTE about undefined global variables (provided that @importFrom
# rlang .data is inserted). Using rlang::.data (without @importFrom rlang .data)
# is another option. See Wickham, Hadley, R Packages, O'Reilly, 1st Edition,
# 2015, p. 89 for details
#'
#' @importFrom rlang .data
#' @importFrom magrittr "%>%"
#' @export
diagnostic_ur <- function(metrics, diag_type = c('period', 'rolling'),
                          wnd_sz = 60, show_plot = F,
                          y_plot = c('difference', 'fitted difference'),
                          truncate_at = 0.75,
                          adf_conf_level = c('5pct', '1pct', '10pct'),
                          adf_lags = 12,
                          adf_type = c('none', 'drift', 'trend')) {

  adf_conf_level <- match.arg(arg = adf_conf_level)
  adf_type <- match.arg(arg = adf_type)
  diag_type <- match.arg(arg = diag_type)
  yplot <- match.arg(y_plot)

  stopifnot(adf_lags %in% seq(from = 12, to = 0, by = -1))
  stopifnot( wnd_sz > 0 )
  stopifnot(truncate_at >= 0.0 & truncate_at <= 1.0)

  metrics <- tidyr::drop_na(data = metrics)

  # ----------------------------------------------------------------------------
  # Functions to extract selected elements from an object of class ur.df
  # returned by the augmented Dickey-Fuller test. These functions are
  # 'rollified' when diag_type is set to 'rolling' and taken as is when
  # diag_type is set to 'period'.
  adf_diff <- function(shock){
    N <- length(shock)
    lags <- as.integer(adf_lags * ((N/100)^0.25) )
    adf_ <- urca::ur.df(y = shock, type = adf_type,
                        lags = lags, selectlags = 'Fixed')

    tbl <- tibble::tibble(test.name = adf_@test.name, test.model = adf_@model,
                          test.lag = adf_@lags,
                          test.stat = adf_@teststat[1,1],
                          test.cval = adf_@cval[1, adf_conf_level],
                          test.cl = adf_conf_level)
    return(tbl)
  }

  adf_fitted_diff <- function(I_t, expected){
    N <- length(I_t)
    lags <- as.integer(adf_lags * ((N/100)^0.25) )
    ols <- stats::lm(formula = I_t ~ expected)
    adf_ <- urca::ur.df(y = ols$residuals, type = adf_type,
                        lags = lags, selectlags = 'Fixed')

    tbl <- tibble::tibble(test.name = adf_@test.name, test.model = adf_@model,
                          test.lag = adf_@lags,
                          test.stat = adf_@teststat[1,1],
                          test.cval = adf_@cval[1, adf_conf_level],
                          test.cl = adf_conf_level)
    return(tbl)
  }

  # ---------------------------------------------------------------------------
  # Diagnostic by period

  if( diag_type == 'period' ) {

    # Period construction and validation
    if( 'year_month' %in% names(metrics) == F ) {
      stop("Variable 'year_month' is missing",call. = T)
    }
    stopifnot( tsibble::is_yearmonth(metrics$year_month) == T )

    metrics <- with_sub_Period(obs = metrics,
                               interval_type = 'n_month', wnd_sz = wnd_sz)

    nm <- c('model', 'sub_period')
    if( !all( nm %in% names(metrics) ) ){
      msg <- stringr::str_glue(
        'Metrics object has missing variables.\n',
        "'model' and 'sub_period' must be in the tibble.")
      stop(msg, call. = )
    }

    # Retain only subp-eriods with at least [truncate_at * wnd_sz] observations
    check <- dplyr::group_by(.data = metrics,
                             .data$sub_period, .data$model) %>%
      dplyr::summarise(count = dplyr::n())
    rm_Hdl <- dplyr::filter(
      .data = check, .data$count < as.integer(truncate_at * wnd_sz))
    rm_Hdl <- as.character(base::unique(rm_Hdl$sub_period))
    diagnostic <- dplyr::filter(.data = metrics,
                                !(.data$sub_period %in% rm_Hdl))

    # Compile unit root test
    diagnostic <- dplyr::group_by(.data = diagnostic,
                                  .data$sub_period, .data$model) %>%
      tidyr::nest() %>%
      dplyr::filter(.data$sub_period != 'Other') %>%
      dplyr::mutate(adf_diff =
                      purrr::map(.x = .data$data,
                                 ~ adf_diff(shock = .x$shock))) %>%
      dplyr::mutate(adf_fitted_diff =
                      purrr::map(.x = .data$data,
                                 ~ adf_fitted_diff(I_t = .x$I_t,
                                                   expected = .x$expected)))

    if(y_plot == 'difference'){
      diagnostic <- tidyr::unnest(data = diagnostic,
                                  cols = c(.data$adf_diff))
    } else {
      diagnostic <- tidyr::unnest(data = diagnostic,
                                  cols = c(.data$adf_fitted_diff))
    }

    if(show_plot == T){
      plot_difference_ur(diag_obj = diagnostic,
                         which_ = y_plot, type = diag_type, wnd = wnd_sz)
    }
  }

  # ---------------------------------------------------------------------------
  # Diagnostic by rolling window

  if( diag_type == 'rolling' ){

    # Conversion to tibbletime object and validation
    tt <- to_tibble_time(tbl = metrics)

    nm <- c('model')
    if( !all( nm %in% names(metrics) ) ){
      msg <- stringr::str_glue(
        'Metrics object has missing variables.\n',
        "'model' and 'sub_period' must be in the tibble.")
      stop(msg, call. = )
    }

    metrics_tt <- to_tibble_time(tbl = metrics)

    # Define rolling functions
    roll_date <- tibbletime::rollify(~ dplyr::last( .x))

    roll_adf_diff <- tibbletime::rollify(.f = function(shock){
      adf_diff(shock = shock)
    }, window = wnd_sz, unlist = F)

    roll_adf_fitted_diff <- tibbletime::rollify( function(I_t, expected){
     adf_fitted_diff(I_t = I_t, expected = expected)
    }, window = wnd_sz, unlist = F)

    # Compile unit root test
    if(y_plot == 'difference') {
      diagnostic <- dplyr::group_by(.data = metrics_tt, .data$model) %>%
        tidyr::nest(data = -.data$model) %>%
        dplyr::mutate(date = purrr::map(.x = .data$data,
                                        ~ roll_date(.x$date)),
                      adf_diff = purrr::map(
                        .x = .data$data, ~ roll_adf_diff(.x$shock))) %>%
        tidyr::unnest(cols = c(.data$date, .data$adf_diff)) %>%
        dplyr::mutate_at(.vars = 'date', .funs = lubridate::as_date) %>%
        tidyr::unnest(cols = c(.data$adf_diff)) %>%
        dplyr::select(-.data$adf_diff)
    } else {
      diagnostic <- dplyr::group_by(.data = metrics_tt, .data$model) %>%
        tidyr::nest(data = -.data$model) %>%
        dplyr::mutate(date = purrr::map(.x = .data$data,
                                        ~ roll_date(.x$date)),
                      adf_fitted_diff = purrr::map(
                        .x = .data$data,
                        ~ roll_adf_fitted_diff(.x$I_t, .x$expected))) %>%
        tidyr::unnest(cols = c(.data$date, .data$adf_fitted_diff)) %>%
        dplyr::mutate_at(.vars = 'date', .funs = lubridate::as_date) %>%
        tidyr::unnest(cols = c(.data$adf_fitted_diff)) %>%
        dplyr::select(-.data$adf_fitted_diff)
    }

    if(show_plot == T){
      plot_difference_ur(diag_obj = diagnostic,
                         which_ = y_plot, type = diag_type, wnd = wnd_sz)
    }
  }
  return(diagnostic)
}
fognyc/bindr documentation built on Dec. 4, 2020, 12:33 p.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
fognyc/bindr
Tools for Binding Raw Times-Series Hosted on Remote Site Into a Synchronized Factor Data Set

R/diagnostic_ur.R
In fognyc/bindr: Tools for Binding Raw Times-Series Hosted on Remote Site Into a Synchronized Factor Data Set

Defines functions diagnostic_ur

Documented in diagnostic_ur

R Package Documentation

Browse R Packages

We want your feedback!

fognyc/bindr Tools for Binding Raw Times-Series Hosted on Remote Site Into a Synchronized Factor Data Set

R/diagnostic_ur.R In fognyc/bindr: Tools for Binding Raw Times-Series Hosted on Remote Site Into a Synchronized Factor Data Set

Defines functions diagnostic_ur

Documented in diagnostic_ur

R Package Documentation

Browse R Packages

We want your feedback!

fognyc/bindr
Tools for Binding Raw Times-Series Hosted on Remote Site Into a Synchronized Factor Data Set

R/diagnostic_ur.R
In fognyc/bindr: Tools for Binding Raw Times-Series Hosted on Remote Site Into a Synchronized Factor Data Set