R/diagnostic.R

Defines functions diagnostic

Documented in diagnostic

#' Performs multiple diagnostic checks (by period) on the expected inflation
#' rate derived from the CPI series and the modeling of the inflation dynamics.
#'
#' The function \code{diagnostic} perform checks to validate three models
#' proposed by  \insertCite{Fama;textual}{bindr}. One diagnostic set is applied
#' to the in-sample forecasting errors, \emph{i.e.} the \strong{difference}
#' between \emph{actual} and \emph{expected} inflation. The second diagnostic
#' set evaluates the results obtained from \strong{regressing} \emph{actual}
#' inflation against a constant and the \emph{expected} inflation, with the
#' ideal model showing a constant term of zero and an expected inflation
#' coefficient of one.
#'
#' The measured difference between \emph{actual} and \emph{expected} inflation
#' is captured by the general expression \eqn{I(t) - \alpha - \beta E[I(t-1)]},
#' where \eqn{I(t)} is the \emph{actual} inflation and \eqn{E[I(t-1)]} stands
#' for the \emph{expected} inflation derived from three different econometric
#' models found in \insertCite{Fama;textual}{bindr}. The diagnostic set applied
#' to the in-sample forecasting errors, \emph{i.e.} the \strong{difference}
#' between \emph{actual} and \emph{expected} inflation, can be viewed as
#' imposing the dual constraint \eqn{\alpha = 0, \beta = 1}. The second
#' diagnostic set applied on the results obtained from regressing \emph{actual}
#' inflation against a constant and the \emph{expected} inflation can be viewed
#' as a \strong{fitted difference} \eqn{I(t) - \alpha - \beta E[I(t-1)]}, where
#' \eqn{\alpha} and \eqn{\beta} are estimated.
#'
#' The parameter \code{y_plot = c("mean", "test", "sd", "rmse")} pertains to the
#' difference between actual and expected inflation, whereas \code{y_plot =
#' "term"} performs diagnostic checking on results obtained from regressing
#' actual inflation against a constant and the expected inflation. The latter
#' choice requires setting \code{select_ = 'constant'} or \code{select_ =
#' 'slope'}.
#'
#' The \code{wnd_sz} parameter unit must implicitly match the underlying data
#' frequency. For instance, \code{wnd_sz = 60} corresponds to a 5-year
#' window/period in monthly frequency. However, if the underlying frequency is
#' quarterly the same parameter value yields a 15-year window/period.
#'
#' Note that when \code{y_plot == c('mean', 'test', 'sd', 'rmse')}, three
#' list-columns (\emph{stats::lm, tidy_param, tidy_ols}) are also returned.
#' List-columns are used in nested data frames and are stored as a column vector
#' of a data frame. They are part of the \emph{tidyverse} framework, and
#' specifically documented in the package \pkg{tidyr}. List-columns are
#' typically created by calling \code{group_by(...)} on a tibble/data frame
#' object followed by \code{nest()}. See \pkg{tidyr} package documentation
#' and/or \emph{purrr} cheat sheet for details. All three list-columns
#' (\emph{stats::lm, tidy_param, tidy_ols}) contain results associated with
#' regressing actual inflation against a constant and the expected inflation.
#'
#' @param metrics The '\emph{metrics}' 'tibble' object in the list returned by
#'   \code{inflation(operation = 'metrics', ...)}
#' @param diag_type A string controlling the time intervals: \emph{period}
#'   defines non-overlapping period of \emph{wnd_sz} months in length,
#'   whereas \emph{rolling} generates rolling windows of width \emph{wnd_sz}
#' @param wnd_sz A integer, strictly greater than 0, defining period length or
#'   rolling windows width. The unit should match the underlying series
#'   frequency (see details)
#' @param show_plot Logical (default is FALSE). Plot the results if set to TRUE
#' @param y_plot A string identifying which statistic will be calculated (and
#'   displayed if \code{show_plot = TRUE}). See details.
#' @param select_ A string identifying which regression term will be selected.
#'   Applies only if \code{y_plot == 'term'}, otherwise ignored. See details.
#' @param conf_level A double, between 0.5 and 1.0 (excluding boundary points).
#'   Controls the parameter confidence intervals in the regression of
#' \emph{actual} inflation against a constant and the \emph{expected} inflation.
#' @param truncate_at A double, between 0 and 1.0 (including boundary points).
#'   Applicable only when \code{diag_type == 'period', show_plot == TRUE}.
#'   Controls the minimum proportion of \emph{wnd_sz} required to include the
#'   last period in the plot of results .
#'
#' @return \itemize{
#' \item \code{y_plot == 'term'}: A 'tibble' object containing
#'   \itemize{\item model type
#'            \item date or sub-period
#'            \item coefficient estimate
#'            \item coefficient standard error
#'            \item coefficient t-test against Ho (constant = 0; inflation = 1)
#'            \item coefficient p-value
#'            \item coefficient confidence interval
#'            }
#' \item \code{y_plot == c('mean', 'test', 'sd', 'rmse')}: A 'tibble' object
#' containing in-sample forecasting errors statistics:
#'   \itemize{\item model type
#'            \item date or sub-period
#'            \item average of monthly forecast error (mean)
#'            \item the \emph{t}-statistic for the test
#'                  of the null hypothesis that the average forecast error is
#'                  equal to zero (test)
#'            \item the standard deviation of the monthly forecast error (sd)
#'            \item the square root of the average squared forecast error (rsme)
#'            \item list-column of 'stats::lm' objects (see details)
#'            \item list-column of 'tidy_param' objects (see details)
#'            \item list-column of 'tidy_ols' objects (see details)
#'            }
#'}
#
#'
#' @references{\insertAllCited{}}
#'
# If dplyr::select is in a package, using .data also prevents R CMD check from
# giving a NOTE about undefined global variables (provided that @importFrom
# rlang .data is inserted). Using rlang::.data (without @importFrom rlang .data)
# is another option. See Wickham, Hadley, R Packages, O'Reilly, 1st Edition,
# 2015, p. 89 for details
#'
#' @importFrom rlang .data
#' @importFrom magrittr "%>%"
#' @export
diagnostic <- function(metrics, diag_type = c('period', 'rolling'),
                       wnd_sz = 60, show_plot = F,
                       y_plot = c('mean', 'test', 'sd', 'rmse', 'term'),
                       select_ = c('constant', 'slope'),
                       conf_level = 0.95, truncate_at = 0.75) {

  diag_type <- match.arg(arg = diag_type)
  yplot <- match.arg(y_plot)
  select_ <- match.arg(select_)

  stopifnot( wnd_sz > 0 )
  stopifnot(conf_level > 0.5 & conf_level < 1.0)
  stopifnot(truncate_at >= 0.0 & truncate_at <= 1.0)

  # ---------------------------------------------------------------------------
  # Diagnostic by period

  if( diag_type == 'period' ) {

    # Period construction and validation
    if( 'year_month' %in% names(metrics) == F ) {
      stop("Variable 'year_month' is missing",call. = T)
    }
    stopifnot( tsibble::is_yearmonth(metrics$year_month) == T )

    metrics <- with_sub_Period(obs = metrics,
                               interval_type = 'n_month', wnd_sz = wnd_sz)

    nm <- c('model', 'sub_period')
    if( !all( nm %in% names(metrics) ) ){
      msg <- stringr::str_glue(
        'Metrics object has missing variables.\n',
        "'model' and 'sub_period' must be in the tibble.")
      stop(msg, call. = )
    }



    # Compilation of diagnostic measures: mean, t-test, sd, rmse
    check <- dplyr::group_by(.data = metrics,
                              .data$sub_period, .data$model) %>%
      dplyr::summarise(count = dplyr::n())
    rm_Hdl <- dplyr::filter(
      .data = check, .data$count < as.integer(truncate_at * wnd_sz))
    rm_Hdl <- as.character(base::unique(rm_Hdl$sub_period))

    diagnostic <- dplyr::filter(.data = metrics,
                                !(.data$sub_period %in% rm_Hdl)) %>%
      dplyr::group_by(.data$sub_period, .data$model) %>%
      tidyr::nest() %>%
      dplyr::filter(.data$sub_period != 'Other') %>%
      dplyr::mutate(mean = purrr::map(.x = .data$data,
                                      ~ base::mean(.x$shock, na.rm = T)),
                    test = purrr::map(.x = .data$data,
                                      ~ stats::t.test(.x$shock)$statistic),
                    sd = purrr::map(.x = .data$data,
                                    ~ stats::sd(.x$shock, na.rm = T)),
                    rmse = purrr::map(.x = .data$data,
                                      ~ base::sqrt(base::mean(.x$shock^2,
                                                              na.rm = T)))) %>%
      tidyr::unnest(cols = c(.data$mean, .data$test, .data$sd, .data$rmse)) %>%
      dplyr::mutate(ols = purrr::map(.x = .data$data,
                                     ~ lm(I_t ~ expected, data = .x)),
                    tidy_param = purrr::map(.data$ols, broom::tidy,
                                            conf.int = T,
                                            conf.level = conf_level),
                    tidy_ols = purrr::map(.data$ols, broom::glance)) %>%
      dplyr::select(-.data$data)

    if(y_plot %in% c('term')) {
      diagnostic <- dplyr::select(.data = diagnostic,
                                  .data$model, .data$sub_period,
                                  .data$tidy_param) %>%
        tidyr::unnest(cols = c(.data$tidy_param))
    }

    if(show_plot) {
      if(y_plot %in% c('mean', 'test', 'sd', 'rmse')) {
        plot_error(diag_obj = diagnostic, which_ = y_plot, type = 'period')
      }
      if(y_plot %in% c('term')) {
        plot_lm(diag_obj = diagnostic, which_ = y_plot, select_ = select_,
                type = 'period', wnd = wnd_sz, conf_level = conf_level)
      }
    }
  }


  # ---------------------------------------------------------------------------
  # Diagnostic by rolling window

  if( diag_type == 'rolling' ){

    # Conversion to tibbletime object and validation
    tt <- to_tibble_time(tbl = metrics)

    nm <- c('model')
    if( !all( nm %in% names(metrics) ) ){
      msg <- stringr::str_glue(
        'Metrics object has missing variables.\n',
        "'model' and 'sub_period' must be in the tibble.")
      stop(msg, call. = )
    }

    metrics_tt <- to_tibble_time(tbl = metrics)

    if(y_plot %in% c('mean', 'test', 'sd', 'rmse')) {
      # Define rolling functions
      roll_date <- tibbletime::rollify(~ dplyr::last( .x))
      roll_mean <- tibbletime::rollify(~ mean(.x, na.rm = T),
                                       window = wnd_sz)
      roll_test <- tibbletime::rollify( function(.x){
        if(!anyNA(.x)) {
          t_test <- stats::t.test(.x)
          t_test <- as.double(t_test$statistic)
          return(t_test)
        } else {
          return(NA)
        }
      }, window = wnd_sz)
      roll_sd <- tibbletime::rollify(~ sd(.x, na.rm = T),
                                     window = wnd_sz)
      roll_rmse <- tibbletime::rollify(~ base::sqrt(base::mean(.x^2,
                                                               na.rm = T)),
                                       window = wnd_sz)

      # Compilation of diagnostic measures: mean, t-test, sd, rmse
      diagnostic <- dplyr::group_by(.data = metrics_tt, .data$model) %>%
        tidyr::nest(data = -.data$model) %>%
        dplyr::mutate(date = purrr::map(.x = .data$data,
                                        ~ roll_date(.x$date)),
                      mean = purrr::map(.x = .data$data,
                                        ~ roll_mean(.x$shock)),
                      test = purrr::map(.x = .data$data,
                                        ~ roll_test(.x$shock)),
                      sd = purrr::map(.x = .data$data,
                                      ~ roll_sd(.x$shock)),
                      rmse = purrr::map(.x = .data$data,
                                        ~ roll_rmse(.x$shock))) %>%
        tidyr::unnest(cols = c(.data$date,
                               .data$mean, .data$test,
                               .data$sd, .data$rmse)) %>%
        dplyr::mutate_at(.vars = 'date', .funs = lubridate::as_date) %>%
        dplyr::select(-.data$data)

      if(show_plot) {
        plot_error(diag_obj = diagnostic, which_ = y_plot,
                   type = 'rolling', wnd = wnd_sz)
      }
    }

    if(y_plot %in% c('term')) {
      # Define rolling functions
      roll_date <- tibbletime::rollify(~ dplyr::last( .x))
      roll_lm <- tibbletime::rollify(.f = function(I_t, expected) {
        stats::lm(I_t ~ expected)
      }, window = wnd_sz, unlist = F)

      # Compilation of diagnostic measures: linear model (lm)
      diagnostic <- dplyr::group_by(.data = metrics_tt, .data$model) %>%
        tidyr::nest(data = -.data$model) %>%
        dplyr::mutate(date = purrr::map(.x = .data$data,
                                        ~ roll_date(.x$date)),
                      ols = purrr::map(.x = .data$data,
                                        ~ roll_lm(.x$I_t, .x$expected))) %>%
        dplyr::select(-.data$data) %>%
        tidyr::unnest(cols = c(.data$date, .data$ols)) %>%
        dplyr::filter(!is.na(.data$ols)) %>%
        dplyr::mutate(ols_tidy = purrr::map(.data$ols, broom::tidy,
                                            conf.int = T,
                                            conf.level = conf_level)) %>%
        dplyr::select(-.data$ols) %>%
        tidyr::unnest(cols = c(.data$ols_tidy)) %>%
        dplyr::mutate_at(.vars = 'date', .funs = lubridate::as_date)


      if(show_plot) {
        plot_lm(diag_obj = diagnostic, which_ = y_plot, select_ = select_,
                   type = 'rolling', wnd = wnd_sz, conf_level = conf_level)
      }
    }
  }

  return(diagnostic)
}
fognyc/bindr documentation built on Dec. 4, 2020, 12:33 p.m.