R/inflation.R

Defines functions inflation

Documented in inflation

#' Performs multiple types of operations on the Consumer Price Index (CPI)
#' series.
#'
#' The function \code{inflation} carries out three operations based on the CPI
#' series: \emph{series}, \emph{metrics} and \emph{validate}. The first
#' operation retrieves the CPI index and Treasury Bill series. It also returns
#' additional series derived from algebraic transformations (\emph{e.g.}
#' inflation rate, \emph{ex post} real rate) and lag operators (\emph{e.g.}
#' Treasury Bill at \emph{t-1}). The \emph{metrics} operation expands on the
#' first by returning inflation metrics estimated from three different
#' econometric models
#' based on \insertCite{Fama;textual}{bindr}. The last operation,
#' \emph{validate}, replicates the original results of
#' \insertCite{Fama;textual}{bindr} for validation purposes.
#'
#' The date parameters (\code{from_}, \code{to_}) must take one of the following
#' forms:
#' \itemize{
#'  \item '2020'
#'  \item '2020 Jan', '2020/Jan', '2020-Jan'
#'  \item '2020 jan', '2020/jan', '2020-jan'
#'  \item '2020 01', '2020/01', '2020-01'
#' }
#'
#' \strong{The first operation (\emph{series})} returns the \strong{observable}
#' CPI index and Treasury Bill, along with additional series derived from
#' applying algebraic transformations and lag operators. Specifically, the
#' monthly \strong{inflation rate} is \eqn{I(t) = log( CPI(t) ) - log( CPI(t-1)
#' )}, where \eqn{log} stands for the natural log. The \strong{\emph{ex post}
#' real rate} is expressed as \eqn{TB(t-1) - I(t)}, where \eqn{TB(t-1)} is the
#' one-month interest rate observed at the end of month \eqn{(t-1)}. See
#' additional details below.
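#'
#' As a minimal sketch of these two transformations (the CPI and T-Bill values
#' below are made up for illustration, and the code is not necessarily how the
#' package computes the series internally):
#' \preformatted{
#' cpi <- c(100.0, 100.5, 101.2, 101.0)      # hypothetical CPI levels
#' tb  <- c(0.0040, 0.0042, 0.0041, 0.0043)  # hypothetical 1-month T-Bill rates
#'
#' I_t     <- c(NA, diff(log(cpi)))          # I(t) = log(CPI(t)) - log(CPI(t-1))
#' TB_t_1  <- c(NA, head(tb, -1))            # TB(t-1): T-Bill rate lagged one month
#' ex_post <- TB_t_1 - I_t                   # ex post real rate: TB(t-1) - I(t)
#' }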
#'
#' Note that the CPI index and Treasury Bill quotes reside in two different
#' sources. The CPI series is sourced from the \strong{F}ederal
#' \strong{R}eserve \strong{E}conomic \strong{D}ata (\strong{FRED}), while the
#' 1-month T-Bill return is \emph{hosted} in the standard 3-factor Fama-French
#' model (Kenneth R. French Data Library). It is ultimately \emph{sourced} from
#' Ibbotson and Associates (now a Morningstar Company) and goes back to 1926.
#' These two series are not necessarily updated in a synchronous manner, leading
#' to potential sample mismatch. The parameter \strong{error_on_join_NA} governs
#' how such sample irregularities are resolved. The call
#' \code{inflation(operation = 'series')} generates an error when
#' \code{error_on_join_NA = TRUE} \emph{\strong{and}} the series are not
#' synchronized. When \code{error_on_join_NA = FALSE}, irregular samples
#' generate a warning and a display of the records with NAs (\emph{i.e.} not
#' yet updated), which are dropped from the final sample. Note that the
#' operation \emph{metrics} is directly impacted by the
#' \strong{error_on_join_NA} setting, as it inherits its base sample from the
#' operation \emph{series}.
#'
#' \strong{The second operation (\emph{metrics})} expands on operation
#' \emph{series} by returning two inflation components, expected and unexpected
#' (\emph{i.e.} shock), derived from three different econometric models.
#' Each model is defined by specific dynamics governing inflation or real
#' rates and is documented in \insertCite{Fama;textual}{bindr} as 'Interest
#' rate model' (\emph{cf}. section 2.2), 'Naive interest rate model' (\emph{cf}.
#' section 2.3) and 'Time series model' (\emph{cf}. section 2.1).
#'
#' The 'Interest rate model' (\emph{cf}. section 2.2) is underpinned by a
#' formulation proposed by \insertCite{Fisher;textual}{bindr}, whereby the
#' observed nominal interest rate \eqn{TB(t-1)} can be broken into an expected
#' real return for month \eqn{t}, \eqn{E[R(t-1)]}, and an expected inflation
#' rate \eqn{E[I(t-1)]}. The compact expression of this formulation is
#' \eqn{TB(t-1) = E[R(t-1)] + E[I(t-1)]} (\emph{cf}.
#' \insertCite{Fama;textual}{bindr}, section 2.2, equation (7)). The expected
#' inflation (\eqn{E[I(t-1)]}) is recovered by combining the observed nominal
#' rate (Treasury Bill rate \eqn{TB(t-1)}) and the expected real rate
#' (\eqn{E[R(t-1)]}), which is derived from an explicit model describing the
#' real rate dynamics. In this case, \strong{\emph{ex post} real rates} are
#' assumed to follow a \strong{random walk} model (\emph{i.e.} unit root),
#' \emph{as supported by the available empirical evidence at the time}
#' (\emph{e.g.} \insertCite{Garbade;textual}{bindr}). The main features of a
#' random walk are persistence (possibly never decaying) of shocks over very
#' long horizons and non-stationarity of time-series (\emph{e.g.} unstable
#' volatility). The next step exploits a standard property of random walks
#' (\emph{i.e.} standard Brownian motions, see
#' \insertCite{Hamilton;textual}{bindr}, Chap. 17), namely that their difference
#' (say between time \emph{s} and time \emph{t}) follows an independent Gaussian
#' distribution with variance \emph{s - t} for \eqn{s > t}. It follows that the
#' first order difference of the \emph{ex post} real rates is described by
#' Gaussian white noise and can be modeled as a standard time-series
#' moving-average (MA) process (\emph{cf}.
#' \insertCite{Fama;textual}{bindr}, section 2.2, equation (10) and
#' \insertCite{Hamilton;textual}{bindr}, Chap. 3). To summarize the procedure,
#' first estimate an MA model on the first order difference of \emph{ex post}
#' real rates, which yields the expected real return \eqn{E[R(t-1)]}. The
#' expected inflation \eqn{E[I(t-1)]} is then recovered by subtracting the
#' expected real return from the observed nominal interest rate \eqn{TB(t-1)}.
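#'
#' A hedged sketch of this two-step procedure on simulated data follows. It
#' relies on \code{stats::arima} and is only meant to illustrate the logic: the
#' estimation carried out internally by the package may differ, and all
#' variable names and simulated values (rates in percent per month) are
#' hypothetical:
#' \preformatted{
#' set.seed(1)
#' n       <- 240
#' tb_lag  <- rep(0.4, n)                     # hypothetical TB(t-1), held constant
#' # Random walk plus noise, so that the first difference behaves like an MA(1)
#' ex_post <- 0.1 + cumsum(rnorm(n, sd = 0.05)) + rnorm(n, sd = 0.2)
#' infl    <- tb_lag - ex_post                # implied I(t) = TB(t-1) - ex post rate
#'
#' # Step 1: MA(1) on the first difference, i.e. ARIMA(0, 1, 1) on the level
#' fit      <- stats::arima(ex_post, order = c(0, 1, 1))
#' exp_real <- ex_post - as.numeric(stats::residuals(fit))  # one-step-ahead fit
#'
#' # Step 2: expected inflation = nominal rate - expected real return
#' exp_infl <- tb_lag - exp_real
#' shock    <- infl - exp_infl                # unexpected inflation component
#' }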
#'
#' It is worth mentioning that imparting a unit root to a time-series, either
#' from an assumption or from statistical inference, has meaningful
#' consequences. Indeed, '\emph{(...) the conclusion that an economic time
#' series contains a unit root (...) has important consequences for dynamic
#' economic models. For example, with a unit root there is no deterministic
#' long-run growth path to which the economic variable tends to revert.
#' Moreover, uncertainty about the level of an economic series grows larger
#' indefinitely as one forecasts further into the future. Thus, for an
#' integrated series (containing a unit root), it is not meaningful to discuss
#' the \emph{long-run} mean or variance of the process. In terms of business
#' cycle modeling, a unit root means that part of the innovation to the series
#' causes a permanent change in the level of the series}'.
#' (\insertCite{Schwert:1987;textual}{bindr})
#'
#' Also note that the 'Interest rate model' uses an \emph{ex post} measure of
#' real rates as \emph{ex ante} real rates are not directly observable for
#' modeling purposes. This might seem problematic at first since the \emph{ex
#' ante} real rate is the relevant measure for evaluating economic decisions in
#' asset pricing theories. An approach to solve this difficulty is to use the
#' actual inflation rate as a proxy for inflation expectations. Under rational
#' expectations, the \emph{ex post} and \emph{ex ante} real rates differ only by
#' a white noise component, so the \emph{ex post} and \emph{ex ante}
#' measures share the same long-run properties. This result even holds under
#' less stringent assumptions, \emph{e.g.} if the expectation errors are
#' stationary. This implies in practical terms that investors forecast
#' inflation with some imperfections, but the magnitude of these errors does
#' not grow without bound. See \insertCite{Neely;textual}{bindr}
#' for a detailed discussion.
#'
#' The 'Naive interest rate model' (\emph{cf}. section 2.3) stands as a coarse
#' approximation to the 'Interest rate model'. Specifically, invertible MA
#' (moving-average) models expressed as linear combinations of white noises can
#' be transformed into AR (auto-regressive) models
#' (\insertCite{Hamilton;textual}{bindr}, Chap. 3). For instance, when the first
#' difference of a series follows the MA(1) model \eqn{y(t) - y(t-1) =
#' \epsilon(t) - \theta \epsilon(t-1)}, the expected value of \eqn{y(t)} has the
#' equivalent AR representation \eqn{(1 - \theta) y(t-1) + \theta (1 - \theta)
#' y(t-2) + \theta^2 (1 - \theta) y(t-3) + (...)}. The estimate obtained by
#' \insertCite{Fama;textual}{bindr} is \eqn{\theta = 0.92} and the AR model
#' with 12 terms translates into the following coefficients: (0.078, 0.072,
#' 0.066, 0.061, 0.056, 0.052, 0.048, 0.044, 0.041, 0.038, 0.035, 0.032). This
#' 'Naive interest rate model' takes a short cut by using an equally weighted
#' average of the past 12 rates (\emph{ex post}, monthly real rate), in effect
#' overriding and coarsely approximating the coefficient sequence above with a
#' sequence of equal weights (\eqn{1/12 = 0.083}).
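#'
#' The comparison between the two weighting schemes can be sketched as follows.
#' This is a minimal illustration using the rounded estimate quoted above; the
#' coefficients listed in the text appear to correspond to an unrounded
#' \eqn{\theta} close to 0.922, which is an inference from those figures rather
#' than a value taken from the source. The past real rates are hypothetical:
#' \preformatted{
#' theta     <- 0.92                          # rounded estimate quoted above
#' k         <- 1:12
#' w_ma      <- (1 - theta) * theta^(k - 1)   # geometrically decaying weights
#' w_naive   <- rep(1 / 12, 12)               # equal weights of the naive model
#'
#' # Hypothetical past 12 ex post real rates, most recent first
#' r_past    <- seq(0.0012, 0.0001, length.out = 12)
#' exp_naive <- sum(w_naive * r_past)         # equals mean(r_past)
#' exp_ma12  <- sum(w_ma * r_past)            # weighted average truncated at 12 lags
#' }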
#'
#' The 'Time series model' (\emph{cf}. section 2.1) is driven by a statistical
#' description of the inflation and its long-run properties, summarized by a
#' random walk (\emph{i.e.} presence of unit root). The empirical results of
#' \insertCite{Mishkin;textual}{bindr} broadly support this assumption, although
#' debate and research are still ongoing and no definitive consensus has emerged
#' yet. This model differs from the 'Interest rate model' in two respects: (a)
#' inflation (as opposed to the \emph{ex post} real rate) follows a random walk;
#' (b) the Fisherian view of interest rate is discarded and inflation
#' expectations are recovered \emph{via} a statistical model capturing the
#' long-run properties of the inflation rate. In essence, the 'Time series
#' model' is free from the structural constraint imposed by the Fisherian
#' perspective. In this case the procedure is straightforward: estimate an
#' MA model on the first order difference of the inflation rate, which directly
#' yields the expected inflation rate \eqn{E[I(t-1)]} (see
#' \insertCite{Hamilton;textual}{bindr}, Chap. 3).
#'
#' \strong{The third operation (\emph{validate})} reproduces the original
#' results published by \insertCite{Fama;textual}{bindr}, and in particular
#' fragments of Table 2 (p. 334) and Table 4 (p. 338). Table 2 is an in-sample
#' regression of \emph{actual} monthly inflation rates on \emph{estimated
#' expected} inflation rates extracted from the econometric models governing the
#' inflation and real rate dynamics. The 'tibble' objects \code{diagostic_param}
#' and \code{diagostic_stats} summarize the results (see further details in the
#' Value section). The ideal model should have a constant close to 0 and a slope
#' estimate near 1.0, as a well-calibrated model should yield anticipations
#' close to actual realizations. Note that statistical inference from standard
#' regression can be highly problematic under certain conditions, such as in a
#' time-series context with non-stationary series. More details further below.
#'
#' Additional diagnostic and validation results can be found in Table 4
#' (\insertCite{Fama;textual}{bindr}, p. 338), which summarizes the in-sample
#' forecast errors of the three inflation models across seven non-overlapping
#' sub-periods. The seven sub-periods are defined as follows:
#' \itemize{
#' \item Sub-period 1: 1954/1 - 1957/6
#' \item Sub-period 2: 1957/7 - 1960/12
#' \item Sub-period 3: 1961/1 - 1964/6
#' \item Sub-period 4: 1964/7 - 1967/12
#' \item Sub-period 5: 1968/1 - 1971/6
#' \item Sub-period 6: 1971/7 - 1974/6
#' \item Sub-period 7: 1974/7 - 1977/12
#' }
#' The results are contained in the \code{diagnostic} 'tibble' object and include
#' the average monthly forecast error (mean), the \emph{t}-statistic for the
#' test of the null hypothesis that the average forecast error is equal to zero
#' (test), the standard deviation of the monthly forecast error (sd) and the
#' square root of the average squared forecast error (rsme). In addition, the
#' function call produces a plot summarizing the \emph{replicated} standard
#' deviation of the monthly forecast error \emph{vs.} the \emph{original}
#' corresponding figures found in \insertCite{Fama;textual}{bindr}.
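#'
#' For a single sub-period, these four statistics can be sketched as follows.
#' The forecast errors are simulated and the variable names are illustrative;
#' the package's internal computation may differ, for instance in the
#' denominator used for the standard deviation:
#' \preformatted{
#' set.seed(42)
#' err  <- rnorm(42, mean = 0, sd = 0.002)    # 42 hypothetical monthly errors
#' n    <- length(err)
#'
#' avg  <- mean(err)                          # average forecast error
#' s    <- stats::sd(err)                     # standard deviation (n - 1 denominator)
#' tval <- avg / (s / sqrt(n))                # t-statistic for H0: mean error = 0
#' rmse <- sqrt(mean(err^2))                  # root mean squared forecast error
#' }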
#'
#' The replicated results are fairly close to the original set and follow an
#' identical pattern across the seven sub-periods, yet some minor differences
#' can be observed. The most plausible explanation centers on the difference
#' between the \strong{F}ederal \strong{R}eserve \strong{E}conomic \strong{D}ata
#' (\strong{FRED}) and the \strong{A}rchiva\strong{L} \strong{F}ederal
#' \strong{R}eserve \strong{E}conomic \strong{D}ata (\strong{ALFRED})
#' time-series for the CPI index. The ALFRED CPI series contains the original
#' index values registered at the time of entry, hence the qualifier
#' \emph{Archival}. In contrast, the FRED CPI series overwrites the existing
#' entries with the latest data iteration:
#'
#' "\emph{ALFRED allows you to retrieve vintage versions of economic data that
#' were available on specific dates in history. In general, economic data for
#' past observation periods are revised as more accurate estimates become
#' available. As a result, previous vintages of data can be superseded and may
#' no longer be available from various data sources. \strong{Vintage or real
#' time economic data allows academics to reproduce others' research}} (emphasis
#' added), \emph{build more accurate forecasting models, and analyze economic
#' policy decisions using the data available at the time.}" (Source: ALFRED,
#' St. Louis Fed's Economic Research Division)
#'
#' Revisions can have a meaningful impact when a significant change in
#' methodology is introduced. Changes associated with housing costs (1950, 1983,
#' 1987, 1997 and 2005) are prime examples. See
#' \insertCite{Carson;textual}{bindr} for a detailed account of changes in CPI
#' housing costs and \insertCite{Greenlees;textual}{bindr} for other
#' controversies.
#'
#' Although the ALFRED CPI series is available and could be used to replicate
#' more closely the original results of \insertCite{Fama;textual}{bindr}, this
#' package does not attempt to do so, for at least three reasons: (\strong{a})
#' the replicated standard deviations of the monthly forecast errors match the
#' original pattern across sub-periods - elevated in the first two sub-periods,
#' at the lowest in sub-periods 3 to 5, and at the highest in sub-periods 6 and
#' 7; (\strong{b}) the replicated standard deviations are generally lower than
#' or very close to the corresponding original figures; (\strong{c}) the
#' absolute difference is \strong{at most} 3 basis points (\emph{i.e.} 0.0003 x
#' 10,000) per month, or 36 basis points annually.
#'
#' The parameter \emph{src_dir} must be a valid and existing directory. An error
#' is generated if either one of these conditions is not satisfied. Source file
#' \emph{src_file} must have a .csv format suffix, otherwise an error is
#' generated. Note that the underlying filing structure must have a
#' \strong{<root>} directory and at least two additional existing
#' sub-directories: \strong{<root>/Audit} and \strong{<root>/Uncompressed}. The
#' function will stop and generate an error message if any of these directories
#' does not exist. Most likely, these sub-directories will be created (if they
#' do not exist) by the function \emph{fetch} from the package \pkg{factorr}.
#'
#' The parameter \code{as_factor = TRUE} will cause inflation metrics from all
#' three models ('Interest Rate', 'Naive', 'Time-Series') to be separately saved
#' on file as factors, but only if the selected call is
#' \code{inflation(operation = 'metrics')}. File time stamps, attributes and
#' path directories are documented in .pdf files located in the Audit
#' sub-directory, along with additional details maintained for auditing and
#' monitoring purposes. Finally note that \strong{all} file permissions are set
#' to read-only to prevent unintended modifications.
#'
#'
#' @param operation A string representing the requested operation. See
#'   documentation for additional details.
#' @param from_,to_ A string representing the sample start and end date,
#'   respectively. See documentation for admissible formats.
#' @param error_on_join_NA Logical parameter governing how irregular samples are
#'   resolved. See details.
#' @param MA_q An integer, strictly greater than 0. Controls the order of the
#'   moving-average (MA) process to model inflation and real rates in a
#'   time-series framework. See documentation for additional details.
#' @param conf_level A double, between 0.5 and 1.0 (excluding boundary points).
#'   Controls the parameter confidence intervals in the inflation econometric
#'   models.
#' @param src_dir,src_file A string representing the source directory and file
#'   name containing the primary series (\emph{e.g.} CPI, T-Bills). See
#'   details.
#' @param as_factor Logical parameter (default is FALSE) to control if inflation
#'   metrics are saved on file as factors. See details.
#'
#' @return Return objects are contingent on the requested operation, but
#'   generally follow a 'nesting' relation ranging from the simplest
#'   (\code{operation = 'series'}) to the most complex (\code{operation =
#'   'validate'}). When applicable, multiple 'tibble' objects are returned in a
#'   list. Specifically:
#'
#'   \itemize{
#'   \item The \code{operation = 'series'} call returns a 'tibble' object
#'   (class 'tbl_df', 'tbl', 'data.frame') containing: year, month, date,
#'   year_month, CPI, Inflation (I_t), T-Bill (TB_t), lagged T-Bill (TB_t_1) and
#'   \emph{ex post} real rate.
#'   \item The \code{operation = 'metrics'} call returns
#'   \itemize{
#'   \item \code{metrics}: same 'tibble' object returned by
#'   \code{operation = 'series'}, but with three additional variables: the
#'   'expected' inflation component (\emph{i.e.} fitted model response), the
#'   'shock' or unanticipated inflation component (model residual) and the
#'   'model' class (Treasury Bills, Naive or Time-Series).
#'   \item \code{arima.ITB}: 'tibble' object associated with the Treasury Bill
#'   (interest rate) model in \insertCite{Fama;textual}{bindr}, section 2.2.
#'   Contains MA(q) parameter estimates, standard error and confidence interval.
#'   \item \code{arima.ITS}: similar to \code{arima.ITB}, but associated with
#'   the time-series model in \insertCite{Fama;textual}{bindr}, section 2.1.
#'   \item \code{diagostic_param}: 'tibble' object grouped by model. The table
#'   closely follows \insertCite{Fama;textual}{bindr}, Table 2, p. 334 and is
#'   designed as a diagnostic tool. The \strong{actual} inflation rate is
#'   regressed against the \strong{anticipated} inflation component extracted
#'   from the econometric models governing the inflation and real rate dynamics.
#'   The ideal model should have a constant close to 0 and a slope estimate near
#'   1.0. Equivalently, a well-calibrated model should yield anticipations close
#'   to actual realizations.
#'   \item \code{diagostic_stats}: 'tibble' object grouped by model and directly
#'   connected to \code{diagostic_param}. Contains adjusted R-squared, model
#'   standard error, degrees of freedom and number of observations.
#'   }
#'   \item The \code{operation = 'validate'} call returns
#'   \itemize{
#'   \item every object returned by the \code{operation = 'metrics'} call
#'   \item \code{diagnostic}: 'tibble' object \strong{grouped by model and
#'   sub-period} (see sub-period definitions above). The table closely follows
#'   \insertCite{Fama;textual}{bindr}, Table 4, p. 338 and is designed as a
#'   diagnostic tool to examine the monthly forecast error by non-overlapping
#'   sub-period. The 'tibble' object contains a range of forecast error
#'   statistics (mean, t-test, standard deviation, root mean square error).
#'   }
#'   }
#'
#' @references{\insertAllCited{}}
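#'
#' @examples
#' # Usage sketch only: the calls below assume that the files named in
#' # 'src_file' exist under a local '<root>/Uncompressed' directory. The path
#' # used here is a placeholder, not a location shipped with the package.
#' \dontrun{
#' src <- '~/path/to/root/Uncompressed/'
#'
#' # Observable series and derived transformations over a date range
#' obs <- inflation(operation = 'series', from_ = '1990 Jan', to_ = '2019-12',
#'                  src_dir = src)
#'
#' # Expected and unexpected inflation from the three models, MA(1) dynamics;
#' # unsynchronized records trigger a warning and are dropped
#' res <- inflation(operation = 'metrics', from_ = '1990/01', to_ = '2019',
#'                  MA_q = 1, conf_level = 0.95, error_on_join_NA = FALSE,
#'                  src_dir = src)
#' res$metrics
#' }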
#'
# If dplyr::select is in a package, using .data also prevents R CMD check from
# giving a NOTE about undefined global variables (provided that @importFrom
# rlang .data is inserted). Using rlang::.data (without @importFrom rlang .data)
# is another option. See Wickham, Hadley, R Packages, O'Reilly, 1st Edition,
# 2015, p. 89 for details

#' @importFrom rlang .data
#' @importFrom magrittr "%>%"
#' @export
inflation <- function(operation = c('series', 'metrics', 'validate'),
                      from_ = NA, to_ = NA, error_on_join_NA = TRUE,
                      MA_q = 1, conf_level = 0.95,
                      src_dir =
                        '~/Desktop/UMich/Factor Warehouse/Uncompressed/',
                      src_file = c(CPI = 'FRED_CPI_US_M.csv',
                                   TB = 'FF_3F_US_M.csv'),
                      as_factor = FALSE){

  arg_list <- as.list(environment())
  operation <- match.arg(operation)
  stopifnot(MA_q >= 1)
  stopifnot(conf_level > 0.5 & conf_level < 1.0)

  # Every source file name must end with .csv (dot escaped, anchored at the end)
  if(!all(stringr::str_detect(string = src_file, pattern = '\\.csv$'))){
    stop('All source file names (src_file) must end with .csv', call. = TRUE)
  }

  # ----------------------------------------------------------------------------
  # SOURCE directory path: tidy construction and validation
  src_dir <- fs::path_tidy(src_dir)
  if(!fs::dir_exists(src_dir)) {
    stop(stringr::str_glue('Source directory ', src_dir,
                           ' does not exist'), call. = TRUE)
  }

  if(as_factor == TRUE){
    # ROOT DESTINATION directory path: tidy construction and validation
    dest_root_dir <- stringr::str_split(string = src_dir,
                                        pattern = '/Audit|/Uncompressed')[[1]][1]
    dest_root_dir <- fs::path_tidy(dest_root_dir)
    if(!fs::dir_exists(dest_root_dir)) {
      stop(stringr::str_glue('Root destination directory ', dest_root_dir,
                             ' does not exist'), call. = TRUE)
    }

    # DESTINATION SUB-directory path: validation
    temp_dir <- c(stringr::str_glue(dest_root_dir, '/Audit'),
                  stringr::str_glue(dest_root_dir, '/Uncompressed'))
    # walk() is called for its side effect only (stop on a missing directory)
    purrr::walk(.x = temp_dir, .f = function(sub_dir){
      if(!fs::dir_exists(sub_dir)) {
        stop(stringr::str_glue('Destination sub-directory ', sub_dir,
                               ' does not exist'), call. = TRUE)
      }
    })
    remove(temp_dir)
  }
  # ----------------------------------------------------------------------------

  if(operation == 'series') {
    tsbl <- inflation_series(from_ = from_, to_ = to_,
                             stop_on_join_NA = error_on_join_NA,
                             src_dir = src_dir, src_file = src_file)
  }

  if(operation == 'metrics') {
    tsbl <- inflation_series(from_ = from_, to_ = to_,
                             stop_on_join_NA = error_on_join_NA,
                             src_dir = src_dir, src_file = src_file)
    tsbl <- inflation_metrics(obs = tsbl, q = MA_q, conf_level = conf_level)

    if(as_factor == TRUE){
      model_nm <- unique(tsbl$metrics$model)
      base::invisible(purrr::map(.x = model_nm, .f = function(.){
        hdl_kernel <- base::switch(EXPR = as.character(.),
                                   'Time-series' = 'timeSeries',
                                   'Treasury Bills' = 'treasuryBills',
                                   'Naive' = 'naive')
        hdl <- stringr::str_glue('INFLATION__', hdl_kernel, '__US_M')
        write_inflation_factor(tsbl = dplyr::filter(.data = tsbl$metrics,
                                                    .data$model == .),
                               model_type = base::as.character(.),
                               hdl_str = hdl, dest_root_dir = dest_root_dir,
                               func_arg =  arg_list)
      }))
    }
  }

  if(operation == 'validate') {
    tsbl <- inflation_series(from_ = '1953-03', to_ = '1977-12',
                             stop_on_join_NA = error_on_join_NA,
                             src_dir = src_dir, src_file = src_file)
    tsbl <- inflation_metrics(obs = tsbl, q = 1)
    tsbl$diagnostic <- inflation_validate_FamaGibbons(metrics = tsbl$metrics,
                                                      show_plot = TRUE)
    tsbl$diagnostic <- dplyr::arrange(.data = tsbl$diagnostic,
                                      .data$sub_period)

  }

  return(tsbl)
}
fognyc/bindr documentation built on Dec. 4, 2020, 12:33 p.m.