knitr::opts_chunk$set(echo = TRUE)

bdfm is an R package for estimating dynamic factor models. The emphasis of the package is on fully Bayesian estimation using MCMC methods via Durbin and Koopman's (2012) disturbance smoother. However, maximum likelihood estimation via Watson and Engle's (1983) EM algorithm and two-step estimation following Doz, Giannone, and Reichlin (2011) are also supported. This document begins with a non-technical overview of dynamic factor models, including simple examples. The second section introduces more rigorous notation and technical details regarding estimation techniques. For even more detail, the book Practical Implementation of Factor Models is available for free download at [SRLquantitative.com](http://srlquantitative.com).

Non-Technical Overview

Time series data for real world applications can be messy. Potential difficulties include inaccurate (noisy) observations, missing data, "ragged edge" data due to varying start and end dates for different series, and a potentially large number of parameters to estimate. Dynamic factor models (DFMs) or state space models provide a means of addressing these big data problems.

We can describe dynamic factor models in two parts. The first part is called the measurement or observation equation. Mathematically, we can write this equation as

$$ y_t = H x_t + \varepsilon_t $$

Conceptually, this step relates our meaningful information or factors, represented by the vector $x_t$, to our large set of noisy data, represented by $y_t$. The second part of the model is called the transition equation, which we can write mathematically as

$$ x_t = A x_{t-1} + \epsilon_t $$

This part of the model describes how our factors $x_t$, the meaningful information in which we are interested, evolve over time. The Kalman filter, the workhorse of the linear-Gaussian model, operates by first predicting $x_t$ based on $x_{t-1}$ (the model here is written in terms of one lag but is easily extended to $p$ lags). Predictions for $x_t$ are then updated using data in $y_t$ as it becomes available. Data need not be realized at the same time; as soon as any additional information becomes available it can be incorporated into the model.
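
To make the predict and update steps concrete, here is a minimal sketch of a single Kalman filter iteration for a one-factor model observed through two noisy series, written in base R. All parameter values are made up for illustration; bdfm estimates the parameters and runs the filter internally, so you never need to write this yourself.

# A single Kalman filter step: one factor, two observed series (illustrative values only)
H <- matrix(c(1, 0.5), nrow = 2)   # loadings in the observation equation (hypothetical)
A <- 0.8                           # factor persistence in the transition equation (hypothetical)
Q <- 0.1                           # variance of transition shocks (hypothetical)
R <- diag(c(0.2, 0.3))             # variances of observation shocks (hypothetical)

x_prev <- 0.4                      # last period's factor estimate
P_prev <- 0.5                      # variance of that estimate
y_t <- c(0.6, NA)                  # new data; the second series is not yet observed

# Predict: project the factor forward with the transition equation
x_pred <- A * x_prev
P_pred <- A * P_prev * A + Q

# Update: use whichever series are observed this period
obs <- !is.na(y_t)
H_o <- H[obs, , drop = FALSE]
K <- P_pred * t(H_o) %*% solve(H_o %*% t(H_o) * P_pred + R[obs, obs, drop = FALSE])
x_upd <- x_pred + K %*% (y_t[obs] - H_o %*% x_pred)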

In economic and financial applications the greatest difficulty in implementing factor models is parameter estimation (in many physical models parameters may be informed by theory and thus need not be estimated). The package bdfm supports three methods for estimating these models. Bayesian estimation (method = "bayesian") is the default; this approach is able to incorporate prior beliefs we have about the data --- which series are more important, for example --- and thus may yield superior predictions. Maximum likelihood estimation (method = "ml") finds the parameters that maximize the log likelihood of the observed data, yielding results that are nearly identical to Bayesian estimation when we do not specify any prior beliefs about the model. Finally, "two step" estimation (method = "pc") is useful when we have very large data sets (hundreds or thousands of series) without too many missing observations. When the number of observed series gets very large, Bayesian and maximum likelihood estimation can be slow. Two step estimation, however, remains quick and computationally light by estimating parameters using principal components as factors.

Getting Started

As a starting example, we estimate a DFM using five monthly series from the St. Louis Fed's Fred database (see ?econ_us for data documentation and how to retrieve the latest data). Our main focus is on industrial production, INDPRO, which we want to forecast using a small collection of monthly indicator series. Note that all series should enter the model seasonally adjusted.

library(bdfm)

# INDPRO       Industrial Production Index, level
# AMTMNO       New Orders, all manufacturing industries, level
# WHLSLRIRSA   Wholesalers, inventories:sales ratio
# MNFCTRIRSA   Manufacturers inventories:sales ratio
# ICSA         Initial claims
econ_small <- econ_us[, c("INDPRO","AMTMNO","WHLSLRIRSA","MNFCTRIRSA","ICSA")]

Input data, $y_t$ in the observation equation, is the minimum requirement to estimate a DFM using the bdfm package:

m <- dfm(econ_small)
m

dfm uses heuristics and calls automatic procedures that work well in many circumstances. Here, it has opted for one factor and 3 lags, as can be inferred from the output. If you're using monthly data, 3 lags (one quarter) is a decent starting point. For daily data, 7 lags (one week, or 5 if the data include only business days) is probably a good guess.
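
If the defaults are not what you want, the number of factors and lags can also be set explicitly. The argument names factors and lags in the sketch below are assumptions based on the output above; check ?dfm for the exact names in your version of the package.

# Hypothetical call: two factors and three lags (argument names assumed; see ?dfm)
m_2f <- dfm(econ_small, factors = 2, lags = 3)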

dfm needs stationary data. This is usually enforced by taking logs and differences of series that are non-stationary. With the default "auto" settings, a modified Durbin-Watson test is performed to determine which series to difference. If these series contain no values below 0, logs are taken as well.
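
As an illustration of what these transformations look like, a positive level series such as industrial production would typically be log-differenced. The snippet below does this by hand for one series purely for intuition; with the default "auto" settings bdfm applies such transformations internally, so this step is not required.

# Log-difference industrial production by hand (for intuition only; dfm's "auto"
# settings handle such transformations internally)
indpro_stationary <- diff(log(econ_small[, "INDPRO"]))
ts.plot(indpro_stationary)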

A model with one factor is a good way to create an index of the data, in this case a small index of monthly real activity in the U.S. The factor(s) of a model can be extracted by the factors function:

plot(factors(m))

The predict function returns the predicted values, including the forecasts. They are automatically converted back to the original units; no special attention needs to be paid to the fact that some series have been differenced. Note that if the data include quarterly series, their quarterly nature is also preserved:

tail(predict(m))

To generate forecasts using bdfm, the user can specify the number of periods ahead to forecast using the forecast argument. Keep in mind that with "ragged edge" data the last observation may not be the same for every series; forecast = 3 will generate forecasts 3 periods ahead of the latest observation of any series in our data set.

m_fct <- dfm(econ_small, forecast = 3)
tail(predict(m_fct))

An important question with any dynamic factor model is how much each observed series contributes to factors. The summary function gives a more comprehensive overview of the estimation:

summary(m)

Column 2 in the output shows the loadings of the single factor in the observation equation. Industrial production (INDPRO) and new orders (AMTMNO) load negatively; the inventories:sales ratios (WHLSLRIRSA, MNFCTRIRSA) and jobless claims (ICSA) load positively.

Perhaps more importantly, the variances of shocks to the observation equation tell us how well the model fits each observed series. Column 1 shows the in-sample fit, which is 1 minus the variance of the shocks for each series. In our case, every series contributes to our updates, though initial jobless claims (ICSA) contributes the least.

Including Informative Priors

A great advantage of fully Bayesian DFM estimation is that it allows us to incorporate prior beliefs into parameter estimates. Suppose, for example, that we wanted a model in which initial jobless claims played a bigger role. We can accomplish this by incorporating the prior belief that shocks to initial jobless claims in the observation equation are small. We specify this prior using the degrees of freedom of the inverse gamma prior distribution for the variance of shocks to the observed series, here initial jobless claims. The default is 0, so setting a higher value increases the weight on this prior:

m_ic <- dfm(econ_small, obs_df = c("ICSA" = 1))
summary(m_ic)

Estimated shocks to the observation equation for initial jobless claims are now smaller, implying jobless claims will play a bigger role in updating our factor.

Working with Bayesian DFMs

Our dynamic factor model can be described by the observation equation

$$ y_t = H x_t + \varepsilon_t $$

the transition equation

$$ z_t = A z_{t-1} + \epsilon_t $$

where $z_t = \begin{bmatrix} x_t & x_{t-1} & \ldots & x_{t-p+1} \end{bmatrix}'$ and $A$ is the companion form of $B$ in the vector autoregression (VAR)

$$ x_t = B \begin{bmatrix} x_{t-1} \\ x_{t-2} \\ \vdots \\ x_{t-p} \end{bmatrix} + \epsilon_t $$

and the distribution of shocks is

$$ \begin{bmatrix} \epsilon_t \\ \varepsilon_t \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} Q & 0 \\ 0 & R \end{bmatrix} \right) $$

Our goal is therefore to estimate the parameters $B$, $H$, $Q$, and $R$. The default Bayesian estimation routine does this by sequentially drawing factors given parameters using Durbin and Koopman's (2012) disturbance/simulation smoother and then drawing parameters given factors. We accept draws with probability 1; that is, bdfm performs these iterations via Gibbs sampling. The model is normalized by setting the top $m \times m$ submatrix of $H$ to the identity matrix, where $m$ is the number of factors. This involves drawing the first $m$ rows of $H$ from the appropriate posterior distribution and then using this draw to rotate factors to meet our normalization criteria. By default, the model is normalized on the first $m$ principal components of the data; you can manually specify identification using the identification argument.
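
For concreteness, with $m$ factors and $p$ lags, writing the VAR coefficients as $B = \begin{bmatrix} B_1 & B_2 & \cdots & B_p \end{bmatrix}$, the companion matrix $A$ stacks these blocks above an identity block so that the VAR($p$) for $x_t$ becomes a VAR(1) for the stacked state $z_t$:

$$ A = \begin{bmatrix} B_1 & B_2 & \cdots & B_{p-1} & B_p \\ I_m & 0 & \cdots & 0 & 0 \\ 0 & I_m & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I_m & 0 \end{bmatrix} $$

Only the first $m$ elements of $z_t$ receive shocks, which is why the transition shock $\epsilon_t$ has the $m \times m$ covariance $Q$.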

Note that because Bayesian models are estimated by simulation, results will not always be identical. bdfm does not currently support setting the random number generator seed in the C++ functions.

Checking Convergence

Parameters for Bayesian DFMs estimated using the bdfm package are median values of the entire posterior distribution of draws. By default bdfm uses 500 burn-in iterations and 1000 sampling iterations.

However, 500 burn-in iterations may not always be sufficient. The best way to evaluate whether our model has converged to a stationary distribution is to look at trace plots of the estimated parameters.

Parameter posterior distributions are stored as Bstore, Hstore, Qstore, and Rstore. The first three are cubes. Bstore, for example, has dimensions [m, m*p, reps]. For the first model we estimated, we can look at a few trace plots using the following:

# Look at traces for a few of the parameters in our first estimation
saved_par <- par(mfrow = c(2, 2), mar = c(3, 2, 2, 1)) # adjust plot layout just for knitting the document
ts.plot(m$Bstore[1, 1, ])
ts.plot(m$Hstore[1, 1, ])
ts.plot(m$Qstore[1, 1, ])
ts.plot(m$Rstore[1, ])
par(saved_par) # restore the previous graphical parameters

While this is not a formal test, things here look pretty good, so there is probably no need to change the defaults.
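
If the traces did drift or show clear trends, the number of burn-in and sampling iterations could be increased when estimating the model. The argument names burn and reps below are assumptions matching the defaults described above; check ?dfm for the exact names.

# Hypothetical call with a longer chain (argument names assumed; see ?dfm)
m_long <- dfm(econ_small, burn = 2000, reps = 5000)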

Full Posterior Distributions of Predicted Values

By default, dfm does not store the full distribution of estimated values of observables in order to keep object size small. In a model with 20 observables (input series) over 500 periods using 1000 MCMC simulations this would result in 10 million stored values --- 500,000 for each data series. Instead, fitted values are derived from filtering and smoothing using posterior median parameter estimates.

However, if we are interested in predicting a certain variable, such as industrial production, it is usually more accurate to use the full distribution of predicted values for the series of interest. This is due to the fact that the posterior median for predicted values of a series is more robust to non-stationary parameter draws and under-identification.

For example, suppose our series for industrial production had a number of missing values we wished to estimate, including at the tail of the data, thus constituting a forecast or nowcast. (For our purposes, "nowcast" just means predicting the tail of the data when other series are available to update our predictions; "forecast" just means predicting the tail of the data when no other series are available to update our predictions.) We can store the posterior median and distribution of predicted scaled and centered industrial production by re-estimating the model as follows:

# drop observations from 'INDPRO' that we would like to predict
econ_small_na <- econ_small
econ_small_na[c(381:385, 401:405, 431:435, 465:469), "INDPRO"] <- NA

m_ind <- dfm(econ_small_na, keep_posterior = "INDPRO")

# MSE for estimate using median parameters
mean((m_ind$values[, 1] - econ_small[, 1])^2, na.rm = TRUE)

# MSE for estimate using median of draws
mean((m_ind$Ymedian - econ_small[, 1])^2, na.rm = TRUE)

Note that we can store posterior distributions for only one series at a time. In most cases the mean squared error for predicted values using the median value of draws for industrial production in our MCMC simulations will be less than the value calculated by using the median values of parameters. We can look at the full distribution of predicted values for industrial production in a given period as follows:

hist(window(m_ind$Ystore, start = 2019, end = 2019), breaks = 30)

Forecast Updates

For every type of estimation, bdfm will store the prediction error and Kalman gain at every period. This allows us to look at exactly how much each series contributes to forecast updates at any point in time. Continuing with the previous example of estimating industrial production when some values are missing, each series contributed to updates of predicted industrial production as follows:

window(m_ind$idx_update, start = c(2017, 10))

A visualization of the data looks as follows:

library(ggplot2)
library(tsbox)
ggplot(ts_na_omit(ts_df(window(m_ind$idx_update, start = c(2017, 10))))) +
  geom_col(aes(x = time, y = value, fill = id), position = "stack") +
  theme_tsbox() + scale_fill_tsbox()

What this result shows is the impact each series had on the estimated value of (log) industrial production in October 2018. Every series brought our estimate for (log) industrial production down in that period, though seasonally adjusted initial jobless claims (ICSA) had the largest effect. Technically, what we are looking at here is the Kalman gain times the prediction error for each series (element-wise, as in the Hadamard product, not the matrix product), multiplied by the loadings for industrial production. Note that forecast updates will only be reported for series that are observed in a given period.
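
Written out, a sketch of this decomposition (the notation here is ours, not the package's): the contribution of observed series $j$ to the update of series $i$ (here industrial production) at time $t$ is

$$ \text{update}_{i,j,t} = H_{i\cdot} \, K_{\cdot j,t} \, v_{j,t}, \qquad v_{j,t} = y_{j,t} - \hat{y}_{j,t|t-1} $$

where $K_{\cdot j,t}$ is the column of the Kalman gain corresponding to series $j$, $v_{j,t}$ is that series' prediction error, and $H_{i\cdot}$ is the row of loadings for industrial production. Summing over all observed series $j$ gives the total update for industrial production in period $t$.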

We can also look at the impact each series had on our factor(s) in any time period. Again, looking at October 2018:

window(m_ind$factor_update$factor_1, start = c(2018, 10))

Again we see that seasonally adjusted initial jobless claims had the largest effect on factors. Note that factor_update is a list in order to accommodate the case of more than one factor.

Other Estimation Methods

The package bdfm also supports maximum likelihood estimation, specified by setting method = 'ml', and two step principal components based estimation, specified by setting method = 'pc'. Each is described briefly below.

Maximum Likelihood

One can estimate a dynamic factor model via maximum likelihood (ML) by simply plugging the likelihood function into a numerical solver such as optim() in R. However, this is a computationally inefficient approach and will be very slow for large models. For this reason the option method = 'ml' uses Watson and Engle's (1983) EM algorithm to estimate models by ML. This will generally be faster than Bayesian estimation by simulation, and model identification is less important since rotating the factors does not change the log likelihood. However, maximum likelihood estimation does not allow the incorporation of prior beliefs and does not support mixed frequency models. The five variable model from the previous section can be estimated by maximum likelihood as follows:

m_ml <- dfm(econ_small, method = "ml")
summary(m_ml)

Because factors are under-identified, the ML factor estimates will differ from the Bayesian factor estimates. However, fitted values should be similar. Forecasting with a model estimated by ML uses the same syntax as in the Bayesian case.

m_ml_fct <- dfm(econ_small, forecast = 3, method = "ml")
tail(predict(m_ml_fct))

When using ML estimation, any priors specified for Bayesian estimation will be ignored.

Two Step Estimation

For very large models with few missing observations, two step estimation following Doz, Giannone, and Reichlin (2011) is another possibility. We can estimate the simple five variable model introduced above using two step estimation and forecast three periods ahead of the last observation as follows:

m_pc_fct <- dfm(econ_small, forecast = 3, method = "pc")
tail(predict(m_pc_fct))

This approach estimates parameters of the model by treating principal components as factors, then re-estimates factors based on estimated parameters. Predictive power will typically be less than models estimated by Bayesian simulation or maximum likelihood, but the approach is very fast and can handle very large data sets. It is not suitable for data sets with many missing values, as there is no straightforward way of estimating principal components with missing observations.
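
To build intuition for the first step, you can compute principal components of the data directly in base R. The sketch below differences and scales all series and drops incomplete rows, which is cruder than the transformations bdfm applies internally, so the resulting component is only a rough analogue of the first-step factor.

# Rough analogue of the first-step factor: first principal component of the
# differenced, scaled, complete-case data (bdfm's internal transformations differ)
pc1 <- prcomp(na.omit(scale(diff(econ_small))))$x[, 1]
ts.plot(pc1)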


