dfm: Estimate a Dynamic Factor Model


View source: R/dfm.R

Description

Estimates a Bayesian or non-Bayesian dynamic factor model. With the default options, dfm calls automatic procedures that work well in many circumstances.

Usage

dfm(data, factors = 1, lags = "auto", forecasts = 0,
  method = c("bayesian", "ml", "pc"), scale = TRUE, logs = "auto",
  diffs = "auto", outlier_threshold = 4, frequency_mix = "auto",
  pre_differenced = NULL, trans_prior = NULL, trans_shrink = 0,
  trans_df = 0, obs_prior = NULL, obs_shrink = 0, obs_df = NULL,
  identification = "pc_long", keep_posterior = NULL,
  interpolate = FALSE, orthogonal_shocks = FALSE, reps = 1000,
  burn = 500, verbose = interactive() &&
  !isTRUE(getOption("knitr.in.progress")), tol = 0.01)

Arguments

data

one or multiple time series. The data to be used for estimation. This can be entered as a "ts" object or as a matrix. If tsbox is installed, any ts-boxable time series can be supplied (ts, xts, zoo, data.frame, data.table, tbl, tbl_ts, tbl_time, or timeSeries).

factors

integer. The number of unobserved factors to be estimated. A larger number of factors leads to a more complex model. Denoted as 'm' in the documentation.

lags

integer. The number of lags in the transition equation. If "auto" (default), the number is equal to the highest frequency in data. Denoted as 'p' in the documentation.

forecasts

integer. Number of periods ahead to forecast.

method

character. Method to be used; one of "bayesian", "ml" or "pc". See details.

scale

logical. Should data be scaled before estimation? TRUE (default) resolves some numerical problems during estimation. FALSE ensures that the coefficient estimates are interpretable.

logs

names or index values (see details). Series of which the logarithm is taken (can be combined with diffs). If "auto" (default), this is done for all series that are differenced and have no values < 0.

diffs

names or index values (see details). Series to be differenced. If "auto" (default), a modified Durbin-Watson test is performed.
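As a rough illustration of what combining logs and diffs means for a single series, the sketch below takes the first difference of the logarithm using base R only. It is an assumption about the general idea, not the package's internal code: how dfm scales the result or handles mixed frequencies is not stated here.

```r
# Minimal sketch (base R only); not bdfm's internal transformation.
x <- mdeaths                 # monthly series shipped with base R

# Logs only make sense for strictly positive data.
all_positive <- all(x > 0)

# Log first, then difference: approximate period-on-period growth rates.
dlx <- diff(log(x))

length(x)    # number of original observations
length(dlx)  # one observation is lost to differencing
```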

outlier_threshold

integer. Observations more than outlier_threshold standard deviations from the series mean are removed. This is useful to increase the stability of the estimation.
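The rule described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the sample standard deviation is used, flagged observations are set to NA, and the toy vector x is made up. The package's own implementation may differ in these details.

```r
# Minimal sketch (base R only); not bdfm's internal code.
x <- c(rep(c(-1, 1), 10), 25)   # 20 well-behaved values plus one extreme value
outlier_threshold <- 4

# Distance from the series mean, measured in standard deviations.
z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Treat observations beyond the threshold as missing.
x_clean <- x
x_clean[abs(z) > outlier_threshold] <- NA

which(is.na(x_clean))   # position of the removed observation
```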

frequency_mix

integer or "auto". Number of high frequency periods in a low frequency period. If "auto" (default), this is inferred from the time series.

pre_differenced

names or index values (see details). Series entered in differences. (If series are specified in diffs, this is not needed.)

trans_prior

m x mp (m: factors, p: lags) prior matrix for B (the transition matrix) in the transition equation. Default is zeros. E.g., to use a random walk prior with m factors and p lags, set trans_prior = cbind(diag(1,m,m), matrix(0,m,m*(p-1))).
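The random walk prior from the description above can be built and inspected with base R. The values of m and p below are illustrative choices, not defaults.

```r
# Build the random walk prior for B with m factors and p lags (base R only).
m <- 2  # number of factors (illustrative value)
p <- 3  # number of lags (illustrative value)

# Identity block on the first lag, zeros on lags 2..p: the prior mean of
# each factor is its own value in the previous period.
trans_prior <- cbind(diag(1, m, m), matrix(0, m, m * (p - 1)))

dim(trans_prior)  # m rows and m * p columns
trans_prior
```

The resulting matrix would then be passed together with matching factors and lags values, e.g. factors = m, lags = p, trans_prior = trans_prior. Going by the trans_shrink description, the prior presumably only matters when trans_shrink is greater than zero.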

trans_shrink

numeric. Prior tightness on B matrix in transition equation where a value of zero is used to denote an improper (flat) prior (i.e. no shrinkage). Use to shrink forecast values towards the prior trans_prior, which may help reduce parameter uncertainty in estimation.

trans_df

numeric. Prior degrees of freedom for inverse-Wishart distribution of shocks in the transition equation, where 0 implies no shrinkage. Shrinking shocks to the transition equation will increase the magnitude of shocks to the observation equation, dampening updates from observed series. High values of trans_df can lead to instability in simulations.

obs_prior

k x m (k: observed series, m: factors) prior matrix for H (loadings) in the observation equation. Default is zeros.

obs_shrink

numeric. Prior tightness on H (loadings) in the observation equation where a value of zero is used to denote an improper (flat) prior (i.e. no shrinkage). A greater value will shrink estimates of loadings more aggressively towards the prior obs_prior. When the prior is zero (the default value), this is an alternative (and typically more stable) approach to dampening the impact of updates from observed series.

obs_df

named vector (see details). Prior degrees of freedom for the inverse chi-squared distribution in the observation equation. This is useful to give specific series a larger weight, e.g. 1 (default is 0).

identification

names or index values (see details), or character. Factor identification. "pc_long" (default) identifies on principal components from series with at least the median number of observations. "pc_wide" identifies on principal components using all series, where rows of the observations matrix containing missing data are omitted. "name" uses Stock and Watson's "naming factors" identification, i.e. identifying on the first m series provided where m is the number of factors. Identification can also be done manually, by supplying names or index values from which identifying series are derived via principal components.

keep_posterior

names or index values (see details). Series of which to keep the full posterior distribution of predicted values (method "bayesian" only). This is useful for forecasting, as the posterior median forecast value tends to be more accurate than forecasts using the posterior median parameter estimates, and it allows for the evaluation of forecast accuracy.

interpolate

logical. Should output return intra-frequency estimates of low frequency observables? Put differently, if the model includes monthly and quarterly data, should output include estimates of quarterly data every month (where quarterly refers to an aggregate of the current and previous 2 months; for interpolate = TRUE) or just at the end of the quarter (months 3, 6, 9, and 12; for interpolate = FALSE, default)?

orthogonal_shocks

logical. Return a rotation of the model with orthogonal shocks and factors. This is used to isolate the impact of each factor on observables, allowing for a clean interpretation of how shocks (which, if TRUE, are not correlated) impact observed series.

reps

integer. Number of repetitions for MCMC sampling.

burn

integer. Number of iterations to burn in MCMC sampling.

verbose

logical. Print status of function during evaluation. Default is TRUE in interactive mode, FALSE otherwise, so it does not appear, e.g., in reprex::reprex().

tol

numeric. Tolerance for convergence of the EM algorithm (method "ml" only). The default value is 0.01, which corresponds to the convergence criterion used in Doz, Giannone, and Reichlin (2012).

Details

Specifying series: Individual series can be specified either by names (recommended) or index values. An index value refers to the position of the series in data.

Specifying parameters for specific series: Parameters for individual series can be specified using a named vector (recommended) or using an unnamed vector of the same length as the number of series in data.
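The two conventions can be illustrated with base R objects alone. The sketch below only shows how names and index values line up with the columns of data; the variable names are hypothetical, and the 0 used for the unnamed entry assumes the default stated in the obs_df description.

```r
# Minimal sketch (base R only) of names vs. index values.
dta <- cbind(fdeaths, mdeaths)   # datasets shipped with base R
colnames(dta)                     # "fdeaths" "mdeaths"

# Specifying a series: by name (recommended) or by its position in data.
by_name  <- "mdeaths"
by_index <- match(by_name, colnames(dta))

# Specifying a parameter for specific series:
obs_df_named   <- c(fdeaths = 1)  # named vector: only 'fdeaths' is set
obs_df_unnamed <- c(1, 0)         # unnamed: one entry per series, in column order
```

Names keep working if the column order of data changes, which is presumably why they are the recommended form.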

See Also

vignette("dfm"), for a more comprehensive intro to the package.

Practical Implementation of Factor Models for a comprehensive overview of dynamic factor models.

Examples

dta <- cbind(fdeaths, mdeaths)

m0 <- dfm(dta, forecasts = 2) # estimation with 2-period forecast
predict(m0)                   # series with imputations and forecasts
summary(m0)                   # summary of the model
factors(m0)                   # estimated factor

# informative priors: giving 'fdeaths' a higher weight
m1 <- dfm(dta, obs_df = c("fdeaths" = 1))
summary(m1)

## Not run: 
# Forecasting U.S. GDP
m1 <- dfm(econ_us,
  pre_differenced = "A191RL1Q225SBEA",
  keep_posterior = "A191RL1Q225SBEA"
)

# interpolating low frequency series
dta_mixed <- econ_us[, c(1, 3)]
predict(dfm(dta_mixed))
predict(dfm(dta_mixed, interpolate = TRUE))

## End(Not run)

srlanalytics/bdfm documentation built on Sept. 21, 2020, 10:45 p.m.