# Examples of including covariates with bayesdfa In bayesdfa: Bayesian Dynamic Factor Analysis (DFA) with 'Stan'

Here we will walk through how to use the bayesdfa package to fit dynamic factor analysis (DFA) models with covariates.

```library("knitr")
opts_chunk\$set(message = FALSE, fig.width = 5.5)
```

Let's load the necessary packages:

```library(bayesdfa)
library(ggplot2)
library(dplyr)
library(rstan)
chains = 1
iter = 10
```

## Notation review for DFA models

Covariates in dynamic factor analysis are generally included in the observation model, rather than the process model. Without covariates, the model can be expressed as

\$\${x}{t}={x}{t-1}+{e}t\ { e }{ t }\sim MVN(0,\textbf{Q})\ { y }{ t }=\textbf{Z}{ x }{ t }+{ v }{ t }\ { v }{ t }\sim MVN(0,\textbf{R})\$\$

where the matrix \$\textbf{Z}\$ is dimensioned as the number of time series by number of trends, and maps the observed data \$y_{t}\$ to the latent trends \$x_{t}\$.

### Observation covariates

Observation covariates can be \$\${x}{t}={x}{t-1}+{e}t\ { e }{ t }\sim MVN(0,\textbf{Q})\ { y }{ t }=\textbf{Z}{ x }{ t }+\textbf{D}{ d }{ t }+{ v }{ t }\ { v }_{ t }\sim MVN(0,\textbf{R})\$\$ where the matrix \$\textbf{D}\$ represents time series by number of covariates at time \$t\$. For a single covariate, such as temperature, this would mean estimating \$P\$ parameters, where \$P\$ is the number of time series. For a model including 2 covariates, the number of estimated coefficients would be \$2P\$ and so forth.

### Process covariates

Process covariates on the trends are less common but can be written as \$\${x}{t}={x}{t-1}+\textbf{C}{ c }{ t }+{e}_t\ { e }{ t }\sim MVN(0,\textbf{Q})\ { y }{ t }=\textbf{Z}{ x }{ t }+{ v }{ t }\ { v }{ t }\sim MVN(0,\textbf{R})\$\$ where the matrix \$\textbf{C}\$ represents the number of trends by number of covariates at time \$t\$. For a single trend, this would mean estimating \$K\$ parameters, where \$K\$ is the number of trends. For a model including 2 covariates, the number of estimated coefficients would be \$2K\$ and so forth.

## Examples -- observation covariates

We'll start by simulating some random trends using the `sim_dat` function,

```set.seed(1)
sim_dat <- sim_dfa(
num_trends = 1,
num_years = 20,
num_ts = 4
)
```

Next, we can add a covariate effect to the trend estimate, `x`. For example,

```cov = expand.grid("time"=1:20, "timeseries"=1:4, "covariate"=1)
cov\$value = rnorm(nrow(cov),0,0.1)

for(i in 1:nrow(cov)) {
sim_dat\$y[cov\$timeseries[i],cov\$time[i]] = sim_dat\$pred[cov\$timeseries[i],cov\$time[i]] +
c(0.1,0.2,0.3,0.4)[cov\$timeseries[i]]*cov\$value[i]
}
```

And now fit the model with `fit_dfa`

```mod = fit_dfa(y = sim_dat\$y, obs_covar = cov, num_trends = 1,
chains=chains, iter=iter)
```

We can then make plots of the true and estimated trend,

```plot(c(sim_dat\$x), xlab="Time", ylab="True trend")
```
```plot_trends(rotate_trends(mod)) + ylab("Estimated trend") + theme_bw()
```

This approach could be modified to have covariates not affecting some time series. For example, if we didn't want the covariate to affect the last time series, we could say

```cov = cov[which(cov\$timeseries!=4),]
```

And then again fit the model

```mod = fit_dfa(y = sim_dat\$y, obs_covar = cov, num_trends = trends,
chains=chains)
```

## Examples -- process covariates

As a cautionary note, there's some identifiability issues with including covariates in the process model. Covariates need to be standardized or centered prior to being included. Future versions of this vignette will include more clear examples and recommendations.

We'll start by simulating some random trends using the `sim_dat` function,

```set.seed(1)
sim_dat <- sim_dfa(
num_trends = 2,
num_years = 20,
num_ts = 3
)
```

Next, we can add a covariate effect to the trend estimate, `x`. For example,

```cov = rnorm(20, 0, 1)
b_pro = c(1,0.3)
x = matrix(0,2,20)

for(i in 1:2) {
x[i,1] = cov[1]*b_pro[i]
}

for(i in 2:length(cov)) {
x[1,i] = x[1,i-1] + cov[i]*b_pro[1] + rnorm(1,0,1)
x[2,i] = x[2,i-1] + cov[i]*b_pro[2] + rnorm(1,0,1)
}

y = sim_dat\$Z %*% x
```

And now fit the model with `fit_dfa`

```pro_cov = expand.grid("trend"=1:2, "time"=1:20, "covariate"=1)
pro_cov\$value = cov[pro_cov\$time]

mod = fit_dfa(y = sim_dat\$y, pro_covar = pro_cov, num_trends = 2,
chains=chains, iter=iter)
```

## Try the bayesdfa package in your browser

Any scripts or data that you put into this service are public.

bayesdfa documentation built on May 29, 2021, 1:06 a.m.