``` {r echo = TRUE}
knitr::opts_chunk$set(fig.width = 7, fig.height = 5, message = FALSE, warning = FALSE)
```

How much do idiosyncratic shocks contribute to aggregate growth? I'll use principal components analysis (PCA) to estimate the common factors and recover the idiosyncratic shocks, then compare the Gabaix and di Giovanni-Levchenko-Mejean (DLM) methods for measuring their contribution. I'll use some fake firm panel data to document the process.

Steps:

  1. Create fake firm data with some time series properties
  2. Process the data:
    • add (lagged) firm weights,
    • add growth rates,
    • add upper/lower growth rate quantiles,
    • deal with outliers based on those quantiles (drop or winsorize),
    • demean the data to find the idiosyncratic shocks $e_{it}$ (see the decomposition sketched just after this list)
  3. Use the ICP criteria from Bai and Ng (2002) and sparse PCA (via the glmnet package) to estimate the number of common factors, then recover the idiosyncratic shocks
  4. Analyze the results given by Gabaix and DLM methods
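
To fix notation, here's the decomposition I have in mind (a sketch in standard approximate-factor-model terms; the packages below may set things up slightly differently): each firm's growth rate $g_{it}$ splits into a common component driven by factors $F_t$ and an idiosyncratic shock $e_{it}$, and aggregate growth is a lagged-weight sum of firm growth rates. How exactly the firm-level shocks get aggregated into $e_t$ is an assumption on my part.

$$
g_{it} = \lambda_i' F_t + e_{it}, \qquad
g_t = \sum_i w_{i,t-1}\, g_{it}, \qquad
e_t = \sum_i w_{i,t-1}\, e_{it}
$$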

Step 0: Load the libraries.

``` {r echo = TRUE}
library(idiosyncratics)
library(tidyverse)

# install via devtools::install_github('tweed1e/fakedata')
library(fakedata)
```

Step 1: Create the fake data.

``` {r echo = TRUE}
# Suppose the data has N firms and T years
N <- 500
T <- 25

# A function to create a fake panel.
asm <- fakedata::fake_panel(N = N, T = T, I = 5, 
                            model = list(order = c(1,0,1), ar = 0.75, ma = 0.05), 
                            missing = 0.05) # with 5% of the data randomly missing

asm <- asm %>% rename(industry = I)
```
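
A quick sanity check on the fake panel (this assumes `fake_panel` returns `id`, `year`, and `sales` columns, which the later steps rely on, and that missing observations show up either as dropped rows or as NA sales):

``` {r echo = TRUE}
# Count firms and years, and check how much of the panel is missing
asm %>%
  summarize(firms = n_distinct(id),
            years = n_distinct(year),
            share_missing_rows = 1 - n() / (n_distinct(id) * n_distinct(year)),
            share_na_sales = mean(is.na(sales)))
```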

Step 2: Process the data.

``` {r echo = TRUE}
asm <- asm %>% 
  add_weights(x_var = sales) %>%                   # add weights based on firm sales
  lag_panel(x_var = w, l_var = w) %>%              # lag the weight variable so we can weight growth rates correctly
  add_growth_rates(x_var = sales, drop = TRUE) %>% # calculate growth rates, dropping observations with NA growth rates
  add_quantiles(g_var = g) %>%                     # add upper/lower quantiles of growth rates (0.025, 0.975), by year
  drop_outliers(g_var = g, method = 'winsorize')   # winsorize growth rates
```
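
For reference, winsorizing here means clipping growth rates at the year-specific quantiles rather than dropping those observations. A standalone illustration of the idea (just a sketch, not the package's `drop_outliers` implementation):

``` {r echo = TRUE}
# Hypothetical helper: clip a vector at its 2.5% and 97.5% quantiles
winsorize <- function(x, probs = c(0.025, 0.975)) {
  q <- quantile(x, probs = probs, na.rm = TRUE)
  pmin(pmax(x, q[1]), q[2])
}
winsorize(c(-2, -0.1, 0, 0.05, 3))
```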

The PCA methods require a balanced panel.

``` {r echo = TRUE}
# Balance the panel
asm <- asm %>% balance_panel(id, year)

# get a separate copy of the aggregate growth rates
g <- asm %>% group_by(year) %>% 
  summarize(g = sum(w * g, na.rm = TRUE)) %>% 
  tibble::rownames_to_column(var = 'index')

# Convert balanced growth rate panel to a matrix, along with the weights
X <- get_matrix(asm, id, year, g)
W <- get_matrix(asm, id, year, w)
```
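
Before estimating factors, it's worth checking that the matrix dimensions line up with $N$ firms and $T$ years (which dimension is which depends on `get_matrix`, so treat this as a sanity check only):

``` {r echo = TRUE}
# One dimension should equal the number of firms, the other the number of years
dim(X)
dim(W)
```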

Step 3: Estimate common factors

First, use the ICP criteria to estimate the number of common factors:

``` {r echo = TRUE}
# ICP criteria to estimate the number of factors, from Bai and Ng (2002);
# returns the residuals after estimating the common factors
xz <- icpx_pca(X, W)
```
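
For reference, Bai and Ng (2002) pick the number of factors $k$ by minimizing penalized information criteria; the $IC_{p2}$ variant, for example, is

$$
IC_{p2}(k) = \ln\big(V(k, \hat{F}^k)\big) + k\,\frac{N+T}{NT}\,\ln\big(\min(N, T)\big),
$$

where $V(k, \hat{F}^k)$ is the average squared residual after removing $k$ estimated factors. Which variant `icpx_pca` actually uses is an assumption on my part.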

Next, use sparse PCA (via the elastic net in `glmnet`). This selects a small subset of individual firms whose growth rates best predict aggregate growth.

``` {r echo = TRUE}
# Sparse PCA via the Lasso in glmnet.
xy <- list('sparse' = sparse_pca(X, W))
```
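
For intuition, here is a minimal sketch of the firm-selection idea using `cv.glmnet` directly. It assumes `X` has one row per year and one column per firm, and that `g$g` is the aggregate growth series in year order; `sparse_pca` itself may work differently, so the chunk isn't evaluated.

``` {r echo = TRUE, eval = FALSE}
library(glmnet)

# Lasso regression of aggregate growth on all firm growth rates:
# cross-validation picks the penalty, and the firms with nonzero
# coefficients are the small subset that best predicts the aggregate.
cv_fit <- cv.glmnet(x = X, y = g$g, alpha = 1)
coefs <- coef(cv_fit, s = "lambda.min")
selected_firms <- setdiff(rownames(coefs)[coefs[, 1] != 0], "(Intercept)")
```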

Then combine all the residuals together in one table for analysis.

``` {r echo = TRUE}
# combine all the residuals and attach the aggregate growth rate
components <- bind_rows(xz, xy, .id = 'type') %>% 
  left_join(g, by = c('year' = 'index')) %>% 
  select(-year) %>% 
  rename(year = year.y, e = value) %>% 
  mutate(type = ifelse(type == 'sparse', "Sparse", type))
```

First, graph the residuals against the aggregate growth rate to see which ones track it most closely; a residual that tracks the aggregate well means the common factors don't explain much of the aggregate.

``` {r echo = TRUE}
# graph.
tidy_components <- bind_rows(components %>% select(-g),
                             g %>% mutate(index = 'Aggregate') %>% rename(e = g, type = index))

ggplot(tidy_components,
       aes(x = year, y = e, colour = type)) +
  geom_line() +
  labs(x = "Year", 
       y = "Growth", 
       title = 'Idiosyncratic components after removing common factors') +
  scale_colour_discrete(name = "No. of factors")
```

Step 4: Compare Gabaix and DLM for each method and no. of factors.

Finally, we need to calculate the results. Gabaix regresses the aggregate growth rate $g_t$ on the idiosyncratic shock $e_t$ and calls the $R^2$ the share of $g_t$ that $e_t$ explains. In R, `lm` will do OLS and report the $R^2$ as an element of the summary:

``` {r echo = TRUE}
# A functional sequence: regress g on e, then extract the R^2 from the summary
lms <- . %>% lm(g ~ e, data = .) %>% summary() %>% .$r.squared
```
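
For example, applied to one set of residuals (the `"Sparse"` filter is just one choice among the `type` values in `components`):

``` {r echo = TRUE}
# R^2 from regressing aggregate growth on the sparse-PCA residual
components %>% filter(type == "Sparse") %>% lms()
```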

DLM take a different approach, saying the contribution of idiosyncratic shocks depends on the standard deviation of the idiosyncratic component relative to the standard deviation of the aggregate growth rate.

``` {r echo = TRUE}
dlm <- function(tbl) {
  cov <- tbl %>% select(one_of(c('e', 'm', 'g'))) %>% var() # variance-covariance matrix
  # divide by Var(g) (the last row/column), then take the square root of the diagonal
  (cov / cov[dim(cov)[1], dim(cov)[2]]) %>% diag() %>% sqrt()
}
```
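
In symbols, the number reported for the idiosyncratic component is the relative volatility described above (a restatement of what `dlm` computes, not an addition to it):

$$
\frac{\sigma_e}{\sigma_g} = \sqrt{\frac{\operatorname{Var}(e_t)}{\operatorname{Var}(g_t)}}
$$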

``` {r echo = TRUE}
library(knitr)

tab <- left_join(
  split(components, components$type) %>% map_dbl(lms) %>% round(2) %>% enframe(),
  split(components, components$type) %>% map(dlm) %>% map_dbl("e") %>% round(2) %>% enframe(),
  by = 'name'
)

tab <- tab %>% mutate(name = name %>% stringr::str_pad(2)) %>% arrange(name)
names(tab) <- c('No. of factors', 'e_t, R^2', 'e_t, DLM')
tab %>% kable()
```

(These are random results, so there's not much to say about them when comparing Gabaix [the column labelled '$e_t$, $R^2$'] to DLM [the column labelled '$e_t$, DLM'].) But one thing that should hold in any results is that the estimated idiosyncratic contribution to aggregate growth should decrease as the number of factors increases.


