knitr::opts_chunk$set(fig.width = 7, fig.height = 5, message = FALSE, warning = FALSE)
How much do idiosyncratic shocks contribute to aggregate growth? I'll use principal components analysis (PCA) to estimate the common factors and recover idiosyncratic shocks. I'll test the Gabaix and the DiGiovanni-Levchenko-Mejean (DLM) methods. We'll use some fake firm panel data to document the process.
Steps:
glmnet
package) to estimate the number of common factors to use to recover the idiosyncratic shocks``` {r echo = TRUE} library(idiosyncratics) library(tidyverse)
## Step 1: Create the fake data. ``` {r echo = TRUE} # Suppose the data has N firms and T years N <- 500 T <- 25 # A function to create a fake panel. asm <- fakedata::fake_panel(N = N, T = T, I = 5, model = list(order = c(1,0,1), ar = 0.75, ma = 0.05), missing = 0.05) # with 5% of the data randomly missing asm <- asm %>% rename(industry = I)
``` {r echo = TRUE} asm <- asm %>% add_weights(x_var = sales) %>% # Add weights based on firm sales lag_panel(x_var = w, l_var = w) %>% # Lag the weight variable so we can weight growth rates correctly add_growth_rates(x_var = sales, drop = TRUE) %>% # Calculate growth rates, dropping observations with NA growth rates add_quantiles(g_var = g) %>% # Add upper/lower quantiles of growth rates (0.025, 0.975), by year drop_outliers(g_var = g, method = 'winsorize') # winsorize growth rates.
The PCA methods require a balanced panel. ``` {r echo = TRUE} # Balance the panel asm <- asm %>% balance_panel(id, year) # get a separate copy of the aggregate growth rates g <- asm %>% group_by(year) %>% summarize(g = sum(w * g, na.rm = TRUE)) %>% tibble::rownames_to_column(var = 'index') # Convert balanced growth rate panel to a matrix, along with the weights X <- get_matrix(asm, id, year, g) W <- get_matrix(asm, id, year, w)
First, use the ICP criteria to estimate the number of common factors:
``` {r echo = TRUE}
xz <- icpx_pca(X, W)
Next, use sparse PCA (via elastic-net in ```glmnet```). This selects a small subsample of individual firms to best predict aggregate growth. ``` {r echo = TRUE} # Sparse PCA via the Lasso in glmnet. xy <- list('sparse' = sparse_pca(X, W))
Then combine all the residuals together in one table for analysis ``` {r echo = TRUE}
components <- bind_rows(xz, xy, .id = 'type') %>% left_join(g, by = c('year' = 'index')) %>% select(-year) %>% rename(year = year.y, e = value) %>% mutate(type = ifelse(type == 'sparse', "Sparse", type))
First, graph the residuals against the aggregate growth rate to see which ones track the best---a residual that tracks the aggregate well means the common factors don't match the aggregate. ``` {r echo = TRUE} # graph. tidy_components <- bind_rows(components %>% select(-g), g %>% mutate(index = 'Aggregate') %>% rename(e = g, type = index)) ggplot(tidy_components, aes(x = year, y = e, colour = type)) + geom_line() + labs(x = "Year", y = "Growth", title = 'Idiosyncratic components after removing common factors') + scale_colour_discrete(name="No. fctr")
Finally, we need to calculate the results. Gabaix regresses the aggregate growth rate $g_t$ on the idiosyncratic shock $e_t$, and calls the $R^2$ the amount that $e_t$ explains $g_t$. In R
, lm
will do OLS and report the $R^2$ in the as an element of the summary:
``` {r echo = TRUE} lms <- . %>% lm(g ~ e, data = .) %>% summary() %>% .$r.squared
DLM take a different approach, saying the contribution of idiosyncratic shocks depends on the standard deviation of the idiosyncratic component relative to the standard deviation of the aggregate growth rate. ``` {r echo = TRUE} dlm <- function(tbl) { cov <- tbl %>% select(one_of(c('e', 'm', 'g'))) %>% var() # gives a V-C matrix # get the relative variance on the diagonal and take the square root (cov / cov[dim(cov)[1], dim(cov)[2]]) %>% diag() %>% sqrt() }
{r echo = TRUE}
library(knitr)
tab <- left_join(split(components, components$type) %>%
map_dbl(lms) %>% round(2) %>% enframe(),
split(components, components$type) %>% map(dlm) %>%
map_dbl("e") %>% round(2) %>% enframe(), by = 'name')
tab <- tab %>% mutate(name = name %>% stringr::str_pad(2)) %>% arrange(name)
names(tab) <- c('No. of factors', 'e_t, R^2', 'e_t, DLM')
tab %>% kable()
(These are random results, so there's not much to say about them---comparing Gabaix [the 1st column, labelled '$R^2$'] to DLM [the second column, labelled $e_t$].) But, one thing that should hold in any results are that the idiosyncratic contribution to aggregate growth should decrease as the number of factors increases.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.