```
library(knitr)
```

`estimatr`

is for (fast) OLS and IV regression with robust standard errors. This document shows how `estimatr`

integrates with RStudio's `tidyverse`

suite of packages.

We use the Swiss Fertility and Socioeconomic Indicators data (available in `R`, description here) to show how `lm_robust`

works with `dplyr`

, `ggplot2`

, and `purrr`

. What is shown for `lm_robust`

here typically applies to all the other `estimatr`

functions (`lm_robust`

, `difference_in_mean`

, `lm_lin`

, `iv_robust`

, and `horovitz_thompson`

).

The first step to the tidyverse is turning model output into data we can manipulate. The `tidy`

function converts an `lm_robust`

object into a data.frame.

library(estimatr) fit <- lm_robust(Fertility ~ Agriculture + Catholic, data = swiss) tidy(fit)

`dplyr`

Once a regression fit is a data.frame, you can use any of the `dplyr`

"verbs" for data manipulation, like `mutate`

,`filter`

, `select`

, `summarise`

, `group_by`

, and `arrange`

(more on this here).

library(tidyverse) # lm_robust and filter fit %>% tidy %>% filter(term == "Agriculture") # lm_robust and select fit %>% tidy %>% select(term, estimate, std.error) # lm_robust and mutate fit %>% tidy %>% mutate(t_stat = estimate/ std.error, significant = p.value <= 0.05)

`ggplot2`

`ggplot2`

offers a number of data visualization tools that are compatible with `estimatr`

- Make a coefficient plot:

fit %>% tidy %>% filter(term != "(Intercept)") %>% ggplot(aes(y = term, x = estimate)) + geom_vline(xintercept = 0, linetype = 2) + geom_point() + geom_errorbarh(aes(xmin = conf.low, xmax = conf.high, height = 0.1)) + theme_bw()

- Put CIs based on robust variance estimates (rather than the "classical" variance estimates) with the
`geom_smooth`

and`stat_smooth`

functions.

library(ggplot2) ggplot(swiss, aes(x = Agriculture, y = Fertility)) + geom_point() + geom_smooth(method = "lm_robust") + theme_bw()

Note that the functional form can include polynomials. For instance, if the model is $Fertility \sim Agriculture + Agriculture^2 + Agriculture^3$, we can model this in the following way:

library(ggplot2) ggplot(swiss, aes(x = Agriculture, y = Fertility)) + geom_point() + geom_smooth(method = "lm_robust", formula = y ~ poly(x, 3, raw = TRUE)) + theme_bw()

`rsample`

The `rsample`

pacakage provides tools for bootstrapping:

library(rsample) boot_out <- bootstraps(data = swiss, 500)$splits %>% map(~ lm_robust(Fertility ~ Catholic + Agriculture, data = analysis(.))) %>% map(tidy) %>% bind_rows(.id = "bootstrap_replicate") kable(head(boot_out))

`boot_out`

is a data.frame that contains estimates from each boostrapped sample. We can then use `dplyr`

functions to summarize the bootstraps, `tidyr`

functions to reshape the estimates, and `GGally::ggpairs`

to visualize them.

boot_out %>% group_by(term) %>% summarise(boot_se = sd(estimate)) # To visualize the sampling distribution library(GGally) boot_out %>% select(bootstrap_replicate, term, estimate) %>% spread(key = term, value = estimate) %>% select(-bootstrap_replicate) %>% ggpairs(lower = list(continuous = wrap("points", alpha = 0.1))) + theme_bw()

`purrr`

`purrr`

provides tools to perform the same operation on every element of a vector. For instance, we may want to estimate a model on different subsets of data. We can use the `map`

function to do this.

library(purrr) # Running the same model for highly educated and less educated cantons/districts two_subsets <- swiss %>% mutate(HighlyEducated = as.numeric(Education > 8)) %>% split(.$HighlyEducated) %>% map( ~ lm_robust(Fertility ~ Catholic, data = .)) %>% map(tidy) %>% bind_rows(.id = "HighlyEducated") kable(two_subsets, digits =2)

Alternatively, we might want to regress different dependent variables on the same independent variable. `map`

can be used alongwith `estimatr`

functions for this purpose as well.

three_outcomes <- c("Fertility", "Education", "Agriculture") %>% map(~ formula(paste0(., " ~ Catholic"))) %>% map(~ lm_robust(., data = swiss)) %>% map_df(tidy) kable(three_outcomes, digits =2)

Using `ggplot2`

, we can make a coefficient plot:

three_outcomes %>% filter(term == "Catholic") %>% ggplot(aes(x = estimate, y = outcome)) + geom_vline(xintercept = 0, linetype = 2) + geom_point() + geom_errorbarh(aes(xmin = conf.low, xmax = conf.high, height = 0.1)) + ggtitle("Slopes with respect to `Catholic`") + theme_bw()

Using `estimatr`

functions in the tidyverse is easy once the model outputs have been turned into data.frames. We accomplish this with the `tidy`

function. After that, so many summary and visualization possibilities open up. Happy tidying!

graemeblair/DDestimate documentation built on Sept. 10, 2019, 7:38 p.m.

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.