stability_plot: Create coefficient stability plot.


View source: R/create_stability_plot.R

Description

stability_plot is used to quickly produce a plot showing the stability of the OLS estimate of the explanatory variable rhs on the outcome variable lhs under combinations of a given set of controls. Fixed effects, clustering, weights, and instrumental variables are supported. stability_plot is a wrapper for the five steps of the starbility pipeline; see the advanced usage vignette for details.

Usage

stability_plot(data, lhs, rhs, perm, ...)

Arguments

data

A data frame containing the variables in the model to be estimated.

lhs

A string indicating the name of the outcome variable in data.

rhs

A string indicating the name of the explanatory variable for which coefficient estimates will be plotted.

perm

A named dictionary in which values correspond to the sets of variables that should be iterated upon to produce the stability plot and names correspond to the names of these sets of variables that should be displayed in the plot.

base

Optional. A named dictionary in which values correspond to the sets of variables that should always be included in the model in all specifications and names correspond to the names of these sets of variables that should be displayed in the plot.

perm_fe

Optional. A named dictionary in which values correspond to the sets of fixed effects that should be iterated upon to produce the stability plot and names correspond to the names of these sets of variables that should be displayed in the plot. Functionally, these operate identically to perm; the difference is that starbility uses lfe to sweep them out of the normal equations, resulting in a performance boost over including them in perm.

nonperm_fe

Optional. A named dictionary in which values correspond to fixed effects that should be iterated upon to produce the stability plot and names correspond to the names of these sets of fixed effects that should be displayed in the plot. These fixed effects are included sequentially in the plot, one at a time – i.e. combinations of nonperm_fe are not included.

fe_always

Optional. A logical scalar. If one or more sets of fixed effects are specified in nonperm_fe, should the plot include only estimates from models with non-permuted fixed effects (rather than also including a set of estimates from models without any non-permuted fixed effects)? Defaults to FALSE.

sort

A string specifying how models should be sorted by coefficient value. The default is none, which preserves the order in which controls are permuted. Other options are asc (sorted by ascending coefficient values), desc (sorted by descending coefficient values), asc-by-fe (sorted by ascending coefficient values within non-permuted fixed effects groups, but preserving the order of these groups), and desc-by-fe (sorted by descending coefficient values within non-permuted fixed effects groups, but preserving the order of these groups).

model

Optional. A function that takes at least three arguments: spec (a string containing the model specification), data (the data frame containing the variables in the model), and rhs (the name of the coefficient of interest). Arbitrary additional arguments are permitted. The function should then output a vector containing, in order, the coefficient estimate, the p-value, the bottom value of the error region, and the top value of the error region. If left unspecified, uses default implementation of felm (from lfe).

iv

Optional. A string indicating the variables which should be used to instrument rhs. If left unspecified, OLS coefficients are plotted.

cluster

A string indicating the name of the variable by which standard errors should be clustered. Defaults to no clustering.

weights

A string indicating the name of the variable containing weights. Defaults to equal weighting.

run_to

A numeric scalar indicating the step at which stability_plot should stop. This is useful if you want to make manual edits at one step. If left unspecified, all steps are run. Currently, values of note include run_to=2 (useful if you want to define your own formulas for use in models other than felm), run_to=5 (useful if you want to take full control over the ggplot2 plotting), and run_to=6 (useful if you want to use most of the plot defaults, but add elements using ggplot2).

point_size

A numeric scalar indicating the size of the points indicating coefficient estimates. Defaults to 1.

error_geom

A string indicating the type of geom that should be used to indicate confidence intervals on coefficient estimates. Currently supported are ribbon, errorbar, and none. Defaults to errorbar if fewer than 100 models are plotted; defaults to ribbon if 100 or more models are plotted.

error_alpha

A numeric scalar indicating the alpha of the error geom. Defaults to 0.2.

coef_ylim

A numeric vector of length two indicating the minimum and maximum values of the y-axis in the coefficient plot. If not specified, uses ggplot2 default.

coef_ylabel

A string specifying the y-axis label on the coefficient panel. Defaults to 'Coefficient estimate'.

control_geom

A string indicating the geom that should be used to indicate the presence of controls. Currently supported are circle and rect. Defaults to rect.

control_spacing

A numeric scalar indicating how large the geoms indicating the presence of controls should be. For control_geom=='circle', this is the diameter of the circle. For control_geom=='rect', this is the width of the rectangle. Defaults to 0.75 if fewer than 40 models are displayed; defaults to 1 otherwise.

control_text_size

A numeric scalar indicating how large the control name text should be. Defaults to 9.

trim_top

A numeric scalar indicating how close the bottom panel (displaying the presence of controls) should be to the top panel (displaying the coefficient estimates). Useful when dealing with large CIs.

rel_height

A numeric scalar. Height of the control plot relative to the coefficient plot.
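
The contract described for the model argument can be illustrated with a minimal sketch. The function name lm_model is hypothetical, and the sketch assumes that spec arrives as a plain formula string that lm can parse; the specification strings that starbility actually generates may use a different format, so treat this as an outline of the expected inputs and outputs rather than a drop-in replacement for the default.

```r
# Hypothetical custom model function (not part of starbility).
# Assumption: `spec` is a plain formula string such as "mpg ~ wt + hp".
lm_model <- function(spec, data, rhs, ...) {
  fit <- lm(as.formula(spec), data = data)
  est <- summary(fit)$coefficients

  coef <- est[rhs, "Estimate"]
  se   <- est[rhs, "Std. Error"]
  p    <- est[rhs, "Pr(>|t|)"]

  # 95% error region using a normal approximation
  z <- qnorm(0.975)

  # Order required by `model`: estimate, p-value, bottom, top
  c(coef, p, coef - z * se, coef + z * se)
}

# Stand-alone check on a built-in dataset
res <- lm_model("mpg ~ wt + hp", data = mtcars, rhs = "wt")
print(res)
```

A function of this shape would be passed as model = lm_model. The extra ... argument is there because arbitrary additional arguments are permitted.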

Details

Each row of the bottom panel of the plot corresponds to a single variable set. A variable set can contain one or more individual variables. To include multiple variables in a single set, specify them in a single string, separated by '+'.
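
As a sketch of how variable sets might be written, the example below defines three sets on the built-in mtcars data, two of which bundle several variables with '+'. The set names and the choice of variables are illustrative only, and passing perm as a named character vector is an assumption about an accepted input form. The call to stability_plot is guarded so that the snippet still runs when starbility is not installed.

```r
# Each element is one variable set; the names are the labels shown in the
# bottom panel of the plot.
# Assumption: a named character vector is an acceptable form for `perm`.
perm_controls <- c(
  "Engine"            = "cyl + disp + hp",  # three variables, one set, one row
  "Transmission"      = "am + gear",        # two variables, one set, one row
  "Quarter-mile time" = "qsec"              # a single-variable set
)

# Only call the package if it is available.
if (requireNamespace("starbility", quietly = TRUE)) {
  p <- starbility::stability_plot(
    data = mtcars,
    lhs  = "mpg",
    rhs  = "wt",
    perm = perm_controls
  )
}
```

Here the variables in "Engine" enter or leave the model together, so the bottom panel would show one row for that set rather than one row per variable.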

Value

If run_to is left blank (default), returns a cowplot grid containing both panels. Else, returns the output of the function


AakaashRao/starbility documentation built on May 21, 2020, 9:49 a.m.