``` {r}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
suppressPackageStartupMessages(library(rchitex))
suppressPackageStartupMessages(library(tibble))
```
rchitex provides an extensive set of options for generating nicely formatted text tables while simultaneously writing the equivalent $\LaTeX$ code to a provided path. It is intended to bridge the gap between statistical exploration in R and article writing.
The `describe()` function constructs a table of summary statistics for all the numeric columns of a data frame or tibble^[For the sake of brevity, I will write "data frame" instead of "data frame and tibble". rchitex works for both structures.]. The default functions (in order) are
The default setting of `describe()` produces text output, perfect for early exploratory work. Simply providing a path to the `path` argument will write a LaTeX file in addition to printing the equivalent text table to the console. Setting `silent = TRUE` will suppress the console output.
``` {r textTables}
state_df <- tibble::as_tibble(datasets::state.x77)
describe(state_df)
```
### Function customization

Users can specify the order of functions, rename function labels, and supply additional (including self-defined) functions. The `statistics` argument accepts a named vector or list that pairs each label with a summary function. Note that `describe` supports user-defined functions as long as the function maps a vector of data to a single value, $f: \mathbb{R}^d \to \mathbb{R}$.

``` {r}
stat_funcs <- c('Average' = mean, 'St.D.' = sd, 'Random value' = function(v) sample(v, 1))
rchitex::describe(state_df, statistics = stat_funcs)
```
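To make the "vector in, single value out" requirement concrete, here is a minimal base R sketch. It does not use rchitex and is not how `describe()` is implemented; the `iqr_ratio` function and the `summary_tbl` object are illustrative names only. It applies a named list of summary functions to every numeric column and fails loudly if a function returns anything other than one number.

```r
# Built-in data: state.x77 is a numeric matrix, so convert it to a data frame.
state_df <- as.data.frame(datasets::state.x77)

# Named list of summary functions; the names act as row/column labels.
# Each function must return exactly one number for a numeric vector.
iqr_ratio <- function(v) IQR(v) / median(v)   # user-defined, illustrative only
stat_funcs <- list("Average" = mean, "St.D." = sd, "IQR / median" = iqr_ratio)

# Keep numeric columns only, then apply every function to every column.
# vapply(..., numeric(1)) raises an error if a function returns more than one value.
numeric_cols <- state_df[vapply(state_df, is.numeric, logical(1))]
summary_tbl <- sapply(stat_funcs, function(f) vapply(numeric_cols, f, numeric(1)))

round(summary_tbl, 2)
```

A function such as `range`, which returns two values, would be rejected by the `vapply()` check, which is the same kind of function that does not fit the single-value requirement described above.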
Further aesthetic customizations are available.

- `title` adds a title to the top row of the table. This holds for text, LaTeX, and HTML output.
- `note` adds a short note at the bottom of the LaTeX and HTML output.
- `max_precision` specifies the maximum number of digits to the right of the decimal point.
- `flip` rotates the table so the columns are listed horizontally and the functions vertically.

``` {r}
title <- 'State summary statistics'
note <- 'a note'
max_precision <- 0
describe(state_df, title = title, max_precision = max_precision, flip = TRUE)
```
Unless otherwise specified, rchitex writes summary statistics (and regression tables, but more on that later) as a tabular object. This is meant to give users greater control over how the table is integrated into their markdown or LaTeX document. Setting `as_table` to true will wrap the tabular object in a table. This allows users to provide a reference `label`.
``` {r}
# describe(state_df[,2:5], title = title, statistics = stat_funcs,
#          max_precision = 0, md = 'html', path = path, as_table = TRUE,
#          label = 'tbl:sumStats')
```
HTML and LaTeX tables can also be knitted into an R Markdown file (like this one). Setting the `md` argument (short for markdown) to either "latex" or "html" will override the default text output. Note that writing to a local disk is still possible. For rmarkdown to format the HTML or LaTeX code correctly, set `results='asis'` in the chunk header.
``` {r, results = 'asis'}
describe(state_df[,2:5], title = title, statistics = stat_funcs, max_precision = 0, md = 'html')
```
## Regression output

rchitex also constructs highly customizable regression tables with the `build()` function. The general flow is similar to `describe()`. By default, a text table is printed to the console. Providing a `path` will also write a LaTeX table to that path. Overriding the default null value for `md` will print a LaTeX or HTML table to the console so that it can be knitted into a markdown file. All the customizations listed above are available except for `flip`.

``` {r, results = 'asis'}
data(swiss)
mod1 <- lm(data=swiss, Fertility ~ Agriculture + Infant.Mortality)
mod2 <- update(mod1, . ~ . + Catholic + Examination)
build(mod1, mod2, md = 'html')
```
The independent variables can be renamed by providing a named vector or list that maps each variable name found in the model to its output name. Variables can be excluded from the output table by leaving them out of the vector of names. Likewise, the order of the independent variables follows their order in the name vector. A constant is implicitly included in the model output as 'Constant'. It can be excluded, renamed, or moved just like any other independent variable.
``` {r, results = 'asis'}
indep_names = c('Agriculture' = 'Agriculture',
                'Infant.Mortality' = 'Infant Mortality',
                'Catholic' = 'Catholic',
                'Examination' = 'Exam')
build(mod1, mod2, indep_names = indep_names, md='html')
```
Dependent variables can be labeled individually and in groups. Individual model labeling alters the label above each column. By default, models are labeled sequentially as '(1)', '(2)', '(3)', etc. An unnamed character vector passed to the `dep_names` parameter will replace the default model labels. The character vector should be in the intended order. Each element is mapped to its corresponding column.
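As a small illustration of the positional mapping, the sketch below builds the default labels in base R and a replacement vector of the same length. The label strings are made up for the example, and the final `build()` call simply assumes that `dep_names` behaves as described in the paragraph above; it only runs if rchitex is installed.

```r
# Two nested OLS models on the built-in swiss data.
mod1 <- lm(Fertility ~ Agriculture + Infant.Mortality, data = swiss)
mod2 <- update(mod1, . ~ . + Catholic + Examination)
models <- list(mod1, mod2)

# Default column labels as described in the text: "(1)", "(2)", ...
default_labels <- sprintf("(%d)", seq_along(models))

# Replacement labels, one per model, in column order (illustrative strings).
dep_names <- c("Fertility (short)", "Fertility (long)")
stopifnot(length(dep_names) == length(models))

# Assumption: build() accepts dep_names as described above.
# Guarded so the sketch still runs when rchitex is not installed.
if (requireNamespace("rchitex", quietly = TRUE)) {
  rchitex::build(mod1, mod2, dep_names = dep_names)
}
```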
Group labels link individual models. For example, if a table includes three OLS regressions, each with a different dependent variable, and two binomial models, one logit and the other probit, then a grouping label may group the OLS models on one side and the binomial models on the other. The `grouped_label` argument accepts a list in which each name is the intended group label. Each value in the list should be either a scalar or a vector indicating the column numbers to be included in the group. Groups should only include contiguous elements (for example, columns 1, 3 and 5 cannot be included in a single group) and they should be passed in ascending order. Columns may be skipped (for example, columns 1 and 2 as group 1 and columns 5 and 6 as group 2).
``` {r, results = 'asis'}
swiss$high_ed <- swiss$Education >= 12
grouped_label <- list('OLS' = c(1,2), 'Binomial' = c(3,4))
probit <- glm(data=swiss, high_ed ~ Agriculture + Infant.Mortality + Catholic + Examination,
              family = binomial(link = 'probit'))
logit <- glm(data=swiss, high_ed ~ Agriculture + Infant.Mortality + Catholic + Examination,
             family = binomial(link = 'logit'))
build(mod1, mod2, probit, logit, indep_names = indep_names, title = 'Table: Grouped labels',
      grouped_label = grouped_label, md='html')
```
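The rules for group labels (contiguous columns within a group, groups in ascending order, gaps allowed) can be checked before calling `build()`. The helper below is a hypothetical base R sketch, not part of rchitex, and `check_groups` and `n_models` are names invented for this example.

```r
# Hypothetical helper (not part of rchitex): check a grouped_label list
# against the rules described above.
check_groups <- function(grouped_label, n_models) {
  cols <- unlist(grouped_label, use.names = FALSE)
  # Every group must cover consecutive column numbers.
  contiguous <- all(vapply(grouped_label,
                           function(g) all(diff(g) == 1),
                           logical(1)))
  # Groups must be listed in ascending order without overlap.
  ascending <- !is.unsorted(cols, strictly = TRUE)
  # Column numbers must refer to models that exist.
  in_range <- all(cols >= 1 & cols <= n_models)
  contiguous && ascending && in_range
}

ok_groups  <- check_groups(list("OLS" = c(1, 2), "Binomial" = c(3, 4)), n_models = 4)
bad_groups <- check_groups(list("A" = c(1, 3, 5)), n_models = 5)
gap_groups <- check_groups(list("A" = c(1, 2), "B" = c(5, 6)), n_models = 6)
c(ok = ok_groups, non_contiguous = bad_groups, with_gap = gap_groups)
```

Here the first and third lists pass, while the group made of columns 1, 3 and 5 is rejected because it is not contiguous.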
General model statistics, or annotations in rchitex language, are reported after the independent variables. The default annotations depend on the class of the model. For example, OLS reports the number of observations, $R^2$, adjusted $R^2$, and the F statistic by default.
The included annotations can be customized by passing a single string to the `annotations` argument. Each character of the string identifies an individual fit annotation. The order of the characters in the string determines the order in which the fit annotations appear.
The following list shows the different annotation characters and what statistic they represent.
Note in the table below that if an annotation does not apply to a model, it shows up as a blank entry.
``` {r, results = 'asis'}
annotations <- 'rfl'
build(mod1, mod2, probit, logit, indep_names = indep_names, title = 'Grouped labels',
      grouped_label = grouped_label, md='html', annotations = annotations)
```
Users may also add custom annotations, which are annotations that are not internally calculated. Often you may need a row signaling that a given model uses a restricted data set, or that robust standard errors were applied. In rchitex lingo, these are called `custom_annotations` and appear just before the internally computed annotations. They are specified by passing a list to the `custom_annotations` parameter. Each name in the list is the label that appears in the table, and each value is a vector holding one entry per model, reading left to right. The length of the vector must equal the number of models. For example, if you want a custom annotation stating the column number of a three-model table, you would write `custom_annotations = list('model number' = c('1', '2', '3'))`.
``` {r, results = 'asis'}
custom_annotations = list('Model type' = c('OLS', 'OLS', 'Probit', 'Logit'))
build(mod1, mod2, probit, logit, indep_names = indep_names, title = 'Grouped labels',
      md='html', annotations = annotations, custom_annotations = custom_annotations)
```
Both the thresholds and the symbols used for the levels of statistical significance can be customized by passing a named vector or list to the `sig` argument. Each name should be a symbol and each value the corresponding threshold.
``` {r significance, results = 'asis'}
sig <- list('+' = .5, '++' = 0.05, '+++' = 0.01)
build(mod1, mod2, probit, logit, indep_names = indep_names, title = 'Grouped labels',
      md='html', sig=sig)
```
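To show how a symbol-to-threshold list of this shape can be read, here is a hedged base R sketch. It assumes a p-value earns the symbol of the strictest threshold it falls strictly below; rchitex's exact rule (for example `<` versus `<=`) is not stated above and may differ. `sig_symbol` is a hypothetical helper and the thresholds and p-values are made-up inputs.

```r
# Hypothetical helper (not part of rchitex): pick the symbol for the
# strictest threshold that a p-value falls strictly below.
sig_symbol <- function(p, sig) {
  thresholds <- unlist(sig)
  # Sort from strictest (smallest) to loosest threshold.
  thresholds <- thresholds[order(thresholds)]
  hit <- which(p < thresholds)
  if (length(hit) == 0) "" else names(thresholds)[hit[1]]
}

# Illustrative thresholds and p-values.
sig <- list("+" = 0.1, "++" = 0.05, "+++" = 0.01)
p_values <- c(0.2, 0.07, 0.03, 0.001)
symbols <- vapply(p_values, sig_symbol, character(1), sig = sig)
data.frame(p_value = p_values, symbol = symbols)
```

Because the thresholds are sorted inside the helper, the order in which the symbols are listed in `sig` does not matter in this sketch.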
### Standard errors

rchitex allows users to adjust standard errors prior to model output. In many cases, you may need to apply robust standard errors. Wrapping a model in `rse` converts a model's standard errors to robust standard errors similar to Stata's robust option (HC1). It is highly recommended that any adjustment to standard errors be reported.

```r
mod_rse <- rse(mod1)
mod_adj <- adj_se(mod1, function(x) runif(1, 0, 1))
build(mod1, mod_rse, mod_adj,
      custom_annotations = list('SE' = c('Normal', 'Robust', 'Random')),
      annotations='or', md='html')
```
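For readers who want to see what an HC1 adjustment involves, the sketch below computes it by hand in base R for the first model. It is not the code behind `rse`; it uses the textbook sandwich estimator $(X'X)^{-1} X' \mathrm{diag}(e^2) X (X'X)^{-1}$ scaled by $n / (n - k)$, and all object names are illustrative.

```r
# OLS model on the built-in swiss data.
mod1 <- lm(Fertility ~ Agriculture + Infant.Mortality, data = swiss)

X <- model.matrix(mod1)          # n x k design matrix
e <- residuals(mod1)             # OLS residuals
n <- nrow(X)
k <- ncol(X)

bread <- solve(crossprod(X))     # (X'X)^{-1}
meat  <- crossprod(X * e)        # X' diag(e^2) X; each row of X is scaled by its residual

vcov_hc0 <- bread %*% meat %*% bread   # White (HC0) covariance matrix
vcov_hc1 <- vcov_hc0 * n / (n - k)     # HC1 degrees-of-freedom correction

# Classical and HC1 standard errors side by side.
se_table <- cbind(classical = sqrt(diag(vcov(mod1))),
                  HC1       = sqrt(diag(vcov_hc1)))
se_table
```

The coefficients themselves are unchanged by this adjustment; only the standard errors, and therefore the test statistics and significance symbols, differ.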